MFCC Sound Processor — User Guide
Spectral feature-based audio transformation using Mel-Frequency Cepstral Coefficients (MFCCs): 5 algorithms for pitch, duration, and amplitude control derived from spectral analysis.
What this does
The MFCC Sound Processor analyzes a sound with Mel-Frequency Cepstral Coefficients (MFCCs), compact per-frame descriptions of spectral shape, and turns those coefficients into control signals for pitch, duration, and amplitude. Because the modifications follow the sound's own spectral evolution rather than fixed, externally chosen settings, the results tend to stay recognizably related to the source even when the changes are dramatic.
Key Features:
- 5 Spectral Algorithms: From direct coefficient mapping to complex scrambling
- 12 Curated Presets: Optimized settings for specific transformation types
- MFCC-Based Control: Perceptually relevant spectral features drive parameters
- Multi-Parameter Control: Independent mapping to pitch, duration, amplitude
- Frame-by-Frame Spectral Analysis: MFCC extraction every 5 ms (offline, not real-time)
- Psychoacoustic Transformations: Changes based on human hearing characteristics
Technical Implementation:
1. MFCC extraction: Sound → MFCC object with 12 coefficients, 15 ms windows, 5 ms steps.
2. Coefficient selection: The first 3 coefficients (C1-C3) are extracted for basic control; higher coefficients are used for complexity analysis.
3. Normalization: Coefficients are scaled to the 0–1 range per frame.
4. Algorithm application: Each algorithm maps normalized coefficients to Praat Manipulation parameters (pitch tier, duration tier, amplitude scaling).
5. Resynthesis: Manipulation → Sound via overlap-add synthesis.
6. Preset logic: Pre-configured parameter mappings for specific transformation types.
The process maintains temporal coherence while creating spectrally informed transformations.
Quick start
- In Praat, select exactly one Sound object (mono or stereo).
- Run script… → mfcc_sound_processor.praat.
- Choose a Preset from the categorized list (easiest way to start).
- For manual control, select Custom and choose algorithm.
- For Algorithm 1, set Pitch_range and Duration_range (0.3-0.6 typical).
- For other algorithms, adjust relevant parameters.
- Enable Play_result to hear immediately.
- Click OK — MFCC analysis followed by transformation.
MFCC Theory & Coefficient Interpretation
Mel-Frequency Cepstral Coefficients Explained
🎵 Perceptual Audio Representation
Mel Scale: Frequency warping approximating human pitch perception
Cepstral Analysis: Separating source (excitation) from filter (vocal tract)
Coefficient Meanings: C1=energy, C2=spectral tilt, C3+=spectral shape details
Window/Step: 15ms windows capture spectral snapshots, 5ms steps for smoothness
12 Coefficients: Balance between detail and computational efficiency
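To make the mel warping concrete, here is a minimal sketch of a Hz-to-mel conversion. It uses the widely quoted 2595·log10(1 + f/700) formula as an assumption; Praat has its own mel definition, so the script's actual values may differ. The function names are illustrative only.

```python
import math


def hz_to_mel(hz: float) -> float:
    """Convert a frequency in Hz to mel (common textbook formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


if __name__ == "__main__":
    # Equal steps in Hz shrink in mel as frequency rises.
    for hz in (100.0, 1000.0, 2000.0, 4000.0):
        print(f"{hz:7.1f} Hz -> {hz_to_mel(hz):7.1f} mel")
```

The warping is roughly linear at low frequencies and compresses high frequencies, which is why MFCCs weight low-frequency spectral detail more heavily.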
Coefficient Interpretation for Audio Transformation
Mapping MFCCs to perceptual parameters:
Processing Pipeline
Complete Transformation Flow
Step-by-step signal processing:
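One step of this flow, the normalization of a coefficient to the 0–1 range, can be sketched as follows. This is a sketch under an assumption: it applies min-max scaling to one coefficient's values across all frames, which is one plausible reading of the normalization step; the script's exact method is not shown in this guide. `normalize_track` is a hypothetical helper name.

```python
def normalize_track(values):
    """Min-max scale one coefficient's per-frame values to the 0-1 range.

    A constant track has no variation to scale, so it is mapped to 0.5
    (an arbitrary neutral choice made for this sketch).
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    c1 = [-12.0, 3.5, 8.0, -4.0]  # made-up C1 values for four frames
    print(normalize_track(c1))
```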
Temporal Frame Structure
Window, Step, and Frame Alignment
Frame-based processing parameters:
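The relationship between window, step, and frame count can be sketched as below. The frame-count and centering rules are a common short-term-analysis convention and are assumptions here; Praat's own frame placement may differ slightly. Only the 15 ms window and 5 ms step come from this guide.

```python
import math

WINDOW_S = 0.015  # analysis window length from this guide
STEP_S = 0.005    # time step from this guide


def frame_count(duration_s, window_s=WINDOW_S, step_s=STEP_S):
    """Number of full windows that fit in the sound (assumed convention)."""
    if duration_s < window_s:
        return 0
    # The small epsilon guards against floating-point rounding just below an integer.
    return int(math.floor((duration_s - window_s) / step_s + 1e-9)) + 1


def frame_times(duration_s, window_s=WINDOW_S, step_s=STEP_S):
    """Frame center times, with the block of frames centered in the sound."""
    n = frame_count(duration_s, window_s, step_s)
    if n == 0:
        return []
    first = 0.5 * (duration_s - (n - 1) * step_s)
    return [first + i * step_s for i in range(n)]


def overlap_fraction(window_s=WINDOW_S, step_s=STEP_S):
    """Fraction of each window shared with the next one."""
    return 1.0 - step_s / window_s


if __name__ == "__main__":
    print(frame_count(1.0), "frames in a 1 s sound")
    print(f"overlap: {overlap_fraction():.1%}")
```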
Algorithm Details
Algorithm 1: Direct Control
🎛️ Straightforward Coefficient Mapping
Concept: Direct mapping C1→pitch, C2→amplitude, C3→duration
Formula: pitch = map(C1_norm, p_min, p_max), amplitude = map(C2_norm, a_min, a_max), duration = map(C3_norm, d_min, d_max)
Character: Natural spectral-to-parametric transformation
Use: Basic spectral-driven effects, subtle variations
Direct Control Mathematics
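The mapping formula above can be sketched in a few lines. This is an illustration, not the script's code: `map_range` and `direct_control` are hypothetical names, the ± interpretation of Pitch_range and Duration_range follows the Parameters tables, and the amplitude bounds `amp_min`/`amp_max` are assumed values chosen for the example.

```python
def map_range(x_norm, lo, hi):
    """Linearly map a normalized value in 0-1 onto the interval [lo, hi]."""
    return lo + x_norm * (hi - lo)


def direct_control(c1_norm, c2_norm, c3_norm,
                   pitch_range=0.6, duration_range=0.3,
                   amp_min=0.3, amp_max=1.0):
    """Map normalized C1, C2, C3 to pitch, amplitude, and duration factors.

    Ranges are treated as +/- offsets around 1.0. Ranges of 1.0 or more
    would give non-positive lower bounds; this sketch does not handle them.
    """
    pitch_factor = map_range(c1_norm, 1.0 - pitch_range, 1.0 + pitch_range)
    amplitude = map_range(c2_norm, amp_min, amp_max)
    duration_factor = map_range(c3_norm, 1.0 - duration_range, 1.0 + duration_range)
    return pitch_factor, amplitude, duration_factor


if __name__ == "__main__":
    # A frame whose normalized coefficients all sit mid-range leaves
    # pitch and duration unchanged.
    print(direct_control(0.5, 0.5, 0.5))
```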
Algorithm 2: Reverse Control
🔁 Temporal Spectral Reversal
Concept: Reverse MFCC trajectory: last frame controls first output, etc.
Formula: C_reverse[i] = C_original[numFrames - i + 1]
Character: "Inside-out" spectral evolution, reversed spectral character
Use: Abstract transformations, reversed spectral envelopes
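The reversal formula uses 1-based frame indices, as in Praat. A minimal 0-based sketch of the same idea follows; `reverse_trajectory` is a hypothetical name, and each frame can be any per-frame value (a single coefficient or a list of them).

```python
def reverse_trajectory(frames):
    """Return the frames in reverse time order.

    0-based equivalent of C_reverse[i] = C_original[numFrames - i + 1].
    """
    n = len(frames)
    return [frames[n - 1 - i] for i in range(n)]


if __name__ == "__main__":
    c1_track = [0.1, 0.4, 0.9]  # made-up normalized C1 values
    print(reverse_trajectory(c1_track))
```

Only the control trajectory is reversed; the audio itself still plays forward, which is what produces the "inside-out" character described above.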
Algorithm 3: Complexity Time-Stretch
⏱️ Spectral Complexity-Driven Duration
Concept: Stretch complex spectral moments, compress simple ones
Formula: complexity = sqrt(C4² + C5² + ... + C12²)
Character: Elastic time based on spectral richness
Use: Rhythmic transformations, emphasis on complex moments
Complexity Calculation
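A rough sketch of the complexity measure and one way to turn it into stretch factors is given below. The root-sum-of-squares over C4 to C12 follows the formula in this section; the piecewise-linear mapping to stretch factors, the `min_stretch` parameter, and the function names are assumptions made for illustration.

```python
import math


def frame_complexity(coeffs):
    """Root-sum-of-squares of C4..C12 for one frame (coeffs = [C1, ..., C12])."""
    return math.sqrt(sum(c * c for c in coeffs[3:12]))


def stretch_factors(complexities, threshold=0.5, min_stretch=0.7, max_stretch=2.0):
    """Map per-frame complexity to duration factors (assumed mapping).

    Complexity is min-max normalized; frames above the threshold are
    stretched toward max_stretch, frames at or below it are compressed
    toward min_stretch.
    """
    lo, hi = min(complexities), max(complexities)
    if hi == lo:
        return [1.0] * len(complexities)
    factors = []
    for c in complexities:
        c_norm = (c - lo) / (hi - lo)
        if c_norm > threshold:
            t = (c_norm - threshold) / (1.0 - threshold)
            factors.append(1.0 + t * (max_stretch - 1.0))
        else:
            t = c_norm / threshold
            factors.append(min_stretch + t * (1.0 - min_stretch))
    return factors


if __name__ == "__main__":
    print(stretch_factors([0.0, 1.0, 2.0]))
```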
Algorithm 4: Freeze Spectral Moments
⏸️ Similarity-Based Time Freezing
Concept: Freeze time when spectral content is stable/similar
Formula: distance[i] = sqrt(∑_k (C_k[i] - C_k[i-1])²), summed over coefficients k, where i is the frame index
Character: Glitch-like freezes, stuttering on stable sounds
Use: Glitch effects, stutter edits, rhythmic freezing
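The freeze decision can be sketched as follows, under the assumption that a frame counts as "stable" when its Euclidean distance to the previous frame falls below the threshold. That reading is consistent with the presets, where a higher threshold gives denser freezes, but it is not confirmed by the script itself. The function names are illustrative.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two frames of coefficients."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def stable_frames(frames, threshold=0.2):
    """Indices of frames whose distance to the previous frame is below threshold."""
    return [
        i for i in range(1, len(frames))
        if frame_distance(frames[i], frames[i - 1]) < threshold
    ]


if __name__ == "__main__":
    # Made-up normalized two-coefficient frames: small change, jump, small change.
    frames = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.0, 1.05]]
    print(stable_frames(frames))
```

Each detected index would then be a candidate position for inserting a freeze of Freeze_duration seconds.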
Algorithm 5: Trajectory Scramble
🎲 Windowed Random Coefficient Reordering
Concept: Scramble MFCCs within temporal windows
Formula: C_scrambled[i] = C_original[random in window(i ± N/2)]
Character: Chaotic yet locally coherent transformations
Use: Experimental textures, granular-like effects
Trajectory Scramble Algorithm
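A minimal sketch of windowed scrambling, following the formula above: each output frame is drawn at random from the original frames within ±N/2 of its position. Drawing independently per frame (so a source frame may be reused) is an assumption, as are the function name and the `seed` parameter, which is added here only to make runs repeatable.

```python
import random


def scramble_trajectory(frames, window=10, seed=None):
    """Replace each frame with a randomly chosen frame from its local window."""
    rng = random.Random(seed)
    n = len(frames)
    half = window // 2
    scrambled = []
    for i in range(n):
        lo = max(0, i - half)        # clamp the window at the start
        hi = min(n - 1, i + half)    # clamp the window at the end
        scrambled.append(frames[rng.randint(lo, hi)])
    return scrambled


if __name__ == "__main__":
    track = list(range(20))  # frame indices stand in for coefficient values
    print(scramble_trajectory(track, window=4, seed=1))
```

A small window keeps every output frame close to its original position, which is why the result stays locally coherent; a large window lets distant frames swap in.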
Presets Gallery
🎯 Direct Control Presets
Direct: Subtle — Gentle variations: pitch ±10%, duration ±5%, amplitude 80-100%
Direct: Wide Range — Dramatic: pitch ±50%, duration ±30%, amplitude 30-100%
Direct: Pitch Focus — Pitch-focused: pitch ±50%, minimal duration/amplitude changes
🔁 Reverse Control Presets
Reverse: Classic — Standard spectral reversal, moderate parameter mapping
Reverse: Dramatic — Enhanced reversal with wider parameter ranges
⏱️ Complexity Stretch Presets
Complexity: Moderate — Balanced: threshold=0.5, stretch=0.7-2.0×
Complexity: Extreme — Exaggerated: threshold=0.4, stretch=0.5-4.0×
⏸️ Freeze Moments Presets
Freeze: Sparse — Occasional freezes: duration=0.15s, threshold=0.2
Freeze: Dense — Frequent freezes: duration=0.2s, threshold=0.4
🎲 Trajectory Scramble Presets
Scramble: Subtle — Local scrambling: window=5 frames (25ms context)
Scramble: Wild — Global scrambling: window=30 frames (150ms context)
Parameters
Preset Selection
| Parameter | Type | Default | Description |
|---|---|---|---|
| Preset | option | Custom | 12 curated presets for instant effects |
Manual Algorithm Selection
| Parameter | Type | Default | Description |
|---|---|---|---|
| Algorithm | option | Direct Control | 5 algorithms with distinct transformation logic |
Direct Control Parameters (Algorithm 1 only)
| Parameter | Type | Default | Range | Description |
|---|---|---|---|---|
| Pitch_range | real | 0.6 | 0.1-2.0 | ± range from 1.0 (0.6 = 0.4 to 1.6 × original pitch) |
| Duration_range | real | 0.3 | 0.1-1.0 | ± range from 1.0 (0.3 = 0.7 to 1.3 × original duration) |
Other Algorithm Parameters
| Parameter | Type | Default | Range | Algorithm | Description |
|---|---|---|---|---|---|
| Complexity_threshold | positive | 0.5 | 0.1-0.9 | 3 | Normalized complexity value above which frames are stretched |
| Max_stretch_factor | positive | 2.0 | 1.1-10.0 | 3 | Maximum duration stretch for complex frames |
| Freeze_duration_(s) | positive | 0.2 | 0.05-1.0 | 4 | Duration of each freeze moment in seconds |
| Scramble_window_(frames) | positive | 10 | 2-50 | 5 | Window size for trajectory scrambling in frames |
Output Control
| Parameter | Type | Default | Description |
|---|---|---|---|
| Play_result | boolean | 1 (yes) | Auto-play processed sound |
Fixed MFCC Parameters (Not Adjustable)
| Parameter | Value | Description |
|---|---|---|
| Number of coefficients | 12 | MFCCs extracted per frame |
| Window length | 0.015s (15ms) | Analysis window duration |
| Time step | 0.005s (5ms) | Frame advancement (about 67% overlap with 15ms windows) |
| First frequency | 100 mel | Lowest filter center frequency (Praat specifies this on the mel scale) |
| Filter distance | 100 mel | Spacing between filter center frequencies (mel scale) |
| Maximum frequency | 0 (Nyquist) | Highest frequency analyzed; 0 means up to the Nyquist frequency |
Applications
Vocal Transformation & Processing
Use case: Creating spectral-based vocal effects, pitch variations, expressive processing
Recommended algorithms: Direct Control (subtle), Complexity Stretch (rhythmic), Freeze Moments (glitch)
Presets: "Direct: Subtle" for natural variations, "Freeze: Dense" for stutter effects
Musical Composition & Production
Use case: Generating variations, creating evolving textures, rhythmic manipulation
Recommended algorithms: Complexity Stretch (rhythmic interest), Trajectory Scramble (textural)
Workflow:
- Process individual instrument tracks with Algorithm 1 for spectral-driven expression
- Use Algorithm 3 on drum loops for complexity-based rhythmic variation
- Apply Algorithm 4 to pads for glitchy, frozen moments
- Combine algorithms for complex, evolving textures
Sound Design for Media
Use case: Creating unique sound effects, transforming source material
Recommended algorithms: Reverse Control (abstract), Trajectory Scramble (experimental)
Advantages:
- Perceptually coherent transformations maintain source identity
- Spectral-based changes feel more "natural" than arbitrary effects
- Parameter mappings based on human auditory perception
- Repeatable yet variable results from same source material
Experimental & Electroacoustic Music
Use case: Spectral manipulation, abstract transformations, texture creation
Recommended algorithms: All algorithms with extreme settings
Example: Field recordings processed with Algorithm 5 (wild scramble) for granular textures
Practical Workflow Examples
🎤 Expressive Vocal Processing (Music Production)
Goal: Add spectral-driven expression to vocal tracks
Settings:
- Algorithm: Direct Control
- Preset: Direct: Subtle
- Pitch_range: 0.4 (0.6 to 1.4 × original pitch)
- Duration_range: 0.2 (0.8 to 1.2 × original duration)
- Mix: 50-70% wet (blend with original)
Result: Natural-sounding vocal expression driven by spectral energy
🎵 Rhythmic Drum Processing (Electronic Music)
Goal: Create complexity-based rhythmic variations
Settings:
- Algorithm: Complexity Time-Stretch
- Preset: Complexity: Moderate
- Source: Drum loop or percussion track
- Post-process: Compression to maintain consistent level
Result: Rhythmic pattern with stretched complex moments (cymbals, fills)
🎬 Abstract Sound Design (Film/Game)
Goal: Transform ordinary sounds into unusual textures
Settings:
- Algorithm: Trajectory Scramble
- Preset: Scramble: Wild
- Scramble_window: 30 frames (150ms context)
- Source: Environmental sounds, mechanical noises
Result: Chaotic yet locally coherent abstract textures
Advanced Techniques
- Complexity → Freeze: Stretch complex sections, then freeze stable moments
- Direct → Scramble: Apply spectral control, then scramble trajectory
- Reverse → Direct: Create inside-out spectral character with natural mapping
- Parallel processing: Run different algorithms, mix results
To chain algorithms, process the sound with one algorithm, rename the output, then process it again with a different algorithm.
Different source materials highlight different algorithm characteristics:
- Speech: Clear MFCC patterns, works well with all algorithms
- Music: Harmonic richness creates interesting complexity patterns
- Percussion: Transient-rich, good for freeze and complexity algorithms
- Ambient sounds: Smooth spectra work well with direct control
- Noise: Limited MFCC variation, less dramatic results
Troubleshooting Common Issues
Cause: Long source file, high sampling rate
Solution: Use shorter excerpts (1-2 minutes max), resample to lower rate if possible
Cause: Source with limited spectral variation, or subtle preset
Solution: Try more dramatic presets, use harmonically rich source material
Cause: Extreme parameter settings, or source with sharp transients
Solution: Reduce parameter ranges, try different algorithm, pre-smooth source
Cause: Duration manipulation too extreme, or algorithm mismatch with material
Solution: Reduce Duration_range (Algorithm 1) or Max_stretch_factor (Algorithm 3)
Technical Reference
MFCC Extraction Details
Praat's MFCC Implementation
Underlying algorithm parameters:
Spectral Distance Calculation (Algorithm 4)
Euclidean Distance in MFCC Space
Measuring spectral similarity between frames:
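Written out, the frame-to-frame distance used by Algorithm 4 can be stated as below. Here c_{k,i} denotes the k-th coefficient of frame i and K is the number of coefficients compared; which coefficients the script includes is not stated in this guide, so K = 12 (all extracted coefficients) is an assumption.

```latex
d_i = \sqrt{\sum_{k=1}^{K} \left( c_{k,i} - c_{k,i-1} \right)^{2}}, \qquad i = 2, \dots, \text{numFrames}
```

A small d_i means frames i-1 and i have nearly the same spectral shape, which is the condition under which a freeze is considered.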