Spectral-Driven Intensity Modulation — User Guide
Advanced audio processing that analyzes spectral characteristics over time and creates dynamic intensity modulation based on spectral flatness and roughness measurements, producing evolving volume patterns that respond to the audio's spectral content.
What this does
This script implements spectral-driven intensity modulation, an audio processing technique that analyzes the spectral content of a sound over time and changes its volume according to spectral characteristics. The algorithm measures spectral flatness (how noise-like vs. tone-like the sound is) and spectral roughness (how jagged the spectrum is, i.e. how far it deviates from a locally smoothed version) across multiple time windows, then uses these measurements to control the depth and speed of a sine-wave intensity modulation applied to the original sound.
Key Features:
- Multi-Window Spectral Analysis: Analyzes 8 time windows across the audio duration
- Spectral Flatness Measurement: Detects noise-like vs. tone-like characteristics
- Spectral Roughness Measurement: Measures how jagged the spectrum is across frequency
- Dynamic Parameter Adaptation: Modulation parameters change based on spectral content
- Smooth Interpolation: Continuous parameter transitions between analysis points
- Content-Aware Intensity Control: Volume modulation responds to the audio's character
Technical Implementation:
(1) Spectral Analysis Phase: Divide the audio into 8 time windows. For each window:
  - Extract a 200ms segment with a Hamming window
  - Convert it to a spectrum
  - Calculate spectral flatness (geometric mean / arithmetic mean of the power spectrum)
  - Calculate spectral roughness (average deviation from local spectral smoothness)
(2) Modulation Generation Phase: Create an intensity tier with 10ms resolution. For each time point:
  - Interpolate flatness/roughness values from the nearest analysis points
  - Calculate modulation depth = 20 + (flatness × 30) dB
  - Calculate modulation speed = 1.0 + (roughness × 4.0) Hz
  - Generate the sine-wave intensity variation
  - Apply safety limits (40-100 dB range)
(3) Application Phase: Multiply the original sound by the intensity tier, then clean up intermediate objects.
Key insight: The modulation "breathes" with the audio. Noisy sections get deeper, faster modulation, while tonal sections get subtler, slower modulation.
Quick start
- In Praat, select exactly one Sound object.
- Run script… → spectral_intensity_modulation.praat.
- The script automatically analyzes your audio (no parameters to set).
- Watch the Info window for real-time analysis results:
  - Spectral flatness and roughness measurements for 8 time windows
  - Progress of intensity point generation
  - Final parameter ranges used
- Processing completes automatically — result named "spectral_intensity_mod_originalname".
- The modified sound is selected and ready for playback or further processing.
Spectral Analysis Phase
Time Window Analysis
🔍 Multi-Point Spectral Sampling
Concept: Capture spectral evolution by analyzing multiple time points
Technique: 8 evenly spaced analysis windows across audio duration
Window size: 200ms with 50% overlap considerations
Frequency range: 80-5000 Hz (psychoacoustically relevant range)
Analysis Window Strategy:
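The script's window-placement code is not shown in this guide. The sketch below is one placement rule that reproduces the window times in the Analysis Output Example (0.10 s, 1.25 s, ..., 8.65 s, which fits an 8.75 s sound): space 8 centers evenly from 0 to the duration, then clamp each so the 200 ms window stays inside the sound. This rule is inferred from that output, not confirmed, and `window_centers` is a hypothetical helper.

```python
def window_centers(duration, n_windows=8, window_length=0.2):
    """Return analysis-window center times in seconds (hypothetical helper).

    Centers are spaced evenly from 0 to `duration`, then clamped so that
    each window of `window_length` seconds lies entirely inside the sound.
    """
    half = window_length / 2
    step = duration / (n_windows - 1)
    centers = []
    for i in range(n_windows):
        t = i * step
        # Clamp so the whole window stays within [0, duration].
        t = min(max(t, half), duration - half)
        centers.append(round(t, 6))
    return centers


print(window_centers(8.75))
```

`window_centers(8.75)` returns `[0.1, 1.25, 2.5, 3.75, 5.0, 6.25, 7.5, 8.65]`, matching the example. For sounds shorter than 200 ms the clamp bounds cross, so such inputs would need separate handling.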
Spectral Flatness Measurement
Noise vs. Tone Detection
Spectral flatness calculation:
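A minimal Python/NumPy sketch of the flatness step as described in the Technical Implementation: Hamming-windowed 200 ms segment, power spectrum, geometric mean divided by arithmetic mean, restricted to 80-5000 Hz. It is not the Praat script's code; the function name, the `1e-12` floor that guards against `log(0)`, and the use of NumPy's FFT are assumptions.

```python
import numpy as np


def spectral_flatness(segment, sample_rate, f_lo=80.0, f_hi=5000.0):
    """Geometric mean / arithmetic mean of the power spectrum in f_lo..f_hi Hz."""
    windowed = segment * np.hamming(len(segment))  # Hamming-window the segment
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / sample_rate)
    band = power[(freqs >= f_lo) & (freqs <= f_hi)]
    band = band + 1e-12  # floor so silent bins do not produce log(0)
    geometric_mean = np.exp(np.mean(np.log(band)))
    arithmetic_mean = np.mean(band)
    return float(geometric_mean / arithmetic_mean)


sr = 16000
t = np.arange(int(0.2 * sr)) / sr  # one 200 ms segment
rng = np.random.default_rng(0)
tone = np.sin(2 * np.pi * 440 * t)
noise = rng.standard_normal(len(t))
print(spectral_flatness(tone, sr), spectral_flatness(noise, sr))
```

With these inputs the tone scores near 0, while white noise lands around 0.5 to 0.6 rather than near 1: individual periodogram bins of noise scatter randomly, which pulls the geometric mean down. Scores from this sketch therefore need not line up with the Flatness Interpretation Guide.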
Flatness Interpretation Guide
0.00-0.10: Pure tones, harmonic complexes
0.10-0.30: Vocal vowels, sustained instruments
0.30-0.50: Mixed sources, speech consonants
0.50-0.70: Noise with some structure
0.70-1.00: White noise, unpitched percussion
Modulation Response:
Low flatness (<0.3): Gentle modulation (about 20-29 dB depth)
Medium flatness (0.3-0.6): Moderate modulation (about 29-38 dB depth)
High flatness (>0.6): Strong modulation (about 38-50 dB depth)
Spectral Roughness Measurement
Spectral Irregularity Detection
Spectral roughness calculation:
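The guide defines roughness only as "average deviation from local spectral smoothness". The sketch below is one hypothetical way to make that concrete: normalize the magnitude spectrum in the 80-5000 Hz band to a peak of 1, compare each bin with the mean of its two neighbors, and average the absolute differences. The 3-bin neighborhood, the peak normalization, and the function name are all assumptions, not the script's confirmed method.

```python
import numpy as np


def spectral_roughness(segment, sample_rate, f_lo=80.0, f_hi=5000.0):
    """Hypothetical roughness: mean absolute deviation of the peak-normalized
    magnitude spectrum from the average of each bin's two neighbors."""
    windowed = segment * np.hamming(len(segment))
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / sample_rate)
    band = magnitude[(freqs >= f_lo) & (freqs <= f_hi)]
    peak = band.max()
    if peak == 0:
        return 0.0  # silent segment: no spectral structure to measure
    band = band / peak  # scale to 0..1 so the result ignores overall level
    neighbors = (band[:-2] + band[2:]) / 2  # local "smooth" reference
    deviation = np.abs(band[1:-1] - neighbors)
    return float(np.mean(deviation))


sr = 16000
t = np.arange(int(0.2 * sr)) / sr  # one 200 ms segment
rng = np.random.default_rng(0)
tone = np.sin(2 * np.pi * 440 * t)
noise = rng.standard_normal(len(t))
print(spectral_roughness(tone, sr), spectral_roughness(noise, sr))
```

A pure tone scores low because its single peak disturbs only a few of the roughly 985 bins (5 Hz apart) in the band, while noise scores higher because every bin jumps randomly relative to its neighbors. As with flatness, the absolute numbers from this sketch need not match the guide's 0.00-0.10 scale.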
Roughness Interpretation Guide
0.00-0.02: Very smooth spectra (sine waves)
0.02-0.04: Smooth spectra (sustained instruments)
0.04-0.06: Moderate variation (speech, mixed sources)
0.06-0.08: High variation (noise, consonants)
0.08-0.10+: Extreme variation (transients, distortion)
Modulation Response:
Low roughness (<0.04): Slow modulation (1.0-2.5 Hz)
Medium roughness (0.04-0.07): Medium modulation (2.5-4.0 Hz)
High roughness (>0.07): Fast modulation (4.0-5.0 Hz)
Analysis Output Example
=== ANALYZING 8 TIME WINDOWS ===
Window 1 (0.10s) - Flatness: 0.215, Roughness: 0.031
Window 2 (1.25s) - Flatness: 0.342, Roughness: 0.045
Window 3 (2.50s) - Flatness: 0.518, Roughness: 0.062
Window 4 (3.75s) - Flatness: 0.287, Roughness: 0.038
Window 5 (5.00s) - Flatness: 0.631, Roughness: 0.071
Window 6 (6.25s) - Flatness: 0.453, Roughness: 0.055
Window 7 (7.50s) - Flatness: 0.389, Roughness: 0.049
Window 8 (8.65s) - Flatness: 0.276, Roughness: 0.035
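Interpretation:
- Starts fairly tonal (flatness 0.215), then turns noisier, peaking at 2.50s (0.518) and 5.00s (0.631) with a more tonal dip at 3.75s (0.287)
- Returns to a more tonal character at the end (0.276)
- Roughness rises and falls with flatness, peaking in the same windows (0.062 and 0.071)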
Intensity Modulation Phase
Parameter Interpolation
🔄 Smooth Parameter Transitions
Concept: Continuous evolution between analysis points
Technique: Linear interpolation of flatness/roughness values
Resolution: 10ms grid (100 points per second)
Result: Smooth, evolving modulation parameters
Interpolation Process:
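The "nearest analysis points" step can be sketched as linear interpolation onto the 10 ms grid. This Python/NumPy version is illustrative, not the script's code: `interpolate_features` is hypothetical, and holding the first and last measured values constant before the first window and after the last one is an assumption about edge handling.

```python
import numpy as np


def interpolate_features(centers, flatness, roughness, duration, step=0.01):
    """Linearly interpolate per-window features onto a regular time grid.

    np.interp holds the first/last value constant outside the analysis
    centers; treating the edges this way is an assumption.
    """
    n_points = int(round(duration / step))
    times = np.arange(n_points) * step  # 0, 0.01, 0.02, ... seconds
    flat_t = np.interp(times, centers, flatness)
    rough_t = np.interp(times, centers, roughness)
    return times, flat_t, rough_t


# Centers and values from the Analysis Output Example (8.75 s sound assumed).
centers = [0.10, 1.25, 2.50, 3.75, 5.00, 6.25, 7.50, 8.65]
flatness = [0.215, 0.342, 0.518, 0.287, 0.631, 0.453, 0.389, 0.276]
roughness = [0.031, 0.045, 0.062, 0.038, 0.071, 0.055, 0.049, 0.035]
times, flat_t, rough_t = interpolate_features(centers, flatness, roughness, 8.75)
```

For example, halfway between the 1.25 s and 2.50 s windows (1.875 s), interpolated flatness is 0.342 + 0.5 × (0.518 − 0.342) = 0.430.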
Modulation Parameter Calculation
Dynamic Depth and Speed
Modulation parameter formulas:
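The formulas from the Technical Implementation written out, with F(t) and R(t) the interpolated flatness and roughness, followed by a worked case for Window 5 of the Analysis Output Example (flatness 0.631, roughness 0.071). Note that the raw speed formula only spans the documented 1.0-5.0 Hz range if R runs from 0 to 1, while the reported roughness values stay near 0.03-0.07 (giving about 1.1-1.3 Hz). The script may rescale roughness first; a hypothetical rescaling R' = min(10R, 1) would put Window 5 at 3.84 Hz, roughly consistent with the roughness bands given earlier, but this guide does not show such a step.

```latex
D(t) = 20 + 30\,F(t)\ \text{dB}, \qquad f_m(t) = 1.0 + 4.0\,R(t)\ \text{Hz}

D = 20 + 30 \times 0.631 = 38.93\ \text{dB}, \qquad
f_m = 1.0 + 4.0 \times 0.071 = 1.284\ \text{Hz}
```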
Phase-Accurate Modulation
Continuous phase calculation:
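Because the modulation speed changes over time, the sine's phase has to be accumulated step by step rather than computed as 2π × speed × time, which would make the waveform jump whenever the speed changes. A minimal Python sketch of phase accumulation on the 10 ms grid; `accumulate_phase` is a hypothetical name and this is not the script's code.

```python
import math


def accumulate_phase(freqs_hz, step=0.01):
    """Running phase (radians) for a sine whose frequency changes per step.

    Each step advances the phase by 2*pi*f*dt, so the waveform stays
    continuous when the frequency changes.
    """
    phase = 0.0
    phases = []
    for f in freqs_hz:
        phases.append(phase)
        phase += 2.0 * math.pi * f * step
    return phases


freqs = [1.0] * 100 + [4.0] * 100  # 1 s at 1 Hz, then 1 s at 4 Hz
phases = accumulate_phase(freqs)
values = [math.sin(p) for p in phases]  # modulation waveform, -1..1
```

After one second at 1 Hz the phase has advanced by exactly one cycle (2π), and the switch to 4 Hz continues from there without a discontinuity.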
Intensity Tier Generation
High-Resolution Control
Intensity tier creation:
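A sketch of how the (time, dB) points for the intensity tier could be produced. It assumes the modulation depth is the sine's peak deviation from the 70 dB baseline (value = 70 + depth × sin(phase)), which is one reading of "Volume varies by 20-50 dB around 70 dB baseline"; under that reading, any depth above 30 dB is clipped by the 40-100 dB safety limits at the sine's peaks. `intensity_points` is hypothetical; in Praat, pairs like these would be added to an IntensityTier point by point.

```python
import math


def intensity_points(times, depths_db, speeds_hz, base_db=70.0,
                     floor_db=40.0, ceiling_db=100.0, step=0.01):
    """Return (time, dB) pairs for an intensity tier (hypothetical helper).

    Assumes value = base + depth * sin(phase), clipped to floor..ceiling dB.
    """
    points = []
    phase = 0.0
    for t, depth, speed in zip(times, depths_db, speeds_hz):
        value = base_db + depth * math.sin(phase)
        value = min(max(value, floor_db), ceiling_db)  # safety limits
        points.append((t, value))
        phase += 2.0 * math.pi * speed * step  # continuous phase
    return points


times = [k * 0.01 for k in range(100)]  # one second on the 10 ms grid
points = intensity_points(times, [50.0] * 100, [1.0] * 100)
```

With a 50 dB depth at 1 Hz, the unclipped curve would swing from 20 to 120 dB, so the tier spends part of each cycle pinned at 40 and 100 dB.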
Final Application
Sound Modification
Applying the intensity modulation:
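The final step can be sketched in Python as a per-sample gain derived from the tier. This sketch assumes the tier's dB values act as relative gain against the 70 dB baseline (gain = 10^((dB − 70)/20)) and peak-normalizes the output to 0.9 to avoid clipping. Both choices are assumptions, since how the script's multiplication maps dB to amplitude is not described in this guide. `apply_intensity` is a hypothetical helper.

```python
import numpy as np


def apply_intensity(samples, sample_rate, tier_times, tier_db,
                    base_db=70.0, peak=0.9):
    """Scale a mono signal by a dB contour (Python analogue, not Praat's code).

    Gain per sample is 10 ** ((dB - base_db) / 20), with dB linearly
    interpolated between tier points; the result is rescaled so its
    absolute peak equals `peak`, which makes the choice of base_db moot.
    """
    t = np.arange(len(samples)) / sample_rate
    db = np.interp(t, tier_times, tier_db)
    out = samples * 10.0 ** ((db - base_db) / 20.0)
    max_abs = np.max(np.abs(out))
    if max_abs > 0:
        out = out * (peak / max_abs)
    return out


# A constant signal ramped from 70 dB to 90 dB over one second.
out = apply_intensity(np.ones(1000), 1000, [0.0, 0.999], [70.0, 90.0])
```

A 20 dB rise in the tier corresponds to a tenfold amplitude increase, so the last sample ends up ten times the first (0.9 vs 0.09).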
Output Summary
=== COMPLETE ===
Final sound: spectral_intensity_mod_originalname
=== INTENSITY MODULATION PARAMETER RANGES ===
Intensity depth: 20-50 dB (higher = more volume variation)
Modulation speed: 1.0-5.0 Hz (higher = faster volume changes)
Pattern: Sine wave modulation (smooth volume oscillations)
What this means:
- Volume varies by 20-50 dB around 70 dB baseline
- Modulation rate changes between 1-5 times per second
- Smooth sine wave pattern avoids abrupt changes
- Parameters evolve based on spectral content
Technical Theory
Spectral Analysis Mathematics
Fast Fourier Transform Foundation
FFT analysis for spectral features:
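Assuming the spectrum is taken over the full 200 ms Hamming-windowed segment (window length T_w, sample rate f_s) without zero-padding, the frequency resolution and the number of bins in the 80-5000 Hz analysis band follow directly and do not depend on the sample rate:

```latex
w[n] = 0.54 - 0.46\cos\!\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1, \qquad N = T_w f_s

\Delta f = \frac{f_s}{N} = \frac{1}{T_w} = \frac{1}{0.2\ \text{s}} = 5\ \text{Hz}

K = \frac{5000}{5} - \frac{80}{5} + 1 = 1000 - 16 + 1 = 985\ \text{bins}
```

Zero-padding to a longer FFT would add interpolated bins but would not improve this 5 Hz resolution.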
Spectral Feature Mathematics
Mathematical foundations of features:
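With P_k the power in bin k over the K bins of the analysis band, flatness is the ratio below; by the AM-GM inequality it lies between 0 and 1 and reaches 1 only when all bins have equal power. The roughness formula is one hypothetical formalization of "average deviation from local spectral smoothness" (normalized magnitudes S_k compared with the mean of their two neighbors), not a definition confirmed by the script.

```latex
\mathrm{SF} = \frac{\left(\prod_{k=1}^{K} P_k\right)^{1/K}}{\frac{1}{K}\sum_{k=1}^{K} P_k}
            = \frac{\exp\!\left(\frac{1}{K}\sum_{k=1}^{K} \ln P_k\right)}{\frac{1}{K}\sum_{k=1}^{K} P_k},
\qquad 0 \le \mathrm{SF} \le 1

R = \frac{1}{K-2}\sum_{k=2}^{K-1}\left|\,S_k - \frac{S_{k-1}+S_{k+1}}{2}\,\right|,
\qquad S_k = \frac{|X_k|}{\max_j |X_j|}
```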
Modulation Mathematics
Sine Wave Modulation Theory
Time-varying sine wave modulation:
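Why the phase must be integrated: for x(t) = sin φ(t), the instantaneous frequency is the phase's rate of change divided by 2π. Writing the phase naively as 2π f_m(t) t adds an error term that grows with time, whereas integrating f_m gives exactly the intended rate. The last line combines this with the 70 dB baseline and the 40-100 dB limits, assuming D(t) is the sine's peak deviation (an interpretation, not confirmed by the script).

```latex
f_{\text{inst}}(t) = \frac{1}{2\pi}\frac{d\varphi}{dt}

\varphi(t) = 2\pi f_m(t)\,t \;\Rightarrow\; f_{\text{inst}}(t) = f_m(t) + t\,f_m'(t)

\varphi(t) = 2\pi \int_0^t f_m(\tau)\,d\tau \;\Rightarrow\; f_{\text{inst}}(t) = f_m(t),
\qquad \varphi_k = \varphi_{k-1} + 2\pi f_m(t_{k-1})\,\Delta t,\quad \Delta t = 0.01\ \text{s}

I(t) = \min\!\bigl(100,\ \max\bigl(40,\ 70 + D(t)\sin\varphi(t)\bigr)\bigr)\ \text{dB}
```

For example, at t = 8 s a speed drifting by 0.5 Hz per second would, under the naive phase, run about 4 Hz faster than intended.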
Psychoacoustic Considerations
Perception of Intensity Modulation
Human perception of tremolo effects:
1.0-2.0 Hz: Slow, obvious pulsations
2.0-3.5 Hz: Medium, musical tremolo
3.5-5.0 Hz: Fast, intense modulation
>5.0 Hz: Approaches vibrato perception
Modulation Depth Ranges (20-50 dB):
20-30 dB: Subtle, background effect
30-40 dB: Noticeable, musical effect
40-50 dB: Dramatic, foreground effect
Special Case Handling:
Very tonal + smooth sounds → reduced modulation
Prevents unnatural-sounding effects on pure tones
Computational Complexity
Processing Time Analysis
Algorithm complexity breakdown:
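A rough operation count for one run, with sound duration T, sample rate f_s, 200 ms windows (N_w = 0.2 f_s samples), and the 10 ms tier grid. The last term, the per-sample multiplication in the application phase, is not listed in the Algorithm Performance Characteristics below, but it touches every sample. The 30 s, 44.1 kHz figures are a worked illustration, not measurements.

```latex
C \approx 8\cdot O(N_w \log N_w) + O\!\left(\frac{T}{0.01\ \text{s}}\right) + O(T f_s)

T = 30\ \text{s},\ f_s = 44100\ \text{Hz}:\quad
N_w = 8820,\qquad \frac{T}{0.01\ \text{s}} = 3000\ \text{points},\qquad T f_s = 1\,323\,000\ \text{samples}
```

Which term dominates wall-clock time depends on the implementation: a point-by-point loop in an interpreted script can cost more per step than a single built-in operation over all samples.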
Applications
Sound Design and Music Production
Use case: Creating evolving, organic modulation effects
Technique: Process individual tracks or complete mixes
Examples: Dynamic tremolo, breathing pads, evolving textures
Audio Restoration and Enhancement
Use case: Adding life to static or dull recordings
Technique: Subtle modulation on vocals or instruments
Applications: Vintage recording enhancement, mono source spatialization
Experimental and Electronic Music
Use case: Generating complex, algorithmically-driven effects
Technique: Synthetic sounds with strongly changing spectra, which push the modulation toward its extremes
Results: Unpredictable, evolving modulation patterns
Psychoacoustic Research
Use case: Studying perception of dynamic intensity changes
Technique: Controlled modulation based on spectral features
Applications: Hearing research, audio perception studies
Practical Workflow Examples
🎵 Vocal Enhancement
Goal: Add natural-sounding dynamics to vocal tracks
Source: Monophonic vocal recording
Expected Result:
- Vowel sections: subtle, slow modulation
- Consonant sections: stronger, faster modulation
- Natural breathing effect throughout
Tip: Use moderate-length phrases for best results
🎹 Synthetic Texture Creation
Goal: Create evolving textures from static synth sounds
Source: Sustained synthesizer pad or drone
Expected Result:
- Harmonic content: gentle pulsation
- Noisy content: intense modulation
- Evolving pattern as sound changes
Tip: Layer multiple processed versions
🥁 Percussive Processing
Goal: Add dynamic interest to drum loops
Source: Drum loop or percussion track
Expected Result:
- Kick/snare transients: minimal modulation
- Cymbal sustain: strong modulation
- Rhythmic pulsation pattern
Tip: Process individual drum hits separately
Advanced Techniques
- Layered processing: Apply multiple times with different source material
- Selective processing: Process only specific frequency ranges
- Reverse processing: Apply to reversed audio for unusual effects
- Parameter extraction: Use the analysis data for other purposes, for example:
  - Flatness data: Use for noise reduction decisions
  - Roughness data: Use for transient detection
  - Time evolution: Study spectral changes over time
  - Comparative analysis: Compare different sounds' spectral characteristics
Experiment with unconventional source material for unique results.
Troubleshooting Common Issues
Problem: Modulation too subtle
Cause: Source material too tonal and smooth
Solution: Use more spectrally varied source material
Problem: Modulation too strong
Cause: Source material very noisy and complex
Solution: Use more tonal source material, or mix with dry signal
Problem: Processing takes a long time
Cause: Very long audio file
Solution: Process shorter segments, or accept longer wait time
Problem: Modulation changes erratically
Cause: Extreme spectral variations in source
Solution: Use more consistent source material, or mix the result with the dry signal to reduce the modulation strength
Technical Reference
Parameter Ranges and Limits
| Parameter | Range | Default | Description |
|---|---|---|---|
| Analysis Windows | 8 fixed | 8 | Number of spectral analysis points |
| Window Duration | 200ms fixed | 0.2s | Analysis window length |
| Frequency Range | 80-5000Hz | 80-5000Hz | Spectral analysis band |
| Time Resolution | 10ms fixed | 0.01s | Intensity point spacing |
| Base Intensity | 70dB fixed | 70dB | Center point for modulation |
| Intensity Depth | 20-50dB | dynamic | Modulation amplitude range |
| Modulation Speed | 1.0-5.0Hz | dynamic | Oscillation frequency range |
| Safety Limits | 40-100dB | 40-100dB | Output intensity boundaries |
Algorithm Performance Characteristics
Spectral Analysis: O(N log N) per window × 8 windows
Modulation Generation: O(T/0.01) where T = duration
Memory Usage: O(T/0.01) for intensity tier
Typical Processing Times:
30-second audio: 5-15 seconds
2-minute audio: 20-60 seconds
5-minute audio: 1-3 minutes
10-minute audio: 2-6 minutes
Quality Factors:
FFT resolution: About 5 Hz, set by the 200ms window (1 / 0.2 s)
Time resolution: 10ms (20 points per cycle at the 5Hz maximum modulation rate)
Interpolation: Linear (smooth enough for audio)
Spectral Feature Interpretation Guide
| Sound Type | Flatness Range | Roughness Range | Modulation Result |
|---|---|---|---|
| Pure tones | 0.00-0.10 | 0.00-0.02 | Very subtle, slow |
| Vocal vowels | 0.10-0.25 | 0.02-0.04 | Subtle, musical |
| Sustained instruments | 0.15-0.35 | 0.03-0.05 | Moderate, evolving |
| Speech consonants | 0.40-0.70 | 0.05-0.08 | Strong, fast |
| Noise sources | 0.70-0.95 | 0.06-0.10 | Intense, rapid |
| Transients | 0.30-0.60 | 0.08-0.15 | Very fast, dramatic |