Spectral-Driven Intensity Modulation — User Guide

Advanced audio processing that analyzes spectral characteristics over time and creates dynamic intensity modulation based on spectral flatness and roughness measurements, producing evolving volume patterns that respond to the audio's spectral content.

Author: Shai Cohen
Affiliation: Department of Music, Bar-Ilan University, Israel
Version: 0.1 (2025)
License: MIT License
Repo: https://github.com/ShaiCohen-ops/Praat-plugin_AudioTools

Contents:

What this does
Quick start
Spectral Analysis
Intensity Modulation
Technical Theory
Applications

What this does

This script implements spectral-driven intensity modulation — an intelligent audio processing technique that analyzes the spectral content of sound over time and creates dynamic volume changes based on spectral characteristics. The algorithm measures spectral flatness (how noise-like vs. tone-like the sound is) and spectral roughness (how rapidly the spectrum changes) across multiple time windows, then uses these measurements to control the depth and speed of sine wave intensity modulation applied to the original sound.

Key Features:

Time-Windowed Spectral Analysis: Analyzes 8 time windows across the audio duration
Spectral Flatness Measurement: Detects noise-like vs. tone-like characteristics
Spectral Roughness Measurement: Measures how jagged the spectrum is between neighboring frequency bins
Dynamic Parameter Adaptation: Modulation parameters change based on spectral content
Smooth Interpolation: Continuous parameter transitions between analysis points
Intelligent Intensity Control: Volume modulation responds to audio character

What is spectral-driven modulation?

Traditional modulation uses a fixed rate: tremolo, or LFO-controlled volume changes. Spectral-driven modulation analyzes the audio's spectral content and adapts the modulation parameters over the course of the sound. Spectral flatness indicates whether the sound is noise-like (high flatness) or tone-like (low flatness). Spectral roughness measures how much the power spectrum deviates from bin to neighboring bin, indicating textural complexity.

The algorithm uses these measurements to control:
(1) Modulation depth: How much volume variation occurs (20-50 dB range)
(2) Modulation speed: How fast volume changes occur (1.0-5.0 Hz range)

Advantages:
(1) Context-aware: Modulation responds to audio content
(2) Natural sounding: Avoids mechanical, repetitive patterns
(3) Evolutionary: Creates evolving, organic modulation effects

Use cases: Sound design, experimental music, audio restoration, psychoacoustic research, creative effects.

Technical Implementation:

(1) Spectral Analysis Phase: Divide the audio into 8 time windows. For each window:
Extract a 200ms segment with a Hamming window
Convert to a spectrum
Calculate spectral flatness (geometric mean / arithmetic mean of the power spectrum)
Calculate spectral roughness (average deviation from local spectral smoothness)

(2) Modulation Generation Phase: Create an intensity tier with 10ms resolution. For each time point:
Interpolate flatness/roughness values from the nearest analysis points
Calculate modulation depth = 20 + (flatness × 30) dB
Calculate modulation speed = 1.0 + (roughness × 4.0) Hz
Generate the sine wave intensity variation
Apply safety limits (40-100 dB range)

(3) Application Phase:
Multiply the original sound by the intensity tier
Clean up intermediate objects

Key insight: The modulation "breathes" with the audio. Noisy sections get deeper, faster modulation, while tonal sections get subtler, slower modulation.

Quick start

In Praat, select exactly one Sound object.
Run script… → spectral_intensity_modulation.praat.
The script automatically analyzes your audio (no parameters to set).
Watch the Info window for real-time analysis results:

Spectral flatness and roughness measurements for 8 time windows
Progress of intensity point generation
Final parameter ranges used

Processing completes automatically — result named "spectral_intensity_mod_originalname".
The modified sound is selected and ready for playback or further processing.

Quick tip: The script works best with dynamically varied material — sounds that change character over time will produce the most interesting modulation patterns. For subtle effects, use sources with moderate spectral variation. For dramatic effects, use sounds with strong contrasts between noisy and tonal sections. The algorithm automatically adapts to your audio — no manual parameter tuning needed. Processing time depends on audio length — longer files take more time due to the detailed spectral analysis. The Info window provides complete transparency about what the algorithm is detecting and how it's responding.

Important:

Processing load: the script performs FFT analysis on 8 time windows and then generates an intensity point every 10ms, so very long files take noticeably longer to process.
Short sounds: 200ms analysis windows are used, so very short sounds (< 0.5s) may not provide enough data for meaningful analysis.
Altered dynamics: original amplitude relationships are changed in the result. The script operates on a working copy, so the selected Sound object itself is not modified.
Extreme modulation can occur with highly variable source material, so always preview results.
Frequency range: only 80-5000 Hz is analyzed. Content outside this range doesn't influence the modulation.
Silent or near-silent sections may produce unpredictable results due to numerical precision limits.

Spectral Analysis Phase

Time Window Analysis

🔍 Multi-Point Spectral Sampling

Concept: Capture spectral evolution by analyzing multiple time points

Technique: 8 evenly spaced analysis windows across audio duration

Window size: 200ms, centered on each analysis time

Frequency range: 80-5000 Hz (psychoacoustically relevant range)

Analysis Window Strategy:

SETUP:
    numAnalysisPoints = 8
    analysisTimes#[] = zero#(8)
    flatness#[] = zero#(8)
    roughness#[] = zero#(8)

FOR each point FROM 1 TO 8:
    // Calculate analysis time
    time = (point-1) × duration / 7    // Evenly spaced

    // Ensure minimum distance from edges
    IF time < 0.1: time = 0.1
    IF time > duration-0.2: time = duration-0.2

    // Extract analysis window (200ms centered)
    beginTime = time - 0.1
    endTime = time + 0.1
    Apply boundary checks

    // Process window
    Extract part: beginTime, endTime, "Hamming", 1, "no"
    To Spectrum: "yes"

    // Calculate spectral features
    Calculate flatness and roughness
    Store results

    // Cleanup
    Remove window and spectrum
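The window placement above can be sketched in Python. This is a minimal illustration rather than the script itself: the function name `analysis_windows` is hypothetical, and the "boundary checks" step is assumed to be a simple clip of the window to the sound's extent.

```python
def analysis_windows(duration, num_points=8, half_width=0.1):
    """Return (center, begin, end) tuples for evenly spaced analysis windows.

    Centers are spread over the full duration, then pulled away from the
    edges before a 200 ms window is cut around each one.
    """
    windows = []
    for point in range(1, num_points + 1):
        # Evenly spaced centers from 0 to duration
        center = (point - 1) * duration / (num_points - 1)
        # Keep a minimum distance from both edges, as in the pseudocode
        if center < 0.1:
            center = 0.1
        if center > duration - 0.2:
            center = duration - 0.2
        # Assumed boundary check: clip the window to [0, duration]
        begin = max(0.0, center - half_width)
        end = min(duration, center + half_width)
        windows.append((center, begin, end))
    return windows
```

For a 10-second sound this places the first center at 0.1 s, the last at 9.8 s, and the six interior centers at multiples of 10/7 s.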

Spectral Flatness Measurement

Noise vs. Tone Detection

Spectral flatness calculation:

Spectral Flatness = Geometric Mean / Arithmetic Mean

Calculation:
FOR each frequency bin FROM 80 Hz TO 5000 Hz:
    power = re² + im²
    power = max(power, 1e-12)    // Avoid log(0)
    lnSum = lnSum + ln(power)
    linearSum = linearSum + power
    validBins = validBins + 1
geometricMean = exp(lnSum / validBins)
arithmeticMean = linearSum / validBins
flatness = geometricMean / arithmeticMean

Interpretation:
flatness ≈ 1.0: White noise (perfectly flat spectrum)
flatness ≈ 0.0: Pure tone (single frequency)
Typical values: 0.1-0.8 for most sounds

Use in modulation:
Higher flatness → deeper modulation (noisy sounds)
Lower flatness → subtler modulation (tonal sounds)
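The flatness calculation above translates directly into a few lines of Python. In this sketch the function name `spectral_flatness` is illustrative, and the input is assumed to be a list of power values already restricted to the 80-5000 Hz band.

```python
import math

def spectral_flatness(powers, floor=1e-12):
    """Geometric mean divided by arithmetic mean of power values."""
    if not powers:
        raise ValueError("need at least one frequency bin")
    ln_sum = 0.0
    linear_sum = 0.0
    for p in powers:
        p = max(p, floor)            # avoid log(0), as in the pseudocode
        ln_sum += math.log(p)
        linear_sum += p
    n = len(powers)
    geometric_mean = math.exp(ln_sum / n)
    arithmetic_mean = linear_sum / n
    return geometric_mean / arithmetic_mean

flat = spectral_flatness([2.0] * 100)            # perfectly flat band
peaky = spectral_flatness([1.0] + [0.0] * 99)    # one dominant bin
```

Here `flat` comes out at essentially 1.0 and `peaky` is close to 0, matching the two ends of the interpretation scale.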

Flatness Interpretation Guide

Flatness Ranges and Meanings:
0.00-0.10: Pure tones, harmonic complexes
0.10-0.30: Vocal vowels, sustained instruments
0.30-0.50: Mixed sources, speech consonants
0.50-0.70: Noise with some structure
0.70-1.00: White noise, unpitched percussion

Modulation Response:
Low flatness (<0.3): Gentle modulation (20-30 dB depth)
Medium flatness (0.3-0.6): Moderate modulation (30-40 dB depth)
High flatness (>0.6): Strong modulation (40-50 dB depth)

Spectral Roughness Measurement

Rapid Change Detection

Spectral roughness calculation:

Spectral Roughness = Average Local Spectral Variation

Calculation:
FOR each frequency bin FROM 80 Hz TO 5000 Hz:
    IF bin > 1 AND bin < nBins:
        power_current = re² + im²
        power_prev = previous_bin_power
        power_next = next_bin_power
        local_smooth = (power_prev + power_next) / 2
        deviation = abs(power_current - local_smooth)
        roughnessSum = roughnessSum + deviation
        roughnessBins = roughnessBins + 1
roughness = roughnessSum / roughnessBins

Interpretation:
Low roughness: Smooth spectrum (steady tones)
High roughness: Jagged spectrum (noise, transients)
Typical values: 0.01-0.10 for most sounds

Use in modulation:
Higher roughness → faster modulation (complex sounds)
Lower roughness → slower modulation (simple sounds)
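A small Python sketch of the roughness measure described above. The function name is hypothetical, and as a simplification the first and last entries of the input list are treated as the edge bins that have no neighbor on one side.

```python
def spectral_roughness(powers):
    """Mean absolute deviation of each interior bin from the
    average of its two neighbours."""
    if len(powers) < 3:
        return 0.0                   # no interior bins to evaluate
    total = 0.0
    count = 0
    for i in range(1, len(powers) - 1):
        local_smooth = (powers[i - 1] + powers[i + 1]) / 2
        total += abs(powers[i] - local_smooth)
        count += 1
    return total / count
```

A linear ramp such as `[1, 2, 3, 4, 5]` gives 0, while the zigzag `[0, 1, 0, 1, 0]` gives 1.0. Note that, as written, the result has the same units as the power values, so its numeric range depends on how the spectrum is scaled.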

Roughness Interpretation Guide

Roughness Ranges and Meanings:
0.00-0.02: Very smooth spectra (sine waves)
0.02-0.04: Smooth spectra (sustained instruments)
0.04-0.06: Moderate variation (speech, mixed sources)
0.06-0.08: High variation (noise, consonants)
0.08-0.10+: Extreme variation (transients, distortion)

Modulation Response:
Low roughness (<0.04): Slow modulation (1.0-2.5 Hz)
Medium roughness (0.04-0.07): Medium modulation (2.5-4.0 Hz)
High roughness (>0.07): Fast modulation (4.0-5.0 Hz)

Analysis Output Example

Typical Info Window Output:

=== ANALYZING 8 TIME WINDOWS ===
Window 1 (0.10s) - Flatness: 0.215, Roughness: 0.031
Window 2 (1.25s) - Flatness: 0.342, Roughness: 0.045
Window 3 (2.50s) - Flatness: 0.518, Roughness: 0.062
Window 4 (3.75s) - Flatness: 0.287, Roughness: 0.038
Window 5 (5.00s) - Flatness: 0.631, Roughness: 0.071
Window 6 (6.25s) - Flatness: 0.453, Roughness: 0.055
Window 7 (7.50s) - Flatness: 0.389, Roughness: 0.049
Window 8 (8.65s) - Flatness: 0.276, Roughness: 0.035

Interpretation:
- Starts tonal, becomes noisier around 2.5-5.0s
- Returns to more tonal character at end
- Roughness follows similar pattern

Intensity Modulation Phase

Parameter Interpolation

🔄 Smooth Parameter Transitions

Concept: Continuous evolution between analysis points

Technique: Linear interpolation of flatness/roughness values

Resolution: 10ms grid (100 points per second)

Result: Smooth, evolving modulation parameters

Interpolation Process:

SETUP:
    timeStep = 0.01    // 10ms resolution
    numGridPoints = round(duration / timeStep) + 1
    Create IntensityTier: "spectral_intensity", 0, duration

FOR each grid point i FROM 1 TO numGridPoints:
    currentTime = (i-1) × timeStep

    // Find current segment between analysis points
    FOR p FROM 1 TO 7:    // 8 points → 7 segments
        IF currentTime BETWEEN analysisTimes#[p] AND analysisTimes#[p+1]
            segment = p
            progress = (currentTime - analysisTimes#[p]) / (analysisTimes#[p+1] - analysisTimes#[p])

            // Interpolate spectral features
            currentFlatness = flatness#[p] + progress × (flatness#[p+1] - flatness#[p])
            currentRoughness = roughness#[p] + progress × (roughness#[p+1] - roughness#[p])

    // Handle boundaries (before first/after last analysis point)
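The interpolation step above can be sketched as a small Python helper. The name `interpolate_feature` is illustrative. The guide does not say how the boundaries are handled, so holding the first and last measured values outside the analysis range is an assumption made here.

```python
def interpolate_feature(t, times, values):
    """Piecewise-linear interpolation between analysis points."""
    # Assumed boundary handling: hold the nearest measured value
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    for p in range(len(times) - 1):
        if times[p] <= t <= times[p + 1]:
            span = times[p + 1] - times[p]
            progress = (t - times[p]) / span if span > 0 else 0.0
            return values[p] + progress * (values[p + 1] - values[p])
    return values[-1]
```

Calling it once with the flatness list and once with the roughness list, at each 10ms grid time, yields the `currentFlatness` and `currentRoughness` values used in the next step.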

Modulation Parameter Calculation

Dynamic Depth and Speed

Modulation parameter formulas:

Base Formulas:
    intensityDepth = 20 + (currentFlatness × 30)    // dB
    modulationSpeed = 1.0 + (currentRoughness × 4.0)    // Hz

Special Case Handling:
    IF currentFlatness < 0.3 AND currentRoughness < 0.02:    // Very tonal, smooth sounds
        intensityDepth = intensityDepth × 0.3    // Reduce depth
        modulationSpeed = modulationSpeed × 0.7    // Slow down

Safety Limits:
    currentIntensity = 70 + intensityDepth × sin(currentPhase)
    currentIntensity = min(max(currentIntensity, 40), 100)    // 40-100 dB range

Parameter Ranges:
Intensity depth: 20-50 dB (about 6-8.7 dB when the special case applies, since flatness is below 0.3 there)
Modulation speed: 1.0-5.0 Hz (about 0.70-0.76 Hz when the special case applies, since roughness is below 0.02 there)
Base intensity: 70 dB (modulated ± depth, then limited to 40-100 dB)
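A minimal Python sketch of the parameter formulas above, including the special case and the safety clamp. The function names are hypothetical and the formulas are taken exactly as stated here; any rescaling of flatness or roughness that the script may perform beforehand is not modeled.

```python
def modulation_parameters(flatness, roughness):
    """Return (depth in dB, speed in Hz) for one grid point."""
    depth = 20 + flatness * 30
    speed = 1.0 + roughness * 4.0
    # Very tonal, smooth material: reduce depth and slow down
    if flatness < 0.3 and roughness < 0.02:
        depth *= 0.3
        speed *= 0.7
    return depth, speed

def clamp_intensity(value, low=40.0, high=100.0):
    """Apply the 40-100 dB safety limits."""
    return min(max(value, low), high)
```

With these formulas, a flatness of 0.5 and a roughness of 0.05 give a depth of 35 dB and a speed of 1.2 Hz. The 5.0 Hz upper bound is reached only when roughness equals 1.0, and a 50 dB depth swing around 70 dB is cut back to the 40-100 dB limits by the clamp.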

Phase-Accurate Modulation

Continuous phase calculation:

Phase Calculation:
    IF i = 1 (first point):
        currentPhase = 0
    ELSE:
        timeDelta = currentTime - previousTime
        phaseDelta = 2 × π × modulationSpeed × timeDelta
        currentPhase = currentPhase + phaseDelta

Intensity Calculation:
    intensityVariation = intensityDepth × sin(currentPhase)
    currentIntensity = 70 + intensityVariation

Why phase continuity matters:
Prevents phase jumps in modulation
Ensures smooth sine wave progression
Maintains musical rhythm when speed changes
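Phase accumulation can be illustrated with a short Python sketch. The function name `modulation_curve` and its list-based inputs are illustrative assumptions; the script itself writes the values into a Praat IntensityTier.

```python
import math

def modulation_curve(speeds, depths, time_step=0.01, base=70.0):
    """Build a dB curve from per-point speed (Hz) and depth (dB) lists."""
    phase = 0.0
    curve = []
    for i, (speed, depth) in enumerate(zip(speeds, depths)):
        if i > 0:
            # Advance the phase by the current speed over one grid step
            phase += 2 * math.pi * speed * time_step
        value = base + depth * math.sin(phase)
        curve.append(min(max(value, 40.0), 100.0))   # 40-100 dB limits
    return curve
```

Because the phase is carried over from point to point rather than recomputed as 2π × speed × time, a jump in speed changes only the slope of the phase. The step between neighboring curve values therefore stays bounded by depth × 2π × speed × time_step, even where the speed changes abruptly.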

Intensity Tier Generation

High-Resolution Control

Intensity tier creation:

// Create empty intensity tier
intensityTier = Create IntensityTier: "spectral_intensity", 0, duration

FOR each grid point i FROM 1 TO numGridPoints:
    // Calculate current time
    currentTime = (i-1) × timeStep

    // Calculate spectral features (interpolated)
    currentFlatness = ... [interpolation]
    currentRoughness = ... [interpolation]

    // Calculate modulation parameters
    intensityDepth = 20 + (currentFlatness × 30)
    modulationSpeed = 1.0 + (currentRoughness × 4.0)

    // Apply special case reduction if needed
    IF currentFlatness < 0.3 AND currentRoughness < 0.02:
        intensityDepth = intensityDepth × 0.3
        modulationSpeed = modulationSpeed × 0.7

    // Calculate phase and intensity
    IF i > 1:
        timeDelta = currentTime - previousTime
        phaseDelta = 2 × π × modulationSpeed × timeDelta
        currentPhase = currentPhase + phaseDelta
    ELSE:
        currentPhase = 0
    intensityVariation = intensityDepth × sin(currentPhase)
    currentIntensity = 70 + intensityVariation
    currentIntensity = min(max(currentIntensity, 40), 100)

    // Add point to intensity tier
    Add point: currentTime, currentIntensity
    previousTime = currentTime

    // Progress reporting every 100 points
    IF i mod 100 = 0:
        appendInfoLine: "Processed ", i, "/", numGridPoints, " intensity points"

Final Application

Sound Modification

Applying the intensity modulation:

// Create working copy of original sound
workingSound = Copy: "working_" + originalName$

// Apply intensity modulation
Select: workingSound, intensityTier
finalSound = Multiply: "yes"

// Rename result
Rename: "spectral_intensity_mod_" + originalName$

// Cleanup intermediate objects
Select: workingSound, intensityTier
Remove

// Select final result
Select: finalSound
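To make the multiplication step more concrete, here is a rough Python sketch of what applying a dB curve to samples could look like. It rests on assumptions that this guide does not confirm: each dB value is converted to an amplitude factor of 10^(dB/20), the curve is linearly interpolated between points, and the "yes" argument rescales the result to a 0.9 peak. The function name is hypothetical.

```python
def apply_intensity_curve(samples, sample_rate, tier_times, tier_db, peak=0.9):
    """Scale samples by a dB curve, then normalise the peak (assumed behaviour)."""
    def curve_at(t):
        # Hold the end values outside the tier, interpolate linearly inside
        if t <= tier_times[0]:
            return tier_db[0]
        if t >= tier_times[-1]:
            return tier_db[-1]
        for k in range(len(tier_times) - 1):
            t0, t1 = tier_times[k], tier_times[k + 1]
            if t0 <= t <= t1:
                return tier_db[k] + (t - t0) / (t1 - t0) * (tier_db[k + 1] - tier_db[k])
        return tier_db[-1]

    # Assumed conversion: amplitude factor = 10 ** (dB / 20)
    scaled = [s * 10 ** (curve_at(i / sample_rate) / 20)
              for i, s in enumerate(samples)]
    loudest = max(abs(s) for s in scaled)
    if loudest == 0:
        return scaled                 # silent input: nothing to normalise
    return [s * peak / loudest for s in scaled]
```

Under these assumptions only the differences between tier values shape the result: the peak normalisation cancels any constant offset such as the 70 dB baseline.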

Output Summary

Final Info Window Summary:

=== COMPLETE ===
Final sound: spectral_intensity_mod_originalname

=== INTENSITY MODULATION PARAMETER RANGES ===
Intensity depth: 20-50 dB (higher = more volume variation)
Modulation speed: 1.0-5.0 Hz (higher = faster volume changes)
Pattern: Sine wave modulation (smooth volume oscillations)

What this means:
- Volume varies by 20-50 dB around 70 dB baseline
- Modulation rate changes between 1-5 times per second
- Smooth sine wave pattern avoids abrupt changes
- Parameters evolve based on spectral content

Technical Theory

Spectral Analysis Mathematics

Fast Fourier Transform Foundation

FFT analysis for spectral features:

Window Extraction:
windowDuration = 0.2 seconds (200ms)
Hamming window: w(n) = 0.54 - 0.46 × cos(2πn/(N-1))

FFT Calculation:
X(k) = Σ[n=0 to N-1] x(n) × w(n) × e^(-j2πkn/N)

Where:
x(n) = time domain samples
w(n) = Hamming window function
X(k) = complex frequency domain representation
N = window size in samples

Power Spectrum:
P(k) = |X(k)|² = Re(X(k))² + Im(X(k))²

Frequency Binning:
freq_bin(k) = k × sampling_rate / N
Analysis range: 80 Hz to 5000 Hz
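As a worked example of the window and binning formulas above, the following Python sketch computes Hamming coefficients and the bin indices that fall inside the analysis band. The function names are illustrative, and N is assumed to equal the number of samples in the 200ms window. If the spectrum is computed with zero-padding (Praat's fast FFT option may pad to a power of two), the bin spacing and indices will differ.

```python
import math

def hamming(n_samples):
    """Hamming coefficients w(n) = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]

def band_bins(sample_rate, n_samples, f_low=80.0, f_high=5000.0):
    """First and last bin index k with f_low <= k * sample_rate / N <= f_high."""
    first = math.ceil(f_low * n_samples / sample_rate)
    last = math.floor(f_high * n_samples / sample_rate)
    return first, last

n = round(0.2 * 44100)                # 8820 samples in a 200 ms window
first, last = band_bins(44100, n)
```

At 44.1 kHz without padding the bin spacing is 44100 / 8820 = 5 Hz, so the 80-5000 Hz band covers bins 16 through 1000.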

Spectral Feature Mathematics

Mathematical foundations of features:

Spectral Flatness:
Let P = {P₁, P₂, ..., Pₙ} be power values in analysis band
Geometric Mean: GM = (Π[i=1 to n] P_i)^(1/n)
Arithmetic Mean: AM = (Σ[i=1 to n] P_i) / n
Flatness = GM / AM
Equivalent to: exp( (1/n) × Σ ln(P_i) ) / ( (1/n) × Σ P_i )

Properties:
- 0 ≤ Flatness ≤ 1
- Flatness = 1 for a perfectly flat spectrum (ideal white noise)
- Flatness → 0 for pure tones

Spectral Roughness:
Roughness = (1/m) × Σ[i=2 to n-1] |P_i - (P_{i-1} + P_{i+1})/2|
Where m = number of valid interior bins

Measures local spectral variation
High for noisy/transient sounds
Low for smooth/harmonic sounds

Modulation Mathematics

Sine Wave Modulation Theory

Time-varying sine wave modulation:

Basic Sine Modulation:
I(t) = I₀ + A × sin(2πf × t + φ)

Where:
I(t) = intensity at time t (dB)
I₀ = base intensity (70 dB)
A = modulation depth (20-50 dB)
f = modulation frequency (1.0-5.0 Hz)
φ = phase angle

Time-Varying Parameters:
A(t) = 20 + 30 × flatness(t)
f(t) = 1.0 + 4.0 × roughness(t)

Phase Continuity:
φ(t) = φ(t-Δt) + 2π × f(t) × Δt
With time-varying parameters, φ(t) is the accumulated phase and the intensity becomes I(t) = I₀ + A(t) × sin(φ(t))
Ensures smooth transitions when f(t) changes
Prevents discontinuities in modulation waveform

Psychoacoustic Considerations

Perception of Intensity Modulation

Human perception of tremolo effects:

Modulation Speed Ranges (1.0-5.0 Hz):
1.0-2.0 Hz: Slow, obvious pulsations
2.0-3.5 Hz: Medium, musical tremolo
3.5-5.0 Hz: Fast, intense modulation
>5.0 Hz: Above the script's range; overlaps with typical vibrato rates

Modulation Depth Ranges (20-50 dB):
20-30 dB: Subtle, background effect
30-40 dB: Noticeable, musical effect
40-50 dB: Dramatic, foreground effect

Special Case Handling:
Very tonal + smooth sounds → reduced modulation
Prevents unnatural-sounding effects on pure tones

Computational Complexity

Processing Time Analysis

Algorithm complexity breakdown:

Spectral Analysis Phase:
Operations = 8 windows × (FFT + feature calculation)
FFT complexity: O(N log N) per window
Feature calculation: O(N) per window
Total: ~8 × O(N log N)

Modulation Generation Phase:
Operations = numGridPoints × interpolation × modulation
numGridPoints = duration / 0.01
Total: O(T/0.01) ≈ O(100 × T) where T = duration

Memory Usage:
Moderate: stores 8 analysis points + intensity tier
Intensity tier size: ~100 × duration points

Typical Performance:
1-minute audio: ~10-30 seconds processing
5-minute audio: ~1-3 minutes processing
Scales linearly with duration

Applications

Sound Design and Music Production

Use case: Creating evolving, organic modulation effects

Technique: Process individual tracks or complete mixes

Examples: Dynamic tremolo, breathing pads, evolving textures

Audio Restoration and Enhancement

Use case: Adding life to static or dull recordings

Technique: Subtle modulation on vocals or instruments

Applications: Vintage recording enhancement, mono source spatialization

Experimental and Electronic Music

Use case: Generating complex, algorithmically-driven effects

Technique: Apply to synthetic sounds with strong spectral contrasts (the script has no user settings)

Results: Unpredictable, evolving modulation patterns

Psychoacoustic Research

Use case: Studying perception of dynamic intensity changes

Technique: Controlled modulation based on spectral features

Applications: Hearing research, audio perception studies

Practical Workflow Examples

🎵 Vocal Enhancement

Goal: Add natural-sounding dynamics to vocal tracks

Source: Monophonic vocal recording

Expected Result:

Vowel sections: subtle, slow modulation
Consonant sections: stronger, faster modulation
Natural breathing effect throughout

Tip: Use moderate-length phrases for best results

🎹 Synthetic Texture Creation

Goal: Create evolving textures from static synth sounds

Source: Sustained synthesizer pad or drone

Expected Result:

Harmonic content: gentle pulsation
Noisy content: intense modulation
Evolving pattern as sound changes

Tip: Layer multiple processed versions

🥁 Percussive Processing

Goal: Add dynamic interest to drum loops

Source: Drum loop or percussion track

Expected Result:

Kick/snare transients: minimal modulation
Cymbal sustain: strong modulation
Rhythmic pulsation pattern

Tip: Process individual drum hits separately

Advanced Techniques

Creative processing chains:

Layered processing: Apply multiple times with different source material
Selective processing: Process only specific frequency ranges
Reverse processing: Apply to reversed audio for unusual effects
Parameter extraction: Use the analysis data for other purposes

Experiment with unconventional source material for unique results

Analysis data utilization:

Flatness data: Use for noise reduction decisions
Roughness data: Use for transient detection
Time evolution: Study spectral changes over time
Comparative analysis: Compare different sounds' spectral characteristics

Troubleshooting Common Issues

Problem: Modulation too subtle
Cause: Source material too tonal and smooth
Solution: Use more spectrally varied source material

Problem: Modulation too extreme
Cause: Source material very noisy and complex
Solution: Use more tonal source material, or mix with dry signal

Problem: Processing very slow
Cause: Very long audio file
Solution: Process shorter segments, or accept longer wait time

Problem: Unnatural-sounding results
Cause: Extreme spectral variations in source
Solution: Use more consistent source material, or mix the result with the dry signal

Technical Reference

Parameter Ranges and Limits

Parameter	Range	Default	Description
Analysis Windows	8 fixed	8	Number of spectral analysis points
Window Duration	200ms fixed	0.2s	Analysis window length
Frequency Range	80-5000Hz	80-5000Hz	Spectral analysis band
Time Resolution	10ms fixed	0.01s	Intensity point spacing
Base Intensity	70dB fixed	70dB	Center point for modulation
Intensity Depth	20-50dB	dynamic	Modulation amplitude range
Modulation Speed	1.0-5.0Hz	dynamic	Oscillation frequency range
Safety Limits	40-100dB	40-100dB	Output intensity boundaries

Algorithm Performance Characteristics

Computational Complexity:
Spectral Analysis: O(N log N) per window × 8 windows
Modulation Generation: O(T/0.01) where T = duration
Memory Usage: O(T/0.01) for intensity tier

Typical Processing Times:
30-second audio: 5-15 seconds
2-minute audio: 20-60 seconds
5-minute audio: 1-3 minutes
10-minute audio: 2-6 minutes

Quality Factors:
FFT resolution: Determined by 200ms window
Time resolution: 10ms (adequate for 5Hz modulation)
Interpolation: Linear (smooth enough for audio)

Spectral Feature Interpretation Guide

Sound Type	Flatness Range	Roughness Range	Modulation Result
Pure tones	0.00-0.10	0.00-0.02	Very subtle, slow
Vocal vowels	0.10-0.25	0.02-0.04	Subtle, musical
Sustained instruments	0.15-0.35	0.03-0.05	Moderate, evolving
Speech consonants	0.40-0.70	0.05-0.08	Strong, fast
Noise sources	0.70-0.95	0.06-0.10	Intense, rapid
Transients	0.30-0.60	0.08-0.15	Very fast, dramatic