Noise Vocoder — User Guide
Multi-band spectral analysis-resynthesis: extracts spectral envelope from source audio via Bark-scale filterbank, applies envelope to noise carrier for robotic, whispered, or synthetic vocal textures.
What this does
This script implements a noise vocoder, a classic audio effect that analyzes the spectral envelope (formant structure) of a source sound and imposes it onto a noise carrier. The source is divided into frequency bands (default 16) spaced in equal steps on the Bark scale, which approximates perceptually uniform spacing. For each band, the script:
- Extracts the intensity contour from the source (envelope tracking)
- Generates white noise and filters it to the same band
- Multiplies the noise by the intensity contour (amplitude modulation)
- Adds the result to the sum of all bands
Result: robotic, whispered speech; synthetic vocal textures; gender-neutral voices; privacy-oriented speech (intelligible but much harder to attribute to a speaker); experimental sound design. Unlike vocoders that use a pitched carrier, this one uses noise only, so the output carries no pitch regardless of the source. That gives the characteristic "whispered robot" sound.
Key Features:
- Bark-Scale Filterbank — Perceptually uniform frequency division (16-32 bands typical)
- Intensity Envelope Tracking — Captures temporal dynamics per band
- Noise Carrier — White noise replaces harmonic excitation
- Band Summation — Reconstructs full spectrum from modulated bands
- 5 Presets — Default, More Bands, Wider Range, Stronger Noise, Smoother Filter
- Adjustable Parameters — Band count, frequency range, smoothing, noise level
Technical Implementation:
(1) Bark scale band calculation:
- Convert the frequency limits to the Bark scale (perceptually linear)
- Divide the range into number_of_bands equal steps in Bark
- Convert each band boundary back to Hz
- Result: narrower bands at low frequencies, wider bands at high frequencies (matching the ear's resolution)
(2) Per-band processing (iterative):
- Filter the source audio to the band (Hann band-pass)
- Extract the RMS and the intensity contour (temporal envelope)
- Generate white noise at full duration
- Filter the noise to the same band
- Multiply the noise by the intensity contour (AM)
- Scale to match the source RMS (amplitude calibration)
- Add to the previous bands (accumulation)
(3) Output: the sum of all modulated noise bands is the vocoded result.
Key insight: the spectral envelope is preserved and the pitch is removed. Intelligibility is maintained (formants carry the phonetic information) but voice identity is largely lost.
Processing time: roughly 2-10 seconds per band, so total time grows with both the number of bands and the duration of the source.
Quick start
- In Praat, select exactly one Sound object (preferably voice or pitched instrument).
- Run script… → Vocoding.praat.
- Choose Preset: Default (16 bands), More Bands (24), Wider Frequency Range, Stronger Noise, or Smoother Filter.
- If Custom: adjust number_of_bands (8-32), frequency limits (50-11000 Hz), filter_smoothing, noise_amplitude.
- Click OK — processing analyzes bands, displays progress in Info window, auto-plays vocoded result.
Vocoder Theory
Source-Filter Model of Sound
Fundamental Concept
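Sound = Source × Filter: the spectrum of an excitation source (vocal-fold buzz, noise, a plucked string) multiplied by the frequency response of a filter (vocal tract, instrument body).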
Vocoder Operation
Analysis stage (modulator):
- Divide spectrum into bands
- Extract envelope (loudness over time) per band
- Result: spectral envelope = filter characteristics
Synthesis stage (carrier + envelopes):
- Generate carrier signal (noise, tone, or audio)
- Divide carrier into same bands
- Modulate each band by extracted envelope
- Sum all bands = reconstructed sound
This script uses a noise carrier, so the pitched excitation of the source is replaced with noise.
Bark Scale Filterbank
Why Bark Scale?
Problem with linear Hz spacing:
- Human hearing is not linear in frequency
- Frequency resolution is finer at low frequencies (roughly below 1000 Hz)
- Resolution becomes progressively coarser above 1000 Hz
- Linear bands waste resolution at high frequencies and under-resolve low frequencies
Bark scale solution:
- Perceptually uniform scale (1 Bark ≈ one critical band)
- Linear in Bark = perceptually equal steps
- Results in narrower Hz bands at low freq, wider at high freq
- Matches ear's frequency resolution
Band Distribution Example
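As a rough illustration of how the band edges fall, the sketch below divides the default 50-11000 Hz range into 16 equal Bark steps. It is written in Python rather than Praat, and it assumes the Bark formula 7·asinh(f/650) that Praat's built-in hertzToBark function uses; the script itself may compute its edges differently. The function names are illustrative only.

```python
import math

def hertz_to_bark(f_hz):
    # Assumed formula (the one behind Praat's hertzToBark); the script may differ.
    return 7.0 * math.asinh(f_hz / 650.0)

def bark_to_hertz(bark):
    # Inverse of hertz_to_bark.
    return 650.0 * math.sinh(bark / 7.0)

def bark_band_edges(low_hz, high_hz, n_bands):
    """Return n_bands + 1 edge frequencies (Hz), equally spaced in Bark."""
    low_b, high_b = hertz_to_bark(low_hz), hertz_to_bark(high_hz)
    step = (high_b - low_b) / n_bands
    return [bark_to_hertz(low_b + i * step) for i in range(n_bands + 1)]

edges = bark_band_edges(50.0, 11000.0, 16)
for i in range(16):
    width = edges[i + 1] - edges[i]
    print(f"band {i + 1:2d}: {edges[i]:8.1f} - {edges[i + 1]:8.1f} Hz (width {width:7.1f} Hz)")
```

Running it prints one line per band; the widths grow steadily from the lowest band to the highest, which is the "narrow at low frequencies, wide at high frequencies" behaviour described above.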
Intensity Envelope Extraction
What is Intensity?
Intensity = smoothed energy measure on a dB scale (in Praat, dB relative to 2·10⁻⁵ Pa), used here as a stand-in for how loud a band is
- Not instantaneous amplitude (too fast, includes pitch oscillations)
- Smoothed energy over short windows (~10-100 ms)
- Captures envelope = slow amplitude changes
- Represents "how loud" over time, ignoring "what pitch"
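A minimal sketch of this kind of envelope tracking, in Python for illustration. It measures RMS energy in short overlapping windows and converts it to dB relative to 2·10⁻⁵ Pa. The rectangular window and the 32 ms / 10 ms defaults are assumptions chosen for simplicity; Praat's own intensity analysis uses a smoother window whose length depends on minimum_pitch.

```python
import math

REFERENCE_PRESSURE = 2e-5  # Pa, dB reference for intensity

def rms_envelope(samples, sample_rate, window_s=0.032, step_s=0.010):
    """Frame-wise RMS with a plain rectangular window (a simplification)."""
    if not samples:
        return []
    win = max(1, round(window_s * sample_rate))
    hop = max(1, round(step_s * sample_rate))
    last_start = max(0, len(samples) - win)
    envelope = []
    for start in range(0, last_start + 1, hop):
        frame = samples[start:start + win]
        envelope.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return envelope

def to_db(rms_value):
    # Floor the value so silent frames do not produce log10(0).
    return 20.0 * math.log10(max(rms_value, 1e-12) / REFERENCE_PRESSURE)

# Demo: a 250 Hz tone that is quiet for 0.5 s, then loud for 0.5 s.
sample_rate = 8000
tone = [
    (0.1 if n < sample_rate // 2 else 0.5)
    * math.sin(2 * math.pi * 250 * n / sample_rate)
    for n in range(sample_rate)
]
env = rms_envelope(tone, sample_rate)
print(f"first frame: {to_db(env[0]):.1f} dB, last frame: {to_db(env[-1]):.1f} dB")
```

The envelope follows the jump in level but contains no trace of the 250 Hz oscillation itself, which is why it carries "how loud" without "what pitch".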
Why This Preserves Intelligibility
Speech intelligibility depends on:
- Formants: Spectral envelope peaks (vowel identity)
- Temporal envelope: Amplitude changes (consonant onsets, rhythm)
- Not pitch: Fundamental frequency less critical for understanding
Vocoder preserves:
- ✓ Formant structure (via band envelopes)
- ✓ Temporal dynamics (via intensity contours)
- ✗ Pitch information (replaced by noise)
Result: Whispered but intelligible speech
Noise Carrier Generation
White Noise Characteristics
Properties:
- Equal power at all frequencies (flat spectrum)
- Random amplitude values (Gaussian distribution)
- No periodic structure (unpitched)
- Provides neutral excitation for filtering
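For illustration, Gaussian white noise of this kind can be generated as below. This is a Python sketch, not the script's code, and it assumes that noise_amplitude acts as the standard deviation of the Gaussian; the script may define the level differently.

```python
import math
import random

def white_noise(n_samples, amplitude=0.1, seed=None):
    """Independent Gaussian samples; `amplitude` is treated as the standard deviation (assumption)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, amplitude) for _ in range(n_samples)]

noise = white_noise(20000, amplitude=0.1, seed=1)
noise_mean = sum(noise) / len(noise)
noise_rms = math.sqrt(sum(x * x for x in noise) / len(noise))
print(f"mean = {noise_mean:.4f}, RMS = {noise_rms:.4f}")
```

Because the samples are independent and zero-mean, the RMS settles close to the chosen amplitude and the signal has no periodic structure for the ear to hear as pitch.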
Why Noise (Not Other Carriers)?
Noise advantages:
- No inherent pitch = neutral
- Broadband = provides energy for all band filters
- Simple to generate (no pitch tracking needed)
- Creates "whispered" character (unvoiced)
Alternative carriers (not this script):
- Pitched tone: Singing robot (requires pitch tracking)
- Another audio: Cross-synthesis (modulator A, carrier B)
- Mixed: Pitch + noise (voiced/unvoiced distinction)
Band Summation and Reconstruction
Additive Synthesis
Principle: Filtered bands are linearly combined
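The linear combination is a sample-by-sample sum. A minimal Python sketch follows; the function name and the toy band values are illustrative, not taken from the script.

```python
def sum_bands(bands):
    """Sample-by-sample sum of equally long band signals."""
    if not bands:
        return []
    length = len(bands[0])
    if any(len(band) != length for band in bands):
        raise ValueError("all bands must have the same number of samples")
    return [sum(samples) for samples in zip(*bands)]

low_band = [0.10, -0.20, 0.00]
high_band = [0.05, 0.20, -0.10]
mixed = sum_bands([low_band, high_band])
print(mixed)
```

No weighting is applied here; in this sketch each band is assumed to have been brought to its intended level before the sum.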
Amplitude Calibration
RMS matching per band:
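A minimal sketch of per-band RMS matching, in Python for illustration: the modulated noise band is multiplied by a single gain so that its RMS equals the RMS measured on the corresponding source band. The helper names and sample values are hypothetical.

```python
import math

def rms(signal):
    return math.sqrt(sum(x * x for x in signal) / len(signal)) if signal else 0.0

def match_rms(signal, target_rms):
    """Scale `signal` by one gain factor so that its RMS equals `target_rms`."""
    current = rms(signal)
    if current == 0.0:
        return list(signal)  # silent band: nothing to scale
    gain = target_rms / current
    return [x * gain for x in signal]

source_band = [0.3, -0.3, 0.3, -0.3]        # toy filtered source band
noise_band = [0.02, -0.05, 0.04, -0.01]     # toy modulated noise band
calibrated = match_rms(noise_band, rms(source_band))
print(f"source RMS = {rms(source_band):.3f}, calibrated noise RMS = {rms(calibrated):.3f}")
```

A single gain changes the level of the band but not its shape, so the envelope imposed by the modulation step is left intact.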
Complete Processing Pipeline
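The whole chain can be sketched end to end as follows. This is a simplified Python illustration under several assumptions, not a port of the script: it uses a brick-wall FFT band-pass instead of Praat's smoothed Hann band, a moving-RMS envelope instead of Praat's intensity analysis, the 7·asinh(f/650) Bark formula, and a signal length that is a power of two. All function names are hypothetical.

```python
import cmath
import math
import random

def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2], inverse), fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def bandpass(x, sr, lo, hi):
    """Brick-wall band-pass (stand-in for the script's Hann band filter)."""
    n = len(x)
    spec = fft([complex(v) for v in x])
    for k in range(n):
        f = min(k, n - k) * sr / n  # bin frequency, folding the mirrored half
        if not lo <= f < hi:
            spec[k] = 0j
    return [v.real / n for v in fft(spec, inverse=True)]

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

def envelope(x, win):
    """Moving RMS over a centred window of about `win` samples."""
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v * v)
    half, n = win // 2, len(x)
    env = []
    for i in range(n):
        a, b = max(0, i - half), min(n, i + half + 1)
        env.append(math.sqrt(max(0.0, prefix[b] - prefix[a]) / (b - a)))
    return env

def bark_edges(lo_hz, hi_hz, n_bands):
    lo_b, hi_b = 7 * math.asinh(lo_hz / 650), 7 * math.asinh(hi_hz / 650)
    step = (hi_b - lo_b) / n_bands
    return [650 * math.sinh((lo_b + i * step) / 7) for i in range(n_bands + 1)]

def noise_vocode(src, sr, edges, noise_amplitude=0.1, win=160, seed=0):
    rng = random.Random(seed)
    out = [0.0] * len(src)
    for lo, hi in zip(edges, edges[1:]):
        band = bandpass(src, sr, lo, hi)                    # filter source to band
        target = rms(band)                                  # band RMS for calibration
        env = envelope(band, win)                           # temporal envelope
        noise = [rng.gauss(0.0, noise_amplitude) for _ in src]
        carrier = bandpass(noise, sr, lo, hi)               # noise in the same band
        modulated = [c * e for c, e in zip(carrier, env)]   # amplitude modulation
        level = rms(modulated)
        if level > 0.0:
            gain = target / level                           # RMS matching
            out = [o + gain * m for o, m in zip(out, modulated)]  # accumulate
    return out

# Demo: a 500 Hz tone that is on for the first half and silent for the second.
sr, n = 8000, 2048
src = [0.5 * math.sin(2 * math.pi * 500 * i / sr) if i < n // 2 else 0.0 for i in range(n)]
out = noise_vocode(src, sr, bark_edges(100, 3500, 6))
loud, quiet = rms(out[100:900]), rms(out[1250:1800])
print(f"RMS while source is on: {loud:.4f}, after it stops: {quiet:.4f}")
```

The output is noise throughout, yet it is loud only where the source was loud: the envelope has been carried over while the 500 Hz pitch has not.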
Comparison to Other Speech Effects
| Effect | Method | Character | Intelligibility |
|---|---|---|---|
| Noise Vocoder | Envelope + noise | Whispered robot | High (formants preserved) |
| Pitch Vocoder | Envelope + tone | Singing robot | Very high |
| Whisper | Unvoiced excitation | Natural whisper | Moderate (no pitch cues) |
| Bitcrusher | Resolution reduction | Digital/lo-fi | Variable (artifact-dependent) |
| Ring Modulator | Multiplication by a carrier (sum and difference frequencies) | Metallic/inharmonic | Low (spectrum distorted) |
| Heavy EQ | Frequency filtering | Telephone/radio | High (if mids preserved) |
Parameters & Presets
Preset Options
🎵 Default
Parameters: 16 bands, 50-11000 Hz, smoothing 50 Hz, noise 0.1
Character: Balanced vocoder, classic robotic speech
Best for: General use, speech vocoding, experimenting
📊 More Bands
Parameters: 24 bands (other defaults)
Character: Higher resolution, smoother, more intelligible
Best for: Clear speech, minimal artifacts, quality priority
🌊 Wider Frequency Range
Parameters: 20-15000 Hz (vs 50-11000 Hz default)
Character: Full spectrum, includes sub-bass and air
Best for: Music vocoding, full-bandwidth processing
💨 Stronger Noise
Parameters: Noise amplitude 0.2 (vs 0.1 default)
Character: More prominent "hiss," breathier quality
Best for: Emphasizing noise carrier, experimental textures
✨ Smoother Filter
Parameters: Smoothing 100 Hz (vs 50 Hz default)
Character: Blended bands, less isolation, softer edges
Best for: Reducing harshness, subtle vocoding
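The five presets can be read as overrides on top of the Default values. The Python sketch below restates them as data; the parameter names and values come from this guide, while the dictionary layout and the resolve_preset helper are hypothetical and only illustrate the relationship between presets and defaults.

```python
DEFAULTS = {
    "number_of_bands": 16,
    "lower_frequency_limit": 50.0,     # Hz
    "upper_frequency_limit": 11000.0,  # Hz
    "filter_smoothing": 50.0,          # Hz
    "noise_amplitude": 0.1,
}

# Each preset lists only the values it changes relative to DEFAULTS.
PRESET_OVERRIDES = {
    "Default": {},
    "More Bands": {"number_of_bands": 24},
    "Wider Frequency Range": {
        "lower_frequency_limit": 20.0,
        "upper_frequency_limit": 15000.0,
    },
    "Stronger Noise": {"noise_amplitude": 0.2},
    "Smoother Filter": {"filter_smoothing": 100.0},
}

def resolve_preset(name):
    """Hypothetical helper: merge a preset's overrides into the defaults."""
    return {**DEFAULTS, **PRESET_OVERRIDES[name]}

for preset_name in PRESET_OVERRIDES:
    print(preset_name, resolve_preset(preset_name))
```

Seen this way, every preset other than Default changes one setting (or one pair of frequency limits), which makes it easy to hear what each parameter contributes.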
Custom Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| preset | option | Default | Choose preset configuration |
| number_of_bands | natural | 16 | Filterbank resolution (8-32 practical) |
| lower_frequency_limit | positive | 50 | Lowest band edge (Hz) |
| upper_frequency_limit | positive | 11000 | Highest band edge (Hz) |
| minimum_pitch | positive | 100 | For intensity extraction windowing |
| time_step | positive | 0.1 | Intensity contour sampling (seconds) |
| filter_smoothing | positive | 50 | Filter edge smoothness (Hz) |
| filter_edge_buffer | positive | 25 | Trim from band edges (Hz) |
| noise_amplitude | positive | 0.1 | Carrier noise level |
| play_after_processing | boolean | yes | Auto-play vocoded result |
| keep_intermediate_objects | boolean | no | Retain band objects for inspection |
Parameter Details
number_of_bands
Range: 8-32 (practical), 4-64 (extreme)
Default: 16
Effect:
- 8-12: Low resolution, robotic, obvious artifacts
- 12-20: Good balance, classic vocoder sound
- 20-28: High resolution, smooth, natural
- >28: Diminishing returns, long processing time
Trade-off: More bands = better quality but slower processing
lower_frequency_limit & upper_frequency_limit
Range: 20-20000 Hz (audible range); the upper limit must stay below the Nyquist frequency (half the sampling rate of the source)
Defaults: 50 Hz (bass), 11000 Hz (treble)
Effect:
- Narrow range (300-3000 Hz): Telephone bandwidth
- Speech range (80-8000 Hz): Intelligibility-focused
- Default (50-11000 Hz): Full speech + some music
- Wide range (20-15000 Hz): Full spectrum, music vocoding
filter_smoothing
Range: 10-200 Hz
Default: 50 Hz
Effect:
- Low (10-30 Hz): Sharp band edges, isolated bands, harsh
- Medium (30-70 Hz): Balanced, some overlap
- High (70-150 Hz): Smooth transitions, blended bands
- Very high (>150 Hz): Excessive overlap, muddy
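To make the effect of smoothing concrete, the Python sketch below models a band-pass whose edges are raised-cosine ramps of total width `smoothing`, centred on the band limits. This shape is an assumption chosen for illustration and is not guaranteed to match the exact definition of Praat's Hann band filter; the function name is hypothetical.

```python
import math

def hann_band_gain(f, lo, hi, smoothing):
    """Gain in [0, 1] of an illustrative band-pass with raised-cosine edges.

    Assumption: each edge ramps over `smoothing` Hz, centred on lo and hi.
    """
    half = smoothing / 2.0

    def edge(x):
        # 0 well outside the band, 1 well inside, raised cosine in between.
        if x <= -half:
            return 0.0
        if x >= half:
            return 1.0
        return 0.5 - 0.5 * math.cos(math.pi * (x + half) / smoothing)

    return edge(f - lo) * edge(hi - f)

# How much of a 380 Hz component leaks into a 400-600 Hz band?
for smoothing in (50.0, 100.0):
    gain = hann_band_gain(380.0, 400.0, 600.0, smoothing)
    print(f"smoothing {smoothing:5.1f} Hz -> gain at 380 Hz = {gain:.3f}")
```

With the wider ramp, more energy from just outside the band passes through, which is the "blended bands" behaviour of the higher smoothing settings.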