Noise Vocoder — User Guide

Multi-band spectral analysis and resynthesis: the script extracts the spectral envelope of the source audio with a Bark-scale filterbank and applies it to a noise carrier, producing robotic, whispered, or synthetic vocal textures.

Author: Shai Cohen
Affiliation: Department of Music, Bar-Ilan University, Israel
Version: 0.1 (2025)
License: MIT License
Repo: https://github.com/ShaiCohen-ops/Praat-plugin_AudioTools

What this does

This script implements a noise vocoder, a classic audio effect that analyzes the spectral envelope (formant structure) of a source sound and imposes it onto a noise carrier. The source is divided into frequency bands (default 16) spaced according to the Bark scale, which is approximately perceptually uniform. For each band:

  1. Extract the intensity contour from the source (envelope tracking).
  2. Generate white noise and filter it to the same band.
  3. Multiply the noise by the intensity contour (amplitude modulation).

The modulated bands are then summed.

Typical results: robotic or whispered speech, synthetic vocal textures, gender-neutral voices, privacy-preserving speech (intelligible but hard to recognize), and experimental sound design. Unlike pitch-tracking vocoders, this script uses only a noise carrier, so the output carries no pitch. That is what gives it the characteristic "whispered robot" sound.

Key Features:

What is a vocoder? Vocoder = voice encoder, developed in the 1930s-1940s (Homer Dudley, Bell Labs) for speech compression and encryption. Principle: separate the excitation (pitch or noise source) from the filter (spectral envelope, formants). Analysis extracts the envelope from the modulator; synthesis applies that envelope to a carrier.

Types:

  1. Channel vocoder: filterbank analysis (this script).
  2. Phase vocoder: FFT-based (a different technique).
  3. LPC vocoder: Linear Predictive Coding.
  4. Formant vocoder: tracks resonances directly.

Musical history: 1970s electronic music (Kraftwerk, ELO), funk/disco "robot voices," modern EDM/pop vocal effects. This script uses a noise carrier, which gives an unvoiced, whispered character. A pitched carrier gives a singing robot (Daft Punk style) and requires two inputs.

Technical Implementation:

  1. Bark-scale band calculation:
     - Convert the frequency limits to the Bark scale (perceptually near-linear).
     - Divide the range into number_of_bands equal steps in Bark.
     - Convert each band boundary back to Hz.
     - This yields narrower bands at low frequencies and wider bands at high frequencies, matching the ear's frequency resolution.
  2. Per-band processing (iterative):
     - Filter the source audio to the band (Hann band-pass).
     - Extract the RMS and the intensity contour (temporal envelope).
     - Generate white noise at full duration.
     - Filter the noise to the same band.
     - Multiply the noise by the intensity contour (AM).
     - Scale to match the source RMS (amplitude calibration).
     - Add to the previous bands (accumulation).
  3. Output: the sum of all modulated noise bands is the vocoded result.

Key insight: the spectral envelope is preserved and the pitch is removed. Intelligibility is maintained (formants carry the phonetic information), but pitch-based cues to voice identity are lost.

Processing time: 2-10 seconds per band. Total time grows with both the number of bands and the duration of the sound.

Quick start

  1. In Praat, select exactly one Sound object (preferably voice or pitched instrument).
  2. Choose Run script… and select Vocoding.praat.
  3. Choose Preset: Default (16 bands), More Bands (24), Wider Frequency Range, Stronger Noise, or Smoother Filter.
  4. If Custom: adjust number_of_bands (8-32), frequency limits (50-11000 Hz), filter_smoothing, noise_amplitude.
  5. Click OK. The script processes each band, displays progress in the Info window, and auto-plays the vocoded result.

Quick tip:
  - Start with Default (16 bands) for the classic vocoder sound.
  - Try More Bands (24) for higher intelligibility and a smoother sound.
  - Use Wider Frequency Range for full-spectrum processing (20-15000 Hz).
  - Stronger Noise raises the carrier level (more "hiss").
  - Smoother Filter reduces band isolation (more blending).
  - Processing takes roughly 5-20 seconds, depending on duration and band count (bands are processed sequentially).
  - The Info window shows the band boundaries in Hz during processing.
  - The output is named "originalname_vocoded".
  - The effect works best on speech or sustained tones; percussive material may lose definition.

Important:
  - Long processing time: bands are processed one after another, so the result is not instant. Processing time scales linearly with the number of bands.
  - Very long files (>1 minute) may take minutes to process. Consider trimming to a representative section when experimenting.
  - number_of_bands: fewer than 8 gives poor intelligibility and robotic artifacts; more than 32 increases processing time with diminishing returns.
  - noise_amplitude: below 0.05 gives weak output; above 0.3 overwhelms the envelope and sounds like pure noise.
  - filter_smoothing affects band isolation: too low (<25 Hz) gives harsh transitions, too high (>150 Hz) makes the bands overlap excessively.
  - Best results come from clear, pitched, sustained material (speech, vocals, winds). Percussive sounds lose their attack transients.
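The limits above can be turned into a quick sanity check before a long run. The following is a minimal Python sketch, not part of the Praat script: `check_parameters` is a hypothetical helper, and the thresholds are simply the ones quoted in this section.

```python
def check_parameters(number_of_bands, noise_amplitude, filter_smoothing):
    """Return warnings for settings outside the ranges suggested in this guide."""
    warnings = []
    if number_of_bands < 8:
        warnings.append("fewer than 8 bands: poor intelligibility")
    elif number_of_bands > 32:
        warnings.append("more than 32 bands: slower, diminishing returns")
    if noise_amplitude < 0.05:
        warnings.append("noise_amplitude below 0.05: weak output")
    elif noise_amplitude > 0.3:
        warnings.append("noise_amplitude above 0.3: sounds like pure noise")
    if filter_smoothing < 25:
        warnings.append("filter_smoothing below 25 Hz: harsh transitions")
    elif filter_smoothing > 150:
        warnings.append("filter_smoothing above 150 Hz: excessive band overlap")
    return warnings


default_warnings = check_parameters(16, 0.1, 50)   # Default preset values
extreme_warnings = check_parameters(4, 0.5, 10)    # deliberately extreme values
print(default_warnings)
for message in extreme_warnings:
    print("warning:", message)
```

The default settings produce no warnings, while the extreme settings trigger one warning per parameter.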

Vocoder Theory

Source-Filter Model of Sound

Fundamental Concept

Any sound = Source × Filter

Source-Filter Theory (speech example):

SOURCE (excitation):
  - Voiced sounds: vocal fold vibration (periodic, pitched)
  - Unvoiced sounds: turbulent air (noise)
  - Provides the fundamental energy

FILTER (vocal tract):
  - Resonant cavities (throat, mouth, nose)
  - Shapes the spectrum via formants
  - Creates phonetic identity

SOUND = Source × Filter (a product of spectra):
  - /a/ = buzz × /a/-shaped resonances
  - /s/ = noise × /s/-shaped filtering

Vocoder principle: separate source and filter, then recombine with a different source.
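To make "Source × Filter" concrete, here is a small Python sketch that multiplies the harmonic amplitudes of a buzz-like source by the gain of two idealized resonances. Everything in it is an illustrative assumption: the function names are hypothetical, the Gaussian resonance shape is a simplification, and the formant values are only roughly /a/-like.

```python
import math


def resonance_gain(freq_hz, center_hz, bandwidth_hz):
    """Gain of one idealized resonance (Gaussian-shaped, peak gain 1)."""
    return math.exp(-0.5 * ((freq_hz - center_hz) / bandwidth_hz) ** 2)


def shape_spectrum(source_amps, f0_hz, formants):
    """Multiply each source harmonic by the summed gain of all resonances."""
    shaped = []
    for k, amp in enumerate(source_amps, start=1):
        freq = k * f0_hz
        gain = sum(resonance_gain(freq, center, bw) for center, bw in formants)
        shaped.append(amp * gain)
    return shaped


f0 = 100.0
source = [1.0 / k for k in range(1, 31)]         # buzz: harmonics fall off as 1/k
formants_a = [(700.0, 100.0), (1200.0, 120.0)]   # illustrative (center, bandwidth) in Hz
output = shape_spectrum(source, f0, formants_a)

strongest = max(range(len(output)), key=lambda i: output[i]) + 1
print("strongest harmonic:", strongest, "at", strongest * f0, "Hz")
```

The source alone is strongest at its first harmonic. After filtering, the strongest harmonic sits at the first resonance, which is the sense in which the filter, not the source, determines the vowel quality.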

Vocoder Operation

Analysis stage (modulator):

Synthesis stage (carrier + envelopes):

This script uses a noise carrier, which replaces the pitch of the source with noise.

Bark Scale Filterbank

Why Bark Scale?

Problem with linear Hz spacing:

Bark scale solution:

🎵 Bark Scale Mathematics

Conversion formulas (approximate):

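Hertz to Bark:
  Bark = 13 × atan(0.00076 × Hz) + 3.5 × atan((Hz/7500)²)

Bark to Hertz:
  Hz = 1960 × (Bark + 0.53) / (26.28 - Bark)

The two formulas are separate published approximations (the first from Zwicker and Terhardt, the second from Traunmüller), so a round trip does not return exactly the starting value.

Example conversions (Hertz to Bark formula):
  100 Hz ≈ 1.0 Bark
  500 Hz ≈ 4.7 Bark
  1000 Hz ≈ 8.5 Bark
  5000 Hz ≈ 18.5 Bark
  10000 Hz ≈ 22.4 Bark

Total audible range: ~24 Bark (20-16000 Hz)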
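The two conversion formulas can be checked directly. The Python sketch below transcribes them as written in this guide; the function names `hz_to_bark` and `bark_to_hz` are illustrative, and whether the Praat script uses exactly these expressions is not confirmed here.

```python
import math


def hz_to_bark(hz):
    """Hertz to Bark, using the arctangent formula quoted in this guide."""
    return 13.0 * math.atan(0.00076 * hz) + 3.5 * math.atan((hz / 7500.0) ** 2)


def bark_to_hz(bark):
    """Bark to Hertz, using the rational formula quoted in this guide."""
    return 1960.0 * (bark + 0.53) / (26.28 - bark)


for hz in (100, 500, 1000, 5000, 10000):
    bark = hz_to_bark(hz)
    # The last column shows the round trip through the second formula.
    print(f"{hz:>6} Hz -> {bark:5.2f} Bark -> {bark_to_hz(bark):8.1f} Hz")
```

Because the two formulas are independent approximations, the round-trip column differs from the input, most visibly at the top of the range.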

Band Distribution Example

16 bands, 50-11000 Hz (Default preset):
Bark range: 0.5 → 21.5 Bark (21 Bark total)
Step size: 21/16 = 1.31 Bark per band

Band | Bark range | Hz range  | Bandwidth
1    | 0.5-1.81   | 50-196    | 146 Hz
2    | 1.81-3.12  | 196-360   | 164 Hz
3    | 3.12-4.44  | 360-548   | 188 Hz
4    | 4.44-5.75  | 548-764   | 216 Hz
5    | 5.75-7.06  | 764-1012  | 248 Hz
...
12   | 15.5-16.8  | 2884-3580 | 696 Hz
13   | 16.8-18.1  | 3580-4484 | 904 Hz
14   | 18.1-19.4  | 4484-5659 | 1175 Hz
15   | 19.4-20.7  | 5659-7337 | 1678 Hz
16   | 20.7-22.0  | 7337-9891 | 2554 Hz

Note: bandwidth increases with frequency, because perceptually equal spacing in Bark corresponds to unequal spacing in Hz. The values are approximate and illustrative; the Info window reports the actual band boundaries during processing.
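The band layout can be sketched in Python as follows. This is an assumption-laden illustration rather than the script's own code: it uses the Hertz-to-Bark formula from this guide and inverts it numerically by bisection (a choice made here so that the first and last edges land exactly on the frequency limits). The resulting numbers therefore need not match the table.

```python
import math


def hz_to_bark(hz):
    """Hertz to Bark, using the arctangent formula quoted in this guide."""
    return 13.0 * math.atan(0.00076 * hz) + 3.5 * math.atan((hz / 7500.0) ** 2)


def bark_to_hz_bisect(bark, lo=0.0, hi=24000.0):
    """Numerically invert hz_to_bark, which increases monotonically with Hz."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if hz_to_bark(mid) < bark:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def band_edges(lower_hz, upper_hz, number_of_bands):
    """Return number_of_bands + 1 edges in Hz, equally spaced in Bark."""
    z_lo, z_hi = hz_to_bark(lower_hz), hz_to_bark(upper_hz)
    step = (z_hi - z_lo) / number_of_bands
    return [bark_to_hz_bisect(z_lo + i * step) for i in range(number_of_bands + 1)]


edges = band_edges(50, 11000, 16)
for i in range(16):
    low, high = edges[i], edges[i + 1]
    print(f"band {i + 1:2d}: {low:8.1f} - {high:8.1f} Hz (width {high - low:7.1f} Hz)")
```

The printed widths grow from band to band, which is the behaviour the note above describes.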

Intensity Envelope Extraction

What is Intensity?

Intensity = sound level over time, in dB SPL (a physical measure that correlates with, but is not the same as, perceived loudness)

Intensity calculation (per band):

  1. Filter the source to the band (isolate the frequency region).
  2. Calculate the intensity contour:
     - Window size: derived from minimum_pitch (100 Hz)
     - Time step: 0.1 seconds (10 values per second)
     - Result: intensity values over time
  3. Convert to an IntensityTier:
     - Time-stamped amplitude values
     - Interpolated between points
     - Used to modulate the noise carrier
  4. Multiply the noise by the intensity:
     - Noise amplitude × intensity_envelope
     - Preserves temporal dynamics
     - Removes the source pitch, keeps the envelope
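The envelope steps above can be approximated in a few lines of Python. This is a simplified sketch, not Praat's intensity algorithm: it uses a plain rectangular-window RMS in linear amplitude rather than a dB contour, it assumes an analysis window of 3.2 / minimum_pitch seconds, and the function names are hypothetical.

```python
import math


def rms_envelope(samples, sample_rate, time_step=0.1, minimum_pitch=100.0):
    """Return (times, values): windowed RMS sampled every time_step seconds."""
    # Assumption: window length of 3.2 / minimum_pitch seconds.
    window = max(1, int(round(3.2 / minimum_pitch * sample_rate)))
    hop = max(1, int(round(time_step * sample_rate)))
    times, values = [], []
    for start in range(0, len(samples), hop):
        frame = samples[start:start + window]
        values.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
        times.append((start + 0.5 * len(frame)) / sample_rate)
    return times, values


def interpolate(times, values, t):
    """Linear interpolation between envelope points, constant outside them."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    for i in range(1, len(times)):
        if t <= times[i]:
            frac = (t - times[i - 1]) / (times[i] - times[i - 1])
            return values[i - 1] + frac * (values[i] - values[i - 1])


def apply_envelope(carrier, sample_rate, times, values):
    """Multiply each carrier sample by the interpolated envelope value."""
    return [x * interpolate(times, values, n / sample_rate)
            for n, x in enumerate(carrier)]


sample_rate = 8000
quiet = [0.2 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(4000)]
loud = [0.8 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(4000, 8000)]
source_band = quiet + loud

times, values = rms_envelope(source_band, sample_rate)
carrier = [1.0] * len(source_band)   # constant carrier, so the output is the envelope itself
shaped = apply_envelope(carrier, sample_rate, times, values)
print([round(v, 3) for v in values])
```

With a 0.1 second time step the envelope has only ten points per second, so fast attacks are smoothed out. This is consistent with the remark that percussive material loses definition.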

Why This Preserves Intelligibility

Speech intelligibility depends on:

Vocoder preserves:

Result: Whispered but intelligible speech

Noise Carrier Generation

White Noise Characteristics

Properties:

Noise generation (per band):

  1. Create the noise sound:
     - Duration: matches the source duration
     - Sample rate: 2 × upper_freq + 1000
     - Formula: randomGauss(0, noise_amplitude)
  2. Filter to the band:
     - Same frequency limits as the source band
     - Hann window filter (smooth edges)
     - Smoothing: filter_smoothing parameter
  3. Result: band-limited noise
     - Only frequencies within the band are present
     - Ready for envelope modulation
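A minimal Python sketch of the carrier stage follows. `make_noise` mirrors the duration, sample-rate, and randomGauss settings listed above. `hann_band_gain` is only an assumed shape for the band-pass response (raised-cosine ramps of total width `smoothing_hz` centered on each band edge); Praat's exact filter definition may differ, and both function names are hypothetical.

```python
import math
import random


def make_noise(duration_s, upper_freq_hz, noise_amplitude=0.1, seed=None):
    """Gaussian white noise at a sample rate of 2 * upper_freq + 1000."""
    rng = random.Random(seed)
    sample_rate = 2 * upper_freq_hz + 1000
    count = int(round(duration_s * sample_rate))
    return sample_rate, [rng.gauss(0.0, noise_amplitude) for _ in range(count)]


def hann_band_gain(freq_hz, low_hz, high_hz, smoothing_hz):
    """Assumed frequency-domain gain of a band-pass with raised-cosine edges.

    Valid when the band is at least smoothing_hz wide, so the ramps do not meet.
    """
    half = 0.5 * smoothing_hz
    if freq_hz <= low_hz - half or freq_hz >= high_hz + half:
        return 0.0
    if freq_hz < low_hz + half:      # rising ramp around the lower edge
        return 0.5 - 0.5 * math.cos(math.pi * (freq_hz - (low_hz - half)) / smoothing_hz)
    if freq_hz > high_hz - half:     # falling ramp around the upper edge
        return 0.5 - 0.5 * math.cos(math.pi * ((high_hz + half) - freq_hz) / smoothing_hz)
    return 1.0


sample_rate, noise = make_noise(1.0, 11000, noise_amplitude=0.1, seed=1)
noise_rms = math.sqrt(sum(x * x for x in noise) / len(noise))
print("sample rate:", sample_rate, "Hz, noise RMS:", round(noise_rms, 3))

for freq in (150, 196, 278, 360, 400):
    print(freq, "Hz ->", round(hann_band_gain(freq, 196, 360, 50), 3))
```

Under this assumed shape, a larger smoothing value widens the ramps, so neighbouring bands share more of the spectrum around each edge.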

Why Noise (Not Other Carriers)?

Noise advantages:

Alternative carriers (not this script):

Band Summation and Reconstruction

Additive Synthesis

Principle: Filtered bands are linearly combined

Band accumulation process:

Band 1: [modulated noise 50-196 Hz]
  ↓
Band 2: [modulated noise 196-360 Hz]
  ↓ (add to Band 1)
Combined: [Bands 1+2]
  ↓
Band 3: [modulated noise 360-548 Hz]
  ↓ (add to Bands 1+2)
Combined: [Bands 1+2+3]
  ↓
...
  ↓
Band 16: [modulated noise 7337-9891 Hz]
  ↓ (add to Bands 1-15)
FINAL: [All 16 bands summed]

Output = full spectrum reconstructed from parts
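The accumulation is a plain sample-by-sample sum. As a minimal Python sketch (with `sum_bands` as a hypothetical helper name, and tiny made-up sample values standing in for real band signals):

```python
def sum_bands(bands):
    """Add equal-length band signals sample by sample."""
    length = len(bands[0])
    if any(len(band) != length for band in bands):
        raise ValueError("all bands must have the same number of samples")
    total = [0.0] * length
    for band in bands:               # accumulate one band at a time
        for n, x in enumerate(band):
            total[n] += x
    return total


band_1 = [1.0, 2.0]
band_2 = [3.0, 4.0]
band_3 = [5.0, 6.0]
combined = sum_bands([band_1, band_2, band_3])
print(combined)
```

Because the sum is linear, the order in which the bands are added does not change the result, apart from floating-point rounding.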

Amplitude Calibration

RMS matching per band:

Problem: noise bands may have a different RMS than the source.

Solution (per band):
  1. Measure the source band RMS: rms_SOURCE
  2. Measure the modulated noise RMS: rms_IS
  3. Scale the noise: noise × (rms_SOURCE / rms_IS)

Result: each band has the same overall energy as the corresponding source band. This preserves the spectral balance across bands and prevents some bands from dominating.
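The scaling step is a single ratio of RMS values. The Python sketch below keeps the names rms_SOURCE and rms_IS from the description (in lower case); the zero-RMS guard is an addition made here for safety and is not stated in this guide.

```python
import math


def rms(samples):
    """Root mean square of a list of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def match_rms(modulated_noise, source_band):
    """Scale modulated_noise so that its RMS equals that of source_band."""
    rms_source = rms(source_band)
    rms_is = rms(modulated_noise)
    if rms_is == 0.0:                # silent band: nothing to scale (added guard)
        return list(modulated_noise)
    scale = rms_source / rms_is
    return [x * scale for x in modulated_noise]


source_band = [0.5, -0.5, 0.5, -0.5]
modulated_noise = [0.1, -0.2, 0.3, -0.1]
calibrated = match_rms(modulated_noise, source_band)
print(round(rms(source_band), 6), round(rms(calibrated), 6))
```

Only the overall level of the band changes; its shape over time, set by the envelope, is left untouched.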

Complete Processing Pipeline

FOR EACH BAND (1 to number_of_bands):

  STEP 1: Calculate band limits
    - Convert Bark boundaries to Hz
    - Add edge buffer for smoothing

  STEP 2: Analyze source
    - Filter source to band
    - Measure RMS (overall energy)
    - Extract intensity contour (envelope)

  STEP 3: Generate carrier
    - Create white noise (full duration)
    - Filter noise to same band

  STEP 4: Apply envelope
    - Multiply noise by intensity contour
    - Scale to match source RMS

  STEP 5: Accumulate
    - Add to previous bands
    - Store as "band'i'"

END FOR

RESULT: Sum of all modulated bands = vocoded output
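The whole loop can be sketched in pure Python. This is a toy version under several stated simplifications and is not the Praat script: band edges are passed in directly, filtering is a brick-wall band-pass via a naive DFT (not the smoothed Hann band), the envelope is a piecewise-constant frame RMS (not a Praat intensity contour), and all function names are hypothetical. The naive DFT is slow, so it is only suitable for very short signals.

```python
import cmath
import math
import random


def dft(x):
    """Naive discrete Fourier transform, O(n^2)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spec):
    """Naive inverse DFT, returning the real part."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def band_pass(x, sample_rate, low_hz, high_hz):
    """Brick-wall band-pass: zero every DFT bin outside [low_hz, high_hz)."""
    n = len(x)
    spec = dft(x)
    for k in range(n):
        freq = min(k, n - k) * sample_rate / n
        if not (low_hz <= freq < high_hz):
            spec[k] = 0
    return idft_real(spec)


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


def envelope(x, frame):
    """Piecewise-constant RMS envelope over consecutive frames."""
    env = []
    for start in range(0, len(x), frame):
        chunk = x[start:start + frame]
        env.extend([rms(chunk)] * len(chunk))
    return env


def vocode(source, sample_rate, edges_hz, frame=32, noise_amplitude=0.1, seed=0):
    rng = random.Random(seed)
    output = [0.0] * len(source)
    for low, high in zip(edges_hz[:-1], edges_hz[1:]):
        src_band = band_pass(source, sample_rate, low, high)        # analyze source
        env = envelope(src_band, frame)
        noise = [rng.gauss(0.0, noise_amplitude) for _ in source]   # generate carrier
        noise_band = band_pass(noise, sample_rate, low, high)
        modulated = [n * e for n, e in zip(noise_band, env)]        # apply envelope
        level = rms(modulated)
        if level > 0:                                               # match source RMS
            scale = rms(src_band) / level
            modulated = [m * scale for m in modulated]
        output = [o + m for o, m in zip(output, modulated)]         # accumulate
    return output


sample_rate, n = 8000, 256
source = [math.sin(2 * math.pi * 500 * t / sample_rate) for t in range(n)]
vocoded = vocode(source, sample_rate, [100, 1000, 2000, 3500])
print(round(rms(source), 3), round(rms(vocoded), 3))
```

In this demo the 500 Hz tone falls entirely inside the first band, so the output is band-limited noise with the same RMS as the tone: the energy distribution is kept and the pitch is gone.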

Comparison to Other Speech Effects

Effect | Method | Character | Intelligibility
Noise Vocoder | Envelope + noise | Whispered robot | High (formants preserved)
Pitch Vocoder | Envelope + tone | Singing robot | Very high
Whisper | Unvoiced excitation | Natural whisper | Moderate (no pitch cues)
Bitcrusher | Resolution reduction | Digital/lo-fi | Variable (artifact-dependent)
Ring Modulator | Carrier multiplication (sum and difference frequencies) | Metallic/inharmonic | Low (spectrum distorted)
Heavy EQ | Frequency filtering | Telephone/radio | High (if mids preserved)

Parameters & Presets

Preset Options

🎵 Default

Parameters: 16 bands, 50-11000 Hz, smoothing 50 Hz, noise 0.1

Character: Balanced vocoder, classic robotic speech

Best for: General use, speech vocoding, experimenting

📊 More Bands

Parameters: 24 bands (other parameters at their defaults)

Character: Higher resolution, smoother, more intelligible

Best for: Clear speech, minimal artifacts, quality priority

🌊 Wider Frequency Range

Parameters: 20-15000 Hz (vs 50-11000 Hz default)

Character: Full spectrum, includes sub-bass and air

Best for: Music vocoding, full-bandwidth processing

💨 Stronger Noise

Parameters: Noise amplitude 0.2 (vs 0.1 default)

Character: More prominent "hiss," breathier quality

Best for: Emphasizing noise carrier, experimental textures

✨ Smoother Filter

Parameters: Smoothing 100 Hz (vs 50 Hz default)

Character: Blended bands, less isolation, softer edges

Best for: Reducing harshness, subtle vocoding
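The presets can be summarized as overrides on top of the defaults. The Python sketch below is only a restatement of the values listed above; it assumes that each preset changes just the parameters named in its description, and `resolve_preset` is a hypothetical helper, not a function of the Praat script.

```python
DEFAULTS = {
    "number_of_bands": 16,
    "lower_frequency_limit": 50,      # Hz
    "upper_frequency_limit": 11000,   # Hz
    "filter_smoothing": 50,           # Hz
    "noise_amplitude": 0.1,
}

# Assumption: a preset overrides only the parameters named in its description.
PRESET_OVERRIDES = {
    "Default": {},
    "More Bands": {"number_of_bands": 24},
    "Wider Frequency Range": {"lower_frequency_limit": 20,
                              "upper_frequency_limit": 15000},
    "Stronger Noise": {"noise_amplitude": 0.2},
    "Smoother Filter": {"filter_smoothing": 100},
}


def resolve_preset(name):
    """Return the full parameter set for a preset name."""
    return {**DEFAULTS, **PRESET_OVERRIDES[name]}


for preset_name in PRESET_OVERRIDES:
    print(preset_name, resolve_preset(preset_name))
```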

Custom Parameters

Parameter | Type | Default | Description
preset | option | Default | Choose preset configuration
number_of_bands | natural | 16 | Filterbank resolution (8-32 practical)
lower_frequency_limit | positive | 50 | Lowest band edge (Hz)
upper_frequency_limit | positive | 11000 | Highest band edge (Hz)
minimum_pitch | positive | 100 | For intensity extraction windowing
time_step | positive | 0.1 | Intensity contour sampling (seconds)
filter_smoothing | positive | 50 | Filter edge smoothness (Hz)
filter_edge_buffer | positive | 25 | Trim from band edges (Hz)
noise_amplitude | positive | 0.1 | Carrier noise level
play_after_processing | boolean | yes | Auto-play vocoded result
keep_intermediate_objects | boolean | no | Retain band objects for inspection

Parameter Details

number_of_bands

Range: 8-32 (practical), 4-64 (extreme)

Default: 16

Effect:

Trade-off: More bands = better quality but slower processing

lower_frequency_limit & upper_frequency_limit

Range: 20-20000 Hz (audible range)

Defaults: 50 Hz (bass), 11000 Hz (treble)

Effect:

filter_smoothing

Range: 10-200 Hz

Default: 50 Hz

Effect: