LPC Voice Generator — User Guide

Source-filter resynthesis using Linear Predictive Coding: extract pitch and spectral envelope from voice, generate synthetic glottal pulse train, apply vocal tract filtering for robotic yet intelligible speech.

Author: Shai Cohen
Affiliation: Department of Music, Bar-Ilan University, Israel
Version: 0.1 (2025)
License: MIT License
Repo: https://github.com/ShaiCohen-ops/Praat-plugin_AudioTools

What this does

This script performs source-filter resynthesis using Linear Predictive Coding (LPC), a core technique in speech coding, speech synthesis, and vocal transformation. It separates the voice into two components: (1) the source (pitch/glottal pulses from the vocal cords), and (2) the filter (spectral envelope/formants from the vocal tract shape). The script extracts both, generates a clean synthetic pulse train matching the original pitch, then filters it through the original vocal tract characteristics. Result: robotic, synthesized, yet intelligible speech that preserves linguistic content while replacing natural vocal quality with mechanical precision. The same source-filter idea underlies classic vocoders, formant-based speech synthesizers such as the one Stephen Hawking used, and the robotic vocals associated with Daft Punk.

What is Source-Filter Theory? In human voice production the vocal cords vibrate (source) and create a buzzy tone made of a fundamental plus harmonics. That sound travels through the vocal tract (throat, mouth, nose), which acts as an acoustic filter: certain frequencies are amplified (formants), others attenuated, and the result is vowels and consonants.

Source-filter model: Voice = Source × Filter. Source = pitch + periodicity. Filter = vocal tract shape (formants). LPC extracts the filter. The script replaces the natural source (breathy, irregular glottal pulses) with a synthetic source (clean periodic pulses) → robotic voice with preserved linguistic content. This is why the synthesized speech sounds "robotic" but stays intelligible: the formants (linguistic information) are preserved, while the source quality (human vocal warmth) is replaced.

Technical Implementation:

  1. Extract pitch from the original voice using autocorrelation (75-600 Hz range for human voice).
  2. Smooth the pitch to remove micro-variations and octave jumps.
  3. Convert to a PitchTier for resynthesis.
  4. Generate a synthetic glottal pulse train: periodic pulses at the extracted pitch frequency, with vocal cord simulation parameters (flow/collision/power).
  5. Extract the spectral envelope from the original using LPC (Linear Predictive Coding): 44 coefficients, 25 ms analysis window, 5 ms time step, 50 Hz pre-emphasis.
  6. Filter the synthetic source through the LPC filter → applies the original formant structure to the clean pulses.
  7. Scale intensity to 70 dB for a consistent output level.

Key insight: by replacing the natural glottal source with a synthetic one while preserving the vocal tract filter, the script creates a voice that is recognizable and intelligible but mechanically generated, which is the essence of vocoding and speech synthesis.

Quick start

  1. In Praat, select exactly one Sound object containing voice.
  2. Run script…LPC Voice Generator.praat.
  3. No parameter dialog — script runs automatically with optimal defaults.
  4. Processing takes 5-15 seconds depending on file length.
  5. Result auto-plays: synthesized voice with robotic quality, intelligible speech.
Quick tip: Works best on clean, close-miked vocal recordings. Single-speaker preferred (multi-speaker may confuse pitch tracking). Spoken voice ideal; singing works but pitch smoothing may flatten expressive vibrato. Processing relatively fast (LPC efficient algorithm). Result named "voice_synthesized" — compare with original to hear source-filter separation. Script cleans up intermediate objects automatically.
Important: Input MUST contain voice/speech. Script designed for vocal material — will fail or produce nonsense on non-vocal sounds (instruments, noise, silence). Pitch tracking requires periodic signal with fundamental in 75-600 Hz range (human voice range). Very noisy recordings may produce erratic pitch tracking. Whispering (no pitch) will fail. Multiple overlapping speakers may confuse analysis. Best results: solo voice, clear recording, minimal background noise, normal speaking or singing pitch range.

LPC & Source-Filter Theory

🎙️ Voice Production Model

Vocal cords: Vibrate to create periodic pulses (glottal source)

Vocal tract: Throat/mouth/nose cavity acts as resonant filter

Formants: Resonant peaks in spectrum determined by tract shape

Key insight: Voice = Source × Filter — two independent components

The Source-Filter Model

Biological Voice Production

Step 1: Source Generation (Vocal Cords)

Lungs push air → vocal cords vibrate → glottal pulses

Glottal pulse train: ↑ ↑ ↑ ↑ ↑ (periodic pulses)
100 Hz fundamental = 100 pulses/second

Spectrum: fundamental + harmonics (buzz tone)
  f₀ = 100 Hz (fundamental)
  2f₀ = 200 Hz (2nd harmonic)
  3f₀ = 300 Hz (3rd harmonic)
  ... up to 4-5 kHz
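As a small illustration of the harmonic series described above, the following Python sketch lists the harmonics of a given fundamental up to a cutoff. The function name `harmonic_series` and the 5 kHz cutoff are illustrative assumptions and are not taken from the script.

```python
def harmonic_series(f0_hz, max_hz):
    """Return the harmonic frequencies f0, 2*f0, 3*f0, ... up to max_hz."""
    if f0_hz <= 0:
        raise ValueError("f0_hz must be positive")
    count = int(max_hz // f0_hz)          # number of harmonics at or below the cutoff
    return [k * f0_hz for k in range(1, count + 1)]

# 100 Hz voice, harmonics up to an assumed 5 kHz cutoff.
harmonics = harmonic_series(100, 5000)
```

Raising the fundamental spreads the harmonics further apart, which is why high-pitched voices sample the formant envelope more coarsely.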

Step 2: Filtering (Vocal Tract)

Pulse train enters vocal tract → resonant filtering

Vocal tract = tube with variable shape
Different shapes = different resonances (formants)

Example: vowel /a/ (as in "father")
  F1 ≈ 700 Hz (first formant)
  F2 ≈ 1200 Hz (second formant)
  F3 ≈ 2500 Hz (third formant)

Filter amplifies harmonics near formant frequencies
Filter attenuates harmonics between formants
Result: vowel /a/ with characteristic timbre
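To make the "amplify near formants, attenuate between them" idea concrete, here is a rough Python sketch that models each formant as a two-pole digital resonator and evaluates the combined gain at a few harmonics of a 100 Hz voice. The 100 Hz formant bandwidths, the 44100 Hz sample rate, and the function names are assumptions chosen for illustration; the script itself derives its filter from LPC rather than from fixed formant values.

```python
import cmath
import math

def resonator_gain(freq_hz, formant_hz, bandwidth_hz, sample_rate=44100):
    """Magnitude response of a two-pole resonator, evaluated at freq_hz."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)       # pole radius from bandwidth
    theta = 2 * math.pi * formant_hz / sample_rate            # pole angle from centre frequency
    z_inv = cmath.exp(-2j * math.pi * freq_hz / sample_rate)  # z^-1 on the unit circle
    denom = 1 - 2 * r * math.cos(theta) * z_inv + (r * r) * z_inv * z_inv
    return 1.0 / abs(denom)

def tract_gain(freq_hz, formants):
    """Gain of a cascade of resonators, one per (frequency, bandwidth) pair."""
    gain = 1.0
    for formant_hz, bandwidth_hz in formants:
        gain *= resonator_gain(freq_hz, formant_hz, bandwidth_hz)
    return gain

# /a/ with the first two formants from the text; the 100 Hz bandwidths are assumed.
vowel_a = [(700, 100), (1200, 100)]
gains = {h: tract_gain(h, vowel_a) for h in (400, 700, 900, 1200, 2000)}
```

In this sketch the harmonics at 700 Hz and 1200 Hz receive more gain than the one at 900 Hz that sits between the two formants.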

Step 3: Output (Speech)

Filtered glottal pulses = speech sound

Same pitch (source), different vowels (filter):
  /i/ (beet): F1≈270Hz, F2≈2300Hz → high, bright
  /a/ (bot): F1≈700Hz, F2≈1200Hz → open, central
  /u/ (boot): F1≈300Hz, F2≈870Hz → low, dark

Changing pitch (source) → intonation, melody
Changing formants (filter) → vowels, consonants

Linear Predictive Coding (LPC)

What LPC Does

LPC mathematically models the vocal tract as an all-pole filter — a digital filter characterized by resonant peaks (poles) that represent formants.

LPC Analysis Process:

  1. Take short window of speech (25ms — approximately stationary)
  2. Calculate autocorrelation (self-similarity at different time lags)
  3. Solve Levinson-Durbin recursion → LPC coefficients
  4. Coefficients define filter that predicts future samples from past
  5. Filter represents vocal tract transfer function (spectral envelope)
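The analysis steps above can be sketched in plain Python as follows. This is a minimal illustration under simplifying assumptions (no windowing, no pre-emphasis, a tiny order-2 model), not Praat's implementation; the test frame and its coefficients 1.3 and -0.8 are hypothetical values picked so the result can be checked.

```python
def autocorrelation(x, max_lag):
    """r[k] = sum over n of x[n] * x[n + k], for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns (a_1..a_p, prediction error energy)."""
    a = [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))
        k = acc / err
        updated = a[:]
        updated[i - 1] = k
        for j in range(i - 1):
            updated[j] = a[j] - k * a[i - 2 - j]
        a = updated
        err *= 1.0 - k * k
    return a, err

# Test frame: impulse response of a known two-pole filter,
# x[n] = 1.3*x[n-1] - 0.8*x[n-2] + impulse (hypothetical values).
frame = []
for n in range(400):
    sample = 1.0 if n == 0 else 0.0
    if n >= 1:
        sample += 1.3 * frame[n - 1]
    if n >= 2:
        sample -= 0.8 * frame[n - 2]
    frame.append(sample)

coeffs, residual_energy = levinson_durbin(autocorrelation(frame, 2), 2)
```

Because the frame was generated by an all-pole filter, the recursion recovers the generating coefficients to within rounding error. Real speech frames only approximate this model, which is one reason the script uses a much higher order.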

Why "Linear Predictive"?

LPC assumes each sample is a linear combination of previous samples:

x[n] ≈ a₁·x[n-1] + a₂·x[n-2] + ... + aₚ·x[n-p]

where:
  a₁, a₂, ..., aₚ = LPC coefficients
  p = prediction order (typically 10-50)

Prediction error = x[n] - predicted_x[n]
LPC finds the coefficients that minimize the prediction error
Error signal ≈ glottal source (excitation)
Prediction ≈ vocal tract filter response
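A short Python sketch of the prediction error defined above may help. The helper name `prediction_residual` and the one-coefficient example are assumptions for illustration only.

```python
def prediction_residual(x, coeffs):
    """e[n] = x[n] - sum_k coeffs[k-1] * x[n-k]; samples before the start count as zero."""
    residual = []
    for n in range(len(x)):
        predicted = sum(a * x[n - k] for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        residual.append(x[n] - predicted)
    return residual

# A signal that exactly follows x[n] = 0.5*x[n-1] after an initial impulse.
signal = [1.0, 0.5, 0.25, 0.125, 0.0625]
error = prediction_residual(signal, [0.5])
```

Here the residual is a single impulse followed by zeros: everything the predictor can explain has been removed, and only the excitation is left. This is the sense in which the error signal approximates the glottal source.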

LPC Parameters in Script

Prediction order: 44

Window length: 0.025 seconds (25ms)

Time step: 0.005 seconds (5ms)

Pre-emphasis: 50 Hz
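As a rough guide to what the 50 Hz pre-emphasis setting means numerically, the sketch below uses the common first-order form y[n] = x[n] - α·x[n-1] with α = exp(-2πF/fs), which matches the formula given in Praat's manual as far as I am aware. The 44100 Hz sample rate is an assumption about the input recording, and the function names are illustrative.

```python
import math

def pre_emphasis_coefficient(from_hz, sample_rate):
    """First-order pre-emphasis coefficient alpha = exp(-2*pi*F/fs)."""
    return math.exp(-2.0 * math.pi * from_hz / sample_rate)

def pre_emphasize(samples, from_hz, sample_rate):
    """y[n] = x[n] - alpha * x[n-1]; the first sample is passed through unchanged."""
    alpha = pre_emphasis_coefficient(from_hz, sample_rate)
    out = [samples[0]] if samples else []
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out

alpha = pre_emphasis_coefficient(50, 44100)   # script value: 50 Hz; 44100 Hz is assumed
flat = pre_emphasize([1.0, 1.0, 1.0, 1.0], 50, 44100)
```

A constant (0 Hz) input is almost cancelled after the first sample, while fast-changing content passes largely intact. This tilts the spectrum upward so the LPC fit is not dominated by low-frequency energy.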

Pitch Extraction & Smoothing

Pitch Detection (Autocorrelation Method)

Range: 75-600 Hz

Autocorrelation principle:

For a periodic signal (voice), the autocorrelation peaks at the period

Glottal pulses: ↑ ↑ ↑ ↑ (period = 10ms = 100Hz)

Autocorrelation:
  lag 0ms: peak (signal matches itself perfectly)
  lag 10ms: peak (signal matches one period later)
  lag 20ms: peak (two periods)
  ...

Highest peak (excluding 0) → period → f₀ = 1/period
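The principle above can be sketched as a bare-bones pitch estimator in Python. It is a simplified stand-in: Praat's pitch tracker adds windowing, normalization, and path finding across frames, none of which appear here. The function name, the 8 kHz test rate, and the synthetic signal are assumptions.

```python
import math

def estimate_f0(samples, sample_rate, floor_hz=75.0, ceiling_hz=600.0):
    """Return the pitch whose lag gives the highest mean autocorrelation."""
    min_lag = math.ceil(sample_rate / ceiling_hz)    # shortest period considered
    max_lag = math.floor(sample_rate / floor_hz)     # longest period considered
    best_lag, best_score = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        terms = len(samples) - lag
        if terms <= 0:
            break
        score = sum(samples[i] * samples[i + lag] for i in range(terms)) / terms
        if score > best_score:
            best_lag, best_score = lag, score
    if best_lag is None:
        raise ValueError("signal too short for the requested pitch range")
    return sample_rate / best_lag

# Synthetic 100 Hz "voice": three harmonics, 0.1 s at 8 kHz (assumed test values).
fs = 8000
voice = [sum(math.sin(2 * math.pi * 100 * k * n / fs) for k in (1, 2, 3)) for n in range(800)]
f0 = estimate_f0(voice, fs)
```

Restricting the lag search to the 75-600 Hz range is what keeps the estimator from locking onto lag 0 or onto multiples of the true period that fall outside the range.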

Pitch Smoothing (10 Hz bandwidth)

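Why smooth: the raw pitch track contains micro-variations (jitter) and occasional octave jumps; smoothing removes them so the synthetic source follows a stable contour.

10 Hz bandwidth: the value the script passes to Smooth. It removes jitter and reduces vibrato.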

Effect on different material:

Input | Effect of Smoothing
Speech | Removes natural jitter → cleaner, more synthetic
Singing (vibrato) | Reduces vibrato width → flatter, less expressive
Monotone | Minimal effect (already stable)
Pitched shout | Stabilizes erratic pitch → unnaturally steady
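To show the kind of effect listed in the table, here is a small Python sketch that smooths a jittery pitch contour with a centered moving average. This is a deliberately swapped-in technique: it is not the algorithm behind Praat's Smooth command, and the window size and test contour are hypothetical.

```python
def smooth_contour(f0_values, window=5):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(f0_values)):
        lo = max(0, i - half)
        hi = min(len(f0_values), i + half + 1)
        smoothed.append(sum(f0_values[lo:hi]) / (hi - lo))
    return smoothed

# Hypothetical jittery contour: 100 Hz with +/-2 Hz frame-to-frame jitter.
jittery = [100.0 + (2.0 if i % 2 == 0 else -2.0) for i in range(20)]
steady = smooth_contour(jittery)
```

The frame-to-frame jitter shrinks well below its original 2 Hz excursion. The same averaging would also flatten slower, intentional movements such as vibrato, which is the trade-off the table describes.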

Glottal Source Synthesis

To Sound (phonation) Parameters

The script generates synthetic glottal pulses with specific characteristics:

Sample rate: 44100 Hz

Adaptation factor: 1

Maximum period: 0.05 seconds

Open phase: 0.7

Collision phase: 0.03

Power 1: 3, Power 2: 4

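Hum: "no" (final argument of To Sound (phonation); labelled "Hum" in Praat's form, so "no" means no extra hum filtering is applied to the pulses)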

Resulting Glottal Spectrum

Synthetic glottal source spectrum:

Amplitude
│
│\
│ \
│  \_______________ (falling with frequency)
│
└─────────────────── Frequency
  f₀ 2f₀ 3f₀ 4f₀ 5f₀...

Characteristics:
  - Fundamental + harmonics (periodic)
  - Spectral tilt: -12 dB/octave (typical)
  - Perfectly regular spacing (no jitter)
  - Clean (no breathiness/noise)
  - Ready for LPC filtering
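The idea of a perfectly regular source can be sketched in Python as a train of unit impulses spaced one pitch period apart. This is a simplification: Praat's phonation model shapes each pulse as a glottal flow derivative, which is what produces the spectral tilt, whereas bare impulses have a flat spectrum. The function name and the 8 kHz test values are assumptions.

```python
def pulse_train(f0_at, duration_s, sample_rate):
    """Unit impulses spaced one pitch period apart; f0_at(t) gives the pitch in Hz."""
    n_samples = int(round(duration_s * sample_rate))
    signal = [0.0] * n_samples
    t = 0.0
    while t < duration_s:
        index = int(round(t * sample_rate))
        if index >= n_samples:
            break
        signal[index] = 1.0
        t += 1.0 / f0_at(t)        # advance by one period at the current pitch (must be > 0)
    return signal

# Constant 100 Hz contour, 0.1 s at 8 kHz (assumed test values).
train = pulse_train(lambda t: 100.0, 0.1, 8000)
pulse_positions = [i for i, v in enumerate(train) if v == 1.0]
```

Passing a function for the pitch means a time-varying contour, such as a smoothed pitch track, can drive the pulse spacing in the same way.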

LPC Filtering Process

Operation: LPC + Source → Filtered Output

Input: Synthetic glottal pulses (regular harmonics with falling spectral tilt)
Filter: LPC coefficients (formant structure from original)
Output: Synthetic voice (robotic but intelligible)

Mathematical operation:
output[n] = source[n] + Σ(aᵢ · output[n-i])

This is an IIR (Infinite Impulse Response) filter
  - Recursive: output depends on previous outputs
  - All-pole: only poles (resonances), no zeros
  - Poles = formants (peaks in frequency response)
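The recursion above translates almost directly into Python. The sketch below uses one fixed set of coefficients, whereas the script's LPC object supplies a new set every 5 ms; the function name and the single coefficient 0.5 are illustrative assumptions.

```python
def all_pole_filter(source, coeffs):
    """output[n] = source[n] + sum_i coeffs[i-1] * output[n-i]."""
    output = []
    for n, s in enumerate(source):
        feedback = sum(a * output[n - i] for i, a in enumerate(coeffs, start=1) if n - i >= 0)
        output.append(s + feedback)
    return output

# One impulse through a single-pole filter with a_1 = 0.5 (hypothetical coefficient).
response = all_pole_filter([1.0, 0.0, 0.0, 0.0], [0.5])
```

The input is a single impulse, yet the output keeps ringing and decays geometrically. This is the "infinite impulse response" behaviour, and with more coefficients the ringing takes place at the formant frequencies.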

Why it works:

  1. LPC extracted formant locations from original voice
  2. Formants = linguistic information (vowel/consonant identity)
  3. Applying same formants to clean source → preserves linguistics
  4. But source is synthetic → sounds robotic, not human
  5. Result: intelligible but mechanical speech

Comparison: Natural vs Synthesized

Aspect | Natural Voice | LPC Synthesis
Source | Irregular glottal pulses, jitter, breathiness | Perfect periodic pulses
Filter | Formants from actual vocal tract | Same formants (from LPC)
Pitch | Natural vibrato, micro-variations | Smoothed, stable contour
Timbre | Warm, breathy, human quality | Synthetic, buzzy, clean
Intelligibility | 100% (if clear recording) | ~95% (formants preserved)
Emotional quality | Full emotional expression | Flat, robotic, neutral
Naturalness | Completely natural | Obviously synthetic

Historical Context: LPC was developed in the 1960s-70s for speech compression (telephone, military communications). Pioneering work: Bishnu S. Atal and Manfred R. Schroeder (Bell Labs). First real-time LPC vocoders: 1970s. Stephen Hawking's voice synthesizer was based on Dennis Klatt's formant synthesizer, the design behind DECtalk (1980s); formant synthesis is a related source-filter method rather than LPC. Music applications of vocoding: Kraftwerk "Autobahn" (1974), Laurie Anderson "O Superman" (1981), Daft Punk's entire aesthetic; these were largely made with channel vocoders and related effects built on the same source-filter idea. Modern uses: cellphone codecs (GSM, CDMA), voice assistants (underlying tech), experimental music. This script makes classic LPC vocoding accessible: what required custom hardware in the 1970s now runs as a Praat script. The source-filter model remains fundamental to speech science and synthesis.

Parameters

No User Parameters: this script runs with hardcoded defaults, so no dialog box appears. The values are chosen for speech analysis. To modify behavior, edit the script source code directly. The tables below document the internal parameters for advanced users.

Pitch Extraction Parameters (To Pitch)

Parameter | Value | Description
Time step | 0 (auto) | Automatic step size (typically 0.01 s)
Pitch floor | 75 Hz | Minimum detectable pitch (low male voice)
Pitch ceiling | 600 Hz | Maximum detectable pitch (high female/child)

Pitch Smoothing Parameters

Parameter | Value | Description
Bandwidth | 10 Hz | Smoothing bandwidth (removes jitter, reduces vibrato)

Glottal Source Parameters (To Sound phonation)

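Parameter | Value | Description
Sample rate | 44100 Hz | Audio sample rate
Adaptation factor | 1 | Pulse-height factor applied at the start of a voiced stretch (1 = abrupt starts, no fade-in)
Maximum period | 0.05 s | Longer gaps between pulses count as silence (0.05 s corresponds to 20 Hz)
Open phase | 0.7 | Vocal cords open for 70% of each cycle
Collision phase | 0.03 | 3% of each cycle in collision
Power 1 | 3 | Flow derivative shape
Power 2 | 4 | Flow derivative shape
Hum | no | Final argument, labelled "Hum" in Praat's form; "no" leaves the pulses without extra hum filtering. The pulses are perfectly periodic either way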

LPC Analysis Parameters (To LPC autocorrelation)

Parameter | Value | Description
Prediction order | 44 | Number of LPC coefficients
Analysis width | 0.025 s | Window length (25 ms)
Time step | 0.005 s | Hop size between analyses (5 ms)
Pre-emphasis | 50 Hz | Frequency above which +6 dB/octave pre-emphasis is applied before analysis

LPC Filtering Parameters

Parameter | Value | Description
Inverse filter | no | Apply filter (not inverse)

Output Parameters

Parameter | Value | Description
Target intensity | 70 dB | Output level normalization
Auto-play | yes | Play result immediately
Cleanup | yes | Remove intermediate objects
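For readers curious about what normalizing to 70 dB implies for sample values, the following Python sketch scales a signal so that its RMS corresponds to a target level relative to the usual 2e-5 Pa reference. It assumes, as Praat does, that sample values are read as pressures in pascal; the function name and test signal are illustrative.

```python
import math

REFERENCE_PRESSURE = 2e-5   # Pa, the usual 0 dB SPL reference

def scale_to_intensity(samples, target_db):
    """Scale samples so that 20*log10(rms / 2e-5) equals target_db."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        raise ValueError("cannot scale a silent signal")
    target_rms = REFERENCE_PRESSURE * 10.0 ** (target_db / 20.0)
    gain = target_rms / rms
    return [s * gain for s in samples]

scaled = scale_to_intensity([0.5, -0.5, 0.5, -0.5], 70.0)
scaled_rms = math.sqrt(sum(s * s for s in scaled) / len(scaled))
```

At 70 dB the target RMS is about 0.063, so peaks normally stay well below full scale (1.0) and the output has headroom.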

Modifying Parameters (Advanced)

To change behavior, edit script lines:

Change Pitch Range

To Pitch: 0, 75, 600
# Change 75 (floor) and 600 (ceiling)
# Example for low male: 50, 300
# Example for child: 150, 800

Change Smoothing

Smooth: 10
# Value is the smoothing bandwidth in Hz
# Higher value = less smoothing (more natural jitter kept)
# Lower value = more smoothing (flatter pitch)
# Example for less smoothing: 20
# Example for more smoothing: 5

Change LPC Order

To LPC (autocorrelation): 44, 0.025, 0.005, 50
# First number = prediction order
# Higher = more formants tracked
# Example for simpler model: 24
# Example for detailed model: 64

Add Natural Flutter

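To Sound (phonation): 44100, 1, 0.05, 0.7, 0.03, 3, 4, "no"
# Note: Praat's PitchTier: To Sound (phonation) form labels this last argument "Hum"; check the form in your Praat version before relying on it for flutter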
# Change "no" to "yes