LPC Voice Generator — User Guide
Source-filter resynthesis using Linear Predictive Coding: extract pitch and spectral envelope from a voice, generate a synthetic glottal pulse train, and apply vocal tract filtering for robotic yet intelligible speech.
What this does
This script performs source-filter resynthesis using Linear Predictive Coding (LPC), a core technique in speech coding, speech synthesis, and vocal transformation. It separates a voice into two components: (1) the source (pitch, i.e. the glottal pulses produced by the vocal cords) and (2) the filter (the spectral envelope and formants shaped by the vocal tract). The script extracts both, generates a clean synthetic pulse train that follows the original pitch, and then filters it through the original vocal tract characteristics. The result is robotic, synthesized, yet intelligible speech that preserves the linguistic content while replacing natural vocal quality with mechanical precision. The same source-filter idea underlies several other systems: classic vocoders, formant synthesizers such as the one behind Stephen Hawking's voice, and the robotic vocal effects heard in Daft Punk's music. Those systems do not necessarily use LPC.
Key Features:
- Source-Filter Decomposition — Separates pitch from timbre
- LPC Analysis — Extracts vocal tract spectral envelope (formants)
- Pitch Extraction & Smoothing — Tracks fundamental frequency precisely
- Synthetic Glottal Source — Generates clean pulse train
- Formant-Preserving Filtering — Maintains vowel/consonant identity
- Automatic Processing — One-click transformation
Technical Implementation:
- (1) Extract pitch from the original voice using autocorrelation (75-600 Hz, the typical range of human voices).
- (2) Smooth the pitch contour to reduce micro-variations and tracking errors such as octave jumps.
- (3) Convert the smoothed pitch to a PitchTier for resynthesis.
- (4) Generate a synthetic glottal pulse train: periodic pulses at the extracted pitch, shaped by vocal cord simulation parameters (flow, collision, and power settings).
- (5) Extract the spectral envelope of the original using LPC: prediction order 44, 25 ms analysis window, 5 ms time step, pre-emphasis from 50 Hz.
- (6) Filter the synthetic source through the LPC filter, which imposes the original formant structure on the clean pulses.
- (7) Scale intensity to 70 dB for a consistent output level.

Key insight: the script replaces the natural glottal source with a synthetic one while keeping the vocal tract filter. The result is a voice that is recognizable and intelligible but mechanically generated, which is the essence of vocoding and speech synthesis.
Quick start
- In Praat, select exactly one Sound object containing voice.
- Run script… → LPC Voice Generator.praat.
- No parameter dialog: the script runs automatically with fixed default settings.
- Processing takes 5-15 seconds depending on file length.
- Result auto-plays: synthesized voice with robotic quality, intelligible speech.
LPC & Source-Filter Theory
🎙️ Voice Production Model
Vocal cords: Vibrate to create periodic pulses (glottal source)
Vocal tract: Throat/mouth/nose cavity acts as resonant filter
Formants: Resonant peaks in spectrum determined by tract shape
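Key insight: Voice = Source × Filter. The output spectrum is the source spectrum multiplied by the filter's frequency response; in the time domain, the source is convolved with the filter. Because of this, the two components can be analysed and replaced independently.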
The Source-Filter Model
Biological Voice Production
Step 1: Source Generation (Vocal Cords)
Step 2: Filtering (Vocal Tract)
Step 3: Output (Speech)
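To make "Source × Filter" concrete, here is a minimal Python sketch that is not part of the Praat script. It models the filter as a single formant (a two-pole resonator) and multiplies each harmonic of an idealized source by the filter's gain at that frequency. The function names, the assumed 1/k² source roll-off, and the formant values are illustrative assumptions.

```python
import cmath
import math


def resonator_gain(freq_hz, formant_hz, bandwidth_hz, fs):
    """|H(f)| of a two-pole resonator H(z) = 1 / (1 - 2r cos(theta) z^-1 + r^2 z^-2)."""
    r = math.exp(-math.pi * bandwidth_hz / fs)      # pole radius sets the bandwidth
    theta = 2 * math.pi * formant_hz / fs           # pole angle sets the centre frequency
    z = cmath.exp(1j * 2 * math.pi * freq_hz / fs)  # evaluate on the unit circle
    denominator = 1 - 2 * r * math.cos(theta) / z + (r * r) / (z * z)
    return abs(1 / denominator)


def output_harmonics(f0_hz, n_harmonics, formant_hz, bandwidth_hz, fs):
    """Source x Filter: harmonic k has amplitude source(k) * |H(k * f0)|."""
    spectrum = []
    for k in range(1, n_harmonics + 1):
        freq = k * f0_hz
        source_amp = 1.0 / k ** 2  # assumed idealized glottal roll-off (-12 dB/octave)
        spectrum.append((freq, source_amp * resonator_gain(freq, formant_hz, bandwidth_hz, fs)))
    return spectrum


if __name__ == "__main__":
    for freq, amp in output_harmonics(120, 10, 700, 100, 44100):
        print(f"{freq:6.0f} Hz  {20 * math.log10(amp):6.1f} dB")
```

Changing `f0_hz` moves the harmonics but leaves the resonance in place. Changing `formant_hz` moves the resonance but leaves the harmonics in place. That independence is the point of the model.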
Linear Predictive Coding (LPC)
What LPC Does
LPC mathematically models the vocal tract as an all-pole filter: a digital filter whose poles create resonant peaks in its frequency response, and those peaks represent the formants.
LPC Analysis Process:
- Take a short window of speech (25 ms, short enough that the vocal tract is approximately stationary)
- Calculate its autocorrelation (self-similarity at different time lags)
- Solve the resulting normal equations with the Levinson-Durbin recursion → LPC coefficients
- The coefficients define a predictor that estimates each sample from a weighted sum of the preceding samples
- The corresponding all-pole filter represents the vocal tract transfer function (spectral envelope)
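The analysis steps above can be sketched in plain Python. This is a textbook autocorrelation-method LPC with the Levinson-Durbin recursion, written for clarity rather than speed. It is not Praat's source code: Praat also windows each frame and applies pre-emphasis first. The sign convention (predicted sample = Σ a[k]·x[n−k]) and the function names are our own choices.

```python
def autocorrelation(frame, max_lag):
    """r[k] = sum over n of frame[n] * frame[n + k], for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations recursively.

    Returns (a, error) where x_hat[n] = sum(a[k] * x[n - k] for k in 1..order);
    a[0] is a placeholder set to 1.0, and error is the residual energy.
    """
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this stage
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a, error


if __name__ == "__main__":
    # Impulse response of a known all-pole filter: x[n] = 1.3 x[n-1] - 0.6 x[n-2] + impulse
    x = []
    for n in range(400):
        value = 1.0 if n == 0 else 0.0
        if n >= 1:
            value += 1.3 * x[n - 1]
        if n >= 2:
            value -= 0.6 * x[n - 2]
        x.append(value)
    a, _ = levinson_durbin(autocorrelation(x, 2), 2)
    print([round(c, 3) for c in a[1:]])  # recovers approximately [1.3, -0.6]
```

Recovering the coefficients of a known filter is a quick sanity check. On real speech, the recovered filter instead approximates the vocal tract envelope for that frame.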
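Why "Linear Predictive"?
Each sample is predicted as a linear combination (weighted sum) of the previous p samples, where p is the prediction order. The coefficients are chosen to minimise the squared prediction error.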
LPC Parameters in Script
Prediction order: 44
- Number of coefficients = complexity of filter model
- Rule of thumb: order ≈ sampling_rate / 1000 + 4 (sampling rate in Hz)
- At 44.1 kHz: 44100/1000 + 4 ≈ 48 (the script uses 44 for efficiency)
- Higher order = more accurate formant tracking, up to a point
- Too high = overfit, models noise instead of vocal tract
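The rule of thumb is easy to tabulate. The helper below is hypothetical and not part of the script. It encodes the reasoning behind the formula: roughly one formant per kHz up to the Nyquist frequency, two poles per formant, plus a few extra coefficients for overall spectral tilt.

```python
def lpc_order_rule_of_thumb(sampling_rate_hz, extra=4):
    """Suggested LPC prediction order for a given sampling rate.

    About one formant per kHz between 0 Hz and Nyquist (fs / 2) gives fs / 2000
    formants; each needs two poles, so fs / 1000 coefficients, plus `extra`
    coefficients for the overall spectral tilt.
    """
    formants = sampling_rate_hz / 2000
    return round(2 * formants) + extra


for fs in (10000, 16000, 22050, 44100):
    print(fs, lpc_order_rule_of_thumb(fs))
```

At 44100 Hz this gives 48. The script's fixed order of 44 sits slightly below that.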
Window length: 0.025 seconds (25ms)
- Analysis window duration
- Short enough that vocal tract approximately stationary
- Long enough for reliable spectral estimation
- Standard for speech: 20-30ms
Time step: 0.005 seconds (5ms)
- Hop size between analysis windows
- Overlapping windows (25ms window, 5ms step = 80% overlap)
- Smooth temporal evolution of filter
- Captures rapid formant transitions (consonants)
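A small Python sketch shows how the window length and time step translate into frames. It assumes frames start at t = 0 and must fit entirely inside the signal. Praat centres its analysis frames and handles the edges in its own way, so its exact frame count can differ. The function names are illustrative.

```python
import math


def frame_start_times(duration_s, window_s=0.025, step_s=0.005):
    """Start times of every analysis window that fits inside the signal."""
    if duration_s < window_s:
        return []
    # The small epsilon guards against floating-point round-down (e.g. 194.99999...)
    n_frames = int(math.floor((duration_s - window_s) / step_s + 1e-9)) + 1
    return [i * step_s for i in range(n_frames)]


def overlap_fraction(window_s, step_s):
    """Fraction of each window shared with the next one."""
    return (window_s - step_s) / window_s


if __name__ == "__main__":
    print(len(frame_start_times(1.0)), "frames,", overlap_fraction(0.025, 0.005), "overlap")
```

For a 1 s recording this yields 196 frames with 80% overlap. At 44.1 kHz each window spans 0.025 × 44100 = 1102.5 sample periods, and each hop spans 220.5.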
Pre-emphasis: 50 Hz
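- First-order pre-emphasis filter applied before LPC analysis; it boosts frequencies above about 50 Hz by roughly 6 dB per octave
- Strongly attenuates DC offset and very low frequencies
- Compensates for the natural spectral tilt of voice (more energy at low frequencies)
- Improves formant estimation accuracy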
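Here is a minimal Python sketch of first-order pre-emphasis. It uses the common formulation y[n] = x[n] − α·x[n−1] with α = exp(−2π·F/fs), where F is the "from" frequency (50 Hz here). We believe this matches how Praat describes its pre-emphasis, but treat it as an illustration rather than Praat's exact code. The function names are our own.

```python
import cmath
import math


def pre_emphasize(samples, fs, from_hz=50.0):
    """y[n] = x[n] - alpha * x[n-1], with alpha = exp(-2*pi*from_hz/fs)."""
    alpha = math.exp(-2 * math.pi * from_hz / fs)
    out, previous = [], 0.0
    for x in samples:
        out.append(x - alpha * previous)
        previous = x
    return out


def pre_emphasis_gain_db(freq_hz, fs, from_hz=50.0):
    """Magnitude response |1 - alpha * e^(-j*w)| of the same filter, in dB."""
    alpha = math.exp(-2 * math.pi * from_hz / fs)
    w = 2 * math.pi * freq_hz / fs
    return 20 * math.log10(abs(1 - alpha * cmath.exp(-1j * w)))


if __name__ == "__main__":
    for f in (10, 50, 500, 1000, 2000):
        print(f, "Hz:", round(pre_emphasis_gain_db(f, 44100), 1), "dB")
```

A constant (DC) input is reduced to under 1% of its level after the first sample. Well above 50 Hz, each doubling of frequency gains about 6 dB relative to the octave below.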
Pitch Extraction & Smoothing
Pitch Detection (Autocorrelation Method)
Range: 75-600 Hz
- 75 Hz: lowest typical male voice fundamental
- 600 Hz: highest typical female/child voice fundamental
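- Restricting the search to this range helps avoid octave errors (locking onto a subharmonic or a harmonic instead of F0)
Autocorrelation principle: the signal is compared with time-shifted copies of itself. The lag at which it best matches itself is taken as the pitch period T, and F0 = 1 / T.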
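The principle can be sketched in a few lines of Python. This is a bare-bones autocorrelation pitch estimator for a single frame, restricted to the 75-600 Hz search range. Praat's To Pitch uses a considerably more refined autocorrelation method (windowing, normalisation, and choosing the best path across frames), so treat this only as an illustration of the idea. The function name is our own.

```python
import math


def estimate_f0(frame, fs, f_min=75.0, f_max=600.0):
    """Return the F0 (Hz) whose period makes the frame best match itself.

    Only lags between fs / f_max and fs / f_min samples are searched, which
    keeps the estimate inside the allowed pitch range.
    """
    min_lag = max(1, int(fs / f_max))
    max_lag = min(len(frame) - 1, int(fs / f_min))
    best_lag, best_score = None, -math.inf
    for lag in range(min_lag, max_lag + 1):
        # Raw (biased) autocorrelation: longer lags sum fewer products, which
        # slightly favours the true period over its multiples (octave-down errors).
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return fs / best_lag if best_lag is not None else None


if __name__ == "__main__":
    fs = 8000
    tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(400)]
    print(estimate_f0(tone, fs))  # prints 200.0
```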
Pitch Smoothing (10 Hz bandwidth)
Why smooth:
- Raw pitch tracking has errors: octave jumps, spurious values
- Natural voice has micro-variations (jitter, vibrato)
- For synthesis, want stable, smooth pitch contour
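- Smoothing = a low-pass filter applied to the pitch contour over time
10 Hz bandwidth:
- Passes slow pitch movements and attenuates contour fluctuations faster than roughly 10 Hz (about ten ups and downs per second)
- Suppresses rapid frame-to-frame fluctuations (jitter, tracking noise)
- Preserves intonation (rising/falling pitch)
- Can partly reduce vibrato, whose rate (typically about 5-7 Hz) lies just below the cutoff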
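Here is a minimal Python sketch of contour smoothing. It assumes a Gaussian low-pass kernel whose gain falls to one half at the cutoff frequency, and one pitch value every 10 ms. It illustrates what a 10 Hz smoothing bandwidth means; it is not Praat's Smooth implementation, and it ignores unvoiced frames.

```python
import math


def smooth_contour(f0_values, frame_step_s=0.01, cutoff_hz=10.0):
    """Low-pass a pitch contour (one Hz value per frame) with a Gaussian kernel.

    A Gaussian of width sigma (seconds) has frequency response exp(-2 pi^2 sigma^2 f^2);
    sigma is chosen so that this response equals 0.5 at cutoff_hz.
    """
    sigma_s = math.sqrt(math.log(2) / 2) / (math.pi * cutoff_hz)
    sigma_frames = sigma_s / frame_step_s
    half = max(1, math.ceil(3 * sigma_frames))
    offsets = range(-half, half + 1)
    weights = [math.exp(-0.5 * (k / sigma_frames) ** 2) for k in offsets]
    smoothed = []
    for i in range(len(f0_values)):
        num = den = 0.0
        for k, w in zip(offsets, weights):
            j = i + k
            if 0 <= j < len(f0_values):  # renormalise at the edges
                num += w * f0_values[j]
                den += w
        smoothed.append(num / den)
    return smoothed


if __name__ == "__main__":
    vibrato = [120 + 10 * math.sin(2 * math.pi * 6 * i * 0.01) for i in range(300)]
    out = smooth_contour(vibrato)
    print("vibrato depth kept:", max(abs(v - 120) for v in out[50:250]) / 10)
```

In this sketch, a 6 Hz vibrato keeps roughly three quarters of its depth, while frame-to-frame alternation is essentially removed.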
Effect on different material:
| Input | Effect of Smoothing |
|---|---|
| Speech | Removes natural jitter → cleaner, more synthetic |
| Singing (vibrato) | Can reduce vibrato depth → flatter, less expressive |
| Monotone | Minimal effect (already stable) |
| Pitched shout | Stabilizes erratic pitch → unnaturally steady |
Glottal Source Synthesis
To Sound (phonation) Parameters
The script generates synthetic glottal pulses with specific characteristics:
Sample rate: 44100 Hz
- Standard audio sample rate
- Ensures high-frequency formants are accurately represented
Adaptation factor: 1
- Controls spectral tilt of glottal source
- 1.0 = natural spectral tilt (realistic glottal spectrum)
- Higher = more high-frequency energy (breathier)
Maximum period: 0.05 seconds
- Lowest pitch = 1/0.05 = 20 Hz
- Safety margin (minimum pitch from pitch tracking is 75 Hz)
- Prevents errors if pitch tracker fails momentarily
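Where the pulses fall is set by the pitch contour. Below is a minimal Python sketch of one common approach, phase accumulation: integrate F0 over time and emit a pulse each time the accumulated phase completes a cycle. Treating any stretch whose period would exceed the maximum period as silence is our assumption about what such a limit does, not a description of Praat's internals. The function name and the callable `f0_at` interface are hypothetical.

```python
def pulse_times(f0_at, duration_s, fs=44100, max_period_s=0.05):
    """Return pulse times (s) for a pitch contour given as a function f0_at(t) -> Hz.

    Phase accumulation: the phase grows by F0 * dt each sample, and a pulse is
    emitted whenever it completes a full cycle.
    """
    times = []
    phase = 0.0
    dt = 1.0 / fs
    for n in range(int(round(duration_s * fs))):
        t = n * dt
        f0 = f0_at(t)
        if f0 <= 0 or 1.0 / f0 > max_period_s:
            # Assumption: periods longer than max_period_s (below 20 Hz) count as silence.
            phase = 0.0
            continue
        phase += f0 * dt
        if phase >= 1.0:
            phase -= 1.0
            times.append(t)
    return times


if __name__ == "__main__":
    # A glide from 100 Hz to 200 Hz over one second (hypothetical contour)
    glide = pulse_times(lambda t: 100 + 100 * t, 1.0)
    print(len(glide), "pulses; first interval", glide[1] - glide[0], "s")
```

Because the pulses follow the contour exactly, with no random timing variation, the resulting source is perfectly periodic wherever the pitch is steady.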
Open phase: 0.7
- Fraction of glottal cycle vocal cords are open
- 0.7 = 70% open, 30% closed
- Typical for modal voice (normal speaking)
- Affects pulse shape and spectral content
Collision phase: 0.03
- Duration of vocal cord collision (as fraction of period)
- 0.03 = 3% of cycle
- Determines sharpness of glottal closure
- Sharp closure = more high-frequency energy
Power 1: 3, Power 2: 4
- Shape parameters for glottal flow waveform
- Control asymmetry of opening vs closing
- Values 3,4 = typical modal phonation
- Affect spectral richness and voice quality
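Here is a simplified Python sketch of one glottal cycle, showing what the open phase and the two powers do. It assumes the flow during the open phase follows U(x) = x^power1 − x^power2 on normalised time x from 0 to 1, and is zero while the glottis is closed. This polynomial form is a common textbook model and our assumption about the shape. The collision phase is ignored, and the function names are illustrative.

```python
def glottal_flow_cycle(period_samples, open_phase=0.7, power1=3, power2=4):
    """One cycle of glottal flow: U(x) = x**power1 - x**power2 while open, 0 while closed."""
    open_samples = max(1, round(open_phase * period_samples))
    cycle = []
    for n in range(period_samples):
        if n < open_samples:
            x = n / open_samples  # normalised time within the open phase, 0 <= x < 1
            cycle.append(x ** power1 - x ** power2)
        else:
            cycle.append(0.0)
    return cycle


def first_difference(signal):
    """Sample-to-sample slope; the flow derivative is commonly used as the excitation."""
    previous = [0.0] + signal[:-1]
    return [current - prev for current, prev in zip(signal, previous)]


if __name__ == "__main__":
    cycle = glottal_flow_cycle(100)
    slope = first_difference(cycle)
    print("peak flow at sample", cycle.index(max(cycle)))
    print("steepest rise", max(slope), "steepest fall", min(slope))
```

With power1 = 3 and power2 = 4, the flow peaks at x = 3/4, three quarters of the way through the open phase. The glottis therefore opens gradually and closes abruptly. The steep fall at closure is what puts energy into the higher harmonics.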
Flow/collision: "no" (no flutter)
- Disables random variation in pulse timing
- Perfectly periodic pulses → maximally synthetic/robotic
- "yes" would add natural-sounding irregularity
Resulting Glottal Spectrum
LPC Filtering Process
Operation: LPC + Source → Filtered Output
Why it works:
- LPC extracted the spectral envelope (formant frequencies and bandwidths) from the original voice
- Formants carry the linguistic information (vowel/consonant identity)
- Applying the same formants to a clean source → preserves the linguistic content
- But source is synthetic → sounds robotic, not human
- Result: intelligible but mechanical speech
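The filtering step can be sketched as running the source through the all-pole synthesis filter 1/A(z). This minimal Python version uses one fixed set of coefficients, with the predictor sign convention (predicted sample = Σ a[k]·y[n−k]). A real resynthesis, like Praat's, switches to a new coefficient set every analysis frame. The names and example coefficients are illustrative.

```python
def all_pole_filter(source, a):
    """y[n] = source[n] + sum(a[k] * y[n - k] for k = 1..p); a[0] is ignored."""
    order = len(a) - 1
    y = []
    for n, excitation in enumerate(source):
        value = excitation
        for k in range(1, min(order, n) + 1):
            value += a[k] * y[n - k]
        y.append(value)
    return y


def inverse_filter(signal, a):
    """Residual e[n] = y[n] - sum(a[k] * y[n - k]); undoes all_pole_filter."""
    order = len(a) - 1
    return [
        signal[n] - sum(a[k] * signal[n - k] for k in range(1, min(order, n) + 1))
        for n in range(len(signal))
    ]


def pulse_train(n_samples, period_samples):
    """Perfectly periodic unit impulses: the 'clean' synthetic source."""
    return [1.0 if n % period_samples == 0 else 0.0 for n in range(n_samples)]


if __name__ == "__main__":
    a = [1.0, 1.3, -0.6]  # one resonance, illustrative values
    voiced = all_pole_filter(pulse_train(400, 100), a)
    residual = inverse_filter(voiced, a)
    print(max(abs(r - s) for r, s in zip(residual, pulse_train(400, 100))))
```

The inverse filter recovers the pulse train almost exactly. This is the distinction behind the script's "Inverse filter: no" setting: the script applies the forward filter to impose formants, rather than the inverse filter, which would remove them.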
Comparison: Natural vs Synthesized
| Aspect | Natural Voice | LPC Synthesis |
|---|---|---|
| Source | Irregular glottal pulses, jitter, breathiness | Perfect periodic pulses |
| Filter | Formants from actual vocal tract | Same formants (from LPC) |
| Pitch | Natural vibrato, micro-variations | Smoothed, stable contour |
| Timbre | Warm, breathy, human quality | Synthetic, buzzy, clean |
| Intelligibility | Full (for a clear recording) | High, usually somewhat below natural (formants preserved) |
| Emotional quality | Full emotional expression | Flat, robotic, neutral |
| Naturalness | Completely natural | Obviously synthetic |
Parameters
Pitch Extraction Parameters (To Pitch)
| Parameter | Value | Description |
|---|---|---|
| Time step | 0 (auto) | Automatic step size: 0.75 / pitch floor = 0.01 s |
| Pitch floor | 75 Hz | Minimum detectable pitch (low male voice) |
| Pitch ceiling | 600 Hz | Maximum detectable pitch (high female/child) |
Pitch Smoothing Parameters
| Parameter | Value | Description |
|---|---|---|
| Bandwidth | 10 Hz | Smoothing bandwidth (removes jitter, reduces vibrato) |
Glottal Source Parameters (To Sound phonation)
| Parameter | Value | Description |
|---|---|---|
| Sample rate | 44100 Hz | Audio sample rate |
| Adaptation factor | 1 | Spectral tilt (1 = natural) |
| Maximum period | 0.05 s | Lowest pitch = 20 Hz |
| Open phase | 0.7 | 70% of cycle cords open |
| Collision phase | 0.03 | 3% of cycle in collision |
| Power 1 | 3 | Flow derivative shape |
| Power 2 | 4 | Flow derivative shape |
| Flutter | no | No random jitter (perfectly periodic) |
LPC Analysis Parameters (To LPC autocorrelation)
| Parameter | Value | Description |
|---|---|---|
| Prediction order | 44 | Number of LPC coefficients |
| Analysis width | 0.025 s | Window length (25ms) |
| Time step | 0.005 s | Hop size between analyses (5ms) |
| Pre-emphasis | 50 Hz | High-pass frequency for analysis |
LPC Filtering Parameters
| Parameter | Value | Description |
|---|---|---|
| Inverse filter | no | Apply filter (not inverse) |
Output Parameters
| Parameter | Value | Description |
|---|---|---|
| Target intensity | 70 dB | Output level normalization |
| Auto-play | yes | Play result immediately |
| Cleanup | yes | Remove intermediate objects |
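Praat expresses intensity in dB relative to 2·10⁻⁵ Pa, treating sample values as sound pressure in pascals. Below is a minimal Python sketch of scaling to a target intensity under that convention. It applies a single gain factor computed from the RMS and is not Praat's own code. The function names are illustrative.

```python
import math

REFERENCE_PRESSURE = 2e-5  # Pa, the usual dB SPL reference


def intensity_db(samples):
    """10 * log10(mean square / reference^2)."""
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10 * math.log10(mean_square / REFERENCE_PRESSURE ** 2)


def scale_intensity(samples, target_db=70.0):
    """Multiply every sample by one gain so the result has the target intensity."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    if rms == 0:
        raise ValueError("cannot scale a silent signal")
    target_rms = REFERENCE_PRESSURE * 10 ** (target_db / 20)
    gain = target_rms / rms
    return [x * gain for x in samples]
```

At 70 dB the target RMS is 2·10⁻⁵ × 10^(70/20) ≈ 0.063 Pa. A sine wave scaled this way peaks at about 0.089, comfortably below clipping at ±1.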
Modifying Parameters (Advanced)
To change behavior, edit script lines:
Change Pitch Range

    To Pitch: 0, 75, 600
    # 75 = pitch floor (Hz), 600 = pitch ceiling (Hz)
    # Example for a low male voice: To Pitch: 0, 50, 300
    # Example for a child's voice: To Pitch: 0, 150, 800

Change Smoothing

    Smooth: 10
    # The value is the smoothing bandwidth in Hz
    # Lower value = more smoothing (flatter pitch)
    # Higher value = less smoothing (more natural variation kept)
    # Example for more smoothing: Smooth: 5
    # Example for less smoothing: Smooth: 20

Change LPC Order

    To LPC (autocorrelation): 44, 0.025, 0.005, 50
    # First number = prediction order; higher = more formants tracked
    # Example for a simpler model: To LPC (autocorrelation): 24, 0.025, 0.005, 50
    # Example for a detailed model: To LPC (autocorrelation): 64, 0.025, 0.005, 50

Add Natural Flutter

    To Sound (phonation): 44100, 1, 0.05, 0.7, 0.03, 3, 4, "no"
    # Change "no" to "yes"