LPC Voice Generator — User Guide

Source-filter resynthesis using Linear Predictive Coding: extract pitch and spectral envelope from voice, generate synthetic glottal pulse train, apply vocal tract filtering for robotic yet intelligible speech.

Author: Shai Cohen
Affiliation: Department of Music, Bar-Ilan University, Israel
Version: 0.1 (2025)
License: MIT License
Repo: https://github.com/ShaiCohen-ops/Praat-plugin_AudioTools

Contents:

What this does
Quick start
LPC & Source-Filter Theory
Parameters
Audio Examples
Applications

What this does

This script performs source-filter resynthesis using Linear Predictive Coding (LPC), a core technique in vocoding, speech coding, and vocal transformation. It separates voice into two components: (1) the source (pitch/glottal pulses from the vocal cords), and (2) the filter (spectral envelope/formants from the vocal tract shape). The script extracts both, generates a clean synthetic pulse train matching the original pitch, then filters it through the original vocal tract characteristics. Result: robotic, synthesized, yet intelligible speech that preserves linguistic content while replacing natural vocal quality with mechanical precision. The same source-filter idea underlies classic vocoders, the formant synthesizer behind Stephen Hawking's voice, and the vocoder-style robotic vocals associated with Daft Punk.

Key Features:

Source-Filter Decomposition — Separates pitch from timbre
LPC Analysis — Extracts vocal tract spectral envelope (formants)
Pitch Extraction & Smoothing — Tracks fundamental frequency precisely
Synthetic Glottal Source — Generates clean pulse train
Formant-Preserving Filtering — Maintains vowel/consonant identity
Automatic Processing — One-click transformation

What is Source-Filter Theory?

Human voice production: vocal cords vibrate (source) → create buzzy tone with fundamental + harmonics → sound travels through vocal tract (throat, mouth, nose) → tract acts as acoustic filter → certain frequencies amplified (formants), others attenuated → result = vowels, consonants.

Source-filter model: Voice = Source × Filter. Source = pitch + periodicity. Filter = vocal tract shape (formants). LPC extracts the filter.

The script replaces the natural source (breathy, irregular glottal pulses) with a synthetic source (clean periodic pulses) → robotic voice with preserved linguistic content. This is why synthesized speech sounds "robotic" but still intelligible: the formants (linguistic information) are preserved, but the source quality (human vocal warmth) is replaced.

Technical Implementation:

(1) Extract pitch from original voice using autocorrelation (75-600 Hz range for human voice)
(2) Smooth pitch to remove micro-variations and octave jumps
(3) Convert to PitchTier for resynthesis
(4) Generate synthetic glottal pulse train: periodic pulses at extracted pitch frequency, shaped by vocal cord simulation parameters (open phase, collision phase, power)
(5) Extract spectral envelope from original using LPC (Linear Predictive Coding): 44 coefficients, 25ms analysis window, 5ms time step, 50 Hz pre-emphasis
(6) Filter synthetic source through LPC filter → applies original formant structure to clean pulses
(7) Scale intensity to 70 dB for consistent output level

Key insight: By replacing the natural glottal source with a synthetic one while preserving the vocal tract filter, the script creates a voice that is recognizable and intelligible but mechanically generated, which is the essence of vocoding and speech synthesis.

Quick start

In Praat, select exactly one Sound object containing voice.
Run script… → LPC Voice Generator.praat.
No parameter dialog — script runs automatically with optimal defaults.
Processing takes 5-15 seconds depending on file length.
Result auto-plays: synthesized voice with robotic quality, intelligible speech.

Quick tip: Works best on clean, close-miked vocal recordings. Single-speaker preferred (multi-speaker may confuse pitch tracking). Spoken voice ideal; singing works but pitch smoothing may flatten expressive vibrato. Processing relatively fast (LPC efficient algorithm). Result named "voice_synthesized" — compare with original to hear source-filter separation. Script cleans up intermediate objects automatically.

Important: Input MUST contain voice/speech. Script designed for vocal material — will fail or produce nonsense on non-vocal sounds (instruments, noise, silence). Pitch tracking requires periodic signal with fundamental in 75-600 Hz range (human voice range). Very noisy recordings may produce erratic pitch tracking. Whispering (no pitch) will fail. Multiple overlapping speakers may confuse analysis. Best results: solo voice, clear recording, minimal background noise, normal speaking or singing pitch range.

LPC & Source-Filter Theory

🎙️ Voice Production Model

Vocal cords: Vibrate to create periodic pulses (glottal source)

Vocal tract: Throat/mouth/nose cavity acts as resonant filter

Formants: Resonant peaks in spectrum determined by tract shape

Key insight: Voice = Source × Filter — two independent components

The Source-Filter Model

Biological Voice Production

Step 1: Source Generation (Vocal Cords)

Lungs push air → vocal cords vibrate → glottal pulses

Glottal pulse train: ↑ ↑ ↑ ↑ ↑ (periodic pulses)
100 Hz fundamental = 100 pulses/second

Spectrum: fundamental + harmonics (buzz tone)
f₀ = 100 Hz (fundamental)
2f₀ = 200 Hz (2nd harmonic)
3f₀ = 300 Hz (3rd harmonic)
... up to 4-5 kHz

Step 2: Filtering (Vocal Tract)

Pulse train enters vocal tract → resonant filtering
Vocal tract = tube with variable shape
Different shapes = different resonances (formants)

Example: vowel /a/ (as in "father")
F1 ≈ 700 Hz (first formant)
F2 ≈ 1200 Hz (second formant)
F3 ≈ 2500 Hz (third formant)

Filter amplifies harmonics near formant frequencies
Filter attenuates harmonics between formants
Result: vowel /a/ with characteristic timbre
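To make "amplifies harmonics near formant frequencies" concrete, here is a minimal Python sketch. It models a single formant as a two-pole digital resonator and evaluates its gain at each harmonic of a 100 Hz source. The 100 Hz bandwidth, the 16 kHz sample rate, and the function name resonator_gain are illustrative assumptions, not values taken from the script.

```python
import cmath
import math

def resonator_gain(freq_hz, formant_hz, bandwidth_hz, sample_rate):
    """Magnitude response of a two-pole resonator, evaluated at freq_hz."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)   # pole radius from bandwidth
    theta = 2 * math.pi * formant_hz / sample_rate        # pole angle from formant frequency
    z_inv = cmath.exp(-1j * 2 * math.pi * freq_hz / sample_rate)
    denom = 1 - 2 * r * math.cos(theta) * z_inv + (r ** 2) * z_inv ** 2
    return 1.0 / abs(denom)

f0 = 100.0                                    # source fundamental (Hz)
harmonics = [k * f0 for k in range(1, 16)]    # 100 Hz ... 1500 Hz
# Assumed formant: F1 of /a/ at 700 Hz with a 100 Hz bandwidth (illustrative values)
gains = {h: resonator_gain(h, 700.0, 100.0, 16000) for h in harmonics}
strongest = max(gains, key=gains.get)
print(f"Most amplified harmonic: {strongest:.0f} Hz")
```

The harmonic that coincides with the formant receives the largest gain, while harmonics further away are progressively weaker. A real vocal tract combines several such resonances.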

Step 3: Output (Speech)

Filtered glottal pulses = speech sound

Same pitch (source), different vowels (filter):
/i/ (beet): F1≈270Hz, F2≈2300Hz → high, bright
/a/ (bot): F1≈700Hz, F2≈1200Hz → open, central
/u/ (boot): F1≈300Hz, F2≈870Hz → low, dark

Changing pitch (source) → intonation, melody
Changing formants (filter) → vowels, consonants

Linear Predictive Coding (LPC)

What LPC Does

LPC mathematically models the vocal tract as an all-pole filter — a digital filter characterized by resonant peaks (poles) that represent formants.

LPC Analysis Process:

Take short window of speech (25ms — approximately stationary)
Calculate autocorrelation (self-similarity at different time lags)
Solve Levinson-Durbin recursion → LPC coefficients
Coefficients define filter that predicts future samples from past
Filter represents vocal tract transfer function (spectral envelope)
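The analysis steps above can be sketched in plain Python. This is a textbook autocorrelation-plus-Levinson-Durbin routine, not Praat's implementation. The test signal is the impulse response of a known two-coefficient all-pole filter, chosen only so the recovered coefficients can be checked. The function names are illustrative.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite signal."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations.

    Returns (a, err) where x[n] ≈ a[0]*x[n-1] + ... + a[order-1]*x[n-order]
    and err is the remaining prediction-error energy.
    """
    a = [0.0] * (order + 1)          # a[0] unused, a[1..order] are the coefficients
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                # reflection coefficient for this step
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1 - k * k)
    return a[1:], err

# Test signal: impulse response of x[n] = 1.5*x[n-1] - 0.9*x[n-2] + impulse
true_a = [1.5, -0.9]
x = []
for n in range(400):
    sample = 1.0 if n == 0 else 0.0
    for k, coeff in enumerate(true_a, start=1):
        if n - k >= 0:
            sample += coeff * x[n - k]
    x.append(sample)

r = autocorrelation(x, 2)
est_a, residual_energy = levinson_durbin(r, 2)
print(est_a, residual_energy)
```

On real speech the same recursion runs once per 25ms window with a much higher order, and each window is typically tapered before the autocorrelation is computed.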

Why "Linear Predictive"?

LPC assumes each sample is a linear combination of previous samples:

x[n] ≈ a₁·x[n-1] + a₂·x[n-2] + ... + aₚ·x[n-p]

where:
a₁, a₂, ..., aₚ = LPC coefficients
p = prediction order (typically 10-50)

Prediction error = x[n] - predicted_x[n]
LPC finds coefficients that minimize prediction error
Error signal ≈ glottal source (excitation)
Prediction ≈ vocal tract filter response
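A short Python sketch of the prediction-error idea: a signal is built by driving a known two-coefficient predictor with a sparse pulse train, and the error x[n] - predicted_x[n] is then computed with the same coefficients. The coefficients and the pulse spacing are made-up illustration values.

```python
def prediction_error(x, a):
    """e[n] = x[n] - sum_k a[k-1]*x[n-k]; samples before the start count as zero."""
    errors = []
    for n in range(len(x)):
        predicted = sum(a[k - 1] * x[n - k]
                        for k in range(1, len(a) + 1) if n - k >= 0)
        errors.append(x[n] - predicted)
    return errors

a = [1.5, -0.9]                                   # illustrative LPC coefficients
excitation = [1.0 if n % 40 == 0 else 0.0 for n in range(200)]

# Synthesize: each sample = excitation + linear prediction from past samples
signal = []
for n in range(200):
    predicted = sum(a[k - 1] * signal[n - k]
                    for k in range(1, len(a) + 1) if n - k >= 0)
    signal.append(excitation[n] + predicted)

residual = prediction_error(signal, a)
max_diff = max(abs(e - s) for e, s in zip(residual, excitation))
print(f"Largest difference between residual and excitation: {max_diff:.2e}")
```

When the coefficients match the filter that shaped the signal, the error is the excitation itself. This is why the error signal is read as an estimate of the glottal source.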

LPC Parameters in Script

Prediction order: 44

Number of coefficients = complexity of filter model
Formula: order ≈ sampling_rate / 1000 + 4
At 44.1kHz: 44100/1000 + 4 = 48 (script uses 44 for efficiency)
Higher order = more accurate formant tracking
Too high = overfit, models noise instead of vocal tract
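The rule of thumb above, written as a tiny Python helper. The function name is hypothetical, and rounding the kHz value to an integer is an assumption made to match the 48 quoted for 44.1kHz.

```python
def rule_of_thumb_order(sample_rate_hz):
    """Approximate LPC order: sample rate in kHz (rounded) plus 4."""
    return round(sample_rate_hz / 1000) + 4

order_44k = rule_of_thumb_order(44100)
order_16k = rule_of_thumb_order(16000)
print(order_44k, order_16k)
```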

Window length: 0.025 seconds (25ms)

Analysis window duration
Short enough that vocal tract approximately stationary
Long enough for reliable spectral estimation
Standard for speech: 20-30ms

Time step: 0.005 seconds (5ms)

Hop size between analysis windows
Overlapping windows (25ms window, 5ms step = 80% overlap)
Smooth temporal evolution of filter
Captures rapid formant transitions (consonants)
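The window/step arithmetic can be checked with a short Python sketch. It uses a simple framing rule (count every full window that fits), which is an assumption and may differ from how Praat places its frames. The 16 kHz rate and 1 second duration are illustration values.

```python
def frame_layout(n_samples, sample_rate, window_s=0.025, step_s=0.005):
    """Return (overlap fraction, number of full analysis windows)."""
    window = round(window_s * sample_rate)   # window length in samples
    step = round(step_s * sample_rate)       # hop size in samples
    overlap = 1.0 - step / window
    n_frames = 0 if n_samples < window else (n_samples - window) // step + 1
    return overlap, n_frames

overlap, n_frames = frame_layout(n_samples=16000, sample_rate=16000)
print(f"overlap = {overlap:.0%}, frames in 1 s = {n_frames}")
```

With a 5ms step the filter is re-estimated about 200 times per second, which is what lets it follow fast consonant transitions.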

Pre-emphasis: 50 Hz

First-order filter applied before LPC analysis
Boosts the spectrum by about 6 dB/octave above 50 Hz
Compensates for natural spectral tilt (voice has more low-freq energy)
Improves formant estimation accuracy
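A minimal Python sketch of first-order pre-emphasis, assuming the common difference form y[n] = x[n] - α·x[n-1] with α = exp(-2π·F/fs), which is how Praat's manual describes its pre-emphasis. The function names are illustrative.

```python
import math

def pre_emphasis_coefficient(from_frequency_hz, sample_rate_hz):
    """alpha = exp(-2*pi*F/fs) for a first-order pre-emphasis filter."""
    return math.exp(-2 * math.pi * from_frequency_hz / sample_rate_hz)

def pre_emphasize(samples, coefficient):
    """y[n] = x[n] - coefficient * x[n-1]; the first sample is passed through."""
    out = list(samples[:1])
    for n in range(1, len(samples)):
        out.append(samples[n] - coefficient * samples[n - 1])
    return out

alpha = pre_emphasis_coefficient(50.0, 44100.0)
dc_input = [1.0] * 5                      # constant (DC) signal
dc_output = pre_emphasize(dc_input, alpha)
print(f"alpha = {alpha:.5f}, DC after pre-emphasis = {dc_output[1]:.5f}")
```

A constant input is reduced to under 1% of its level, while rapid sample-to-sample changes pass almost untouched. That is the low-frequency attenuation and high-frequency emphasis described above.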

Pitch Extraction & Smoothing

Pitch Detection (Autocorrelation Method)

Range: 75-600 Hz

75 Hz: lowest typical male voice fundamental
600 Hz: highest typical female/child voice fundamental
Restricting the search range reduces octave errors (picking a subharmonic or harmonic instead of f₀)

Autocorrelation principle:

For a periodic signal (voice), autocorrelation peaks at the period

Glottal pulses: ↑ ↑ ↑ ↑ (period = 10ms = 100Hz)

Autocorrelation:
lag 0ms: peak (signal matches itself perfectly)
lag 10ms: peak (signal matches one period later)
lag 20ms: peak (two periods)
...

Highest peak (excluding 0) → period → f₀ = 1/period
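The autocorrelation principle can be sketched in a few lines of Python. This is a bare-bones peak picker, far simpler than Praat's pitch tracker (no windowing, normalization, or voicing decision). The test signal is a train of decaying pulses with a 10ms period at an assumed 8 kHz sample rate.

```python
def estimate_f0(samples, sample_rate, floor_hz=75.0, ceiling_hz=600.0):
    """Pick the autocorrelation peak among lags allowed by the pitch range."""
    min_lag = int(sample_rate / ceiling_hz)   # shortest period considered
    max_lag = int(sample_rate / floor_hz)     # longest period considered
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = sum(samples[n] * samples[n - lag]
                    for n in range(lag, len(samples)))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag

sample_rate = 8000
period = 80                                   # 80 samples = 10 ms = 100 Hz
pulses = [0.5 ** (n % period) for n in range(800)]
f0 = estimate_f0(pulses, sample_rate)
print(f"Estimated f0: {f0:.1f} Hz")
```

Limiting the lags to the 75-600 Hz range is what keeps the two-period peak (lag 20ms, i.e. 50 Hz) out of the search.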

Pitch Smoothing (10 Hz bandwidth)

Why smooth:

Raw pitch tracking has errors: octave jumps, spurious values
Natural voice has micro-variations (jitter, vibrato)
For synthesis, want stable, smooth pitch contour
Smoothing = low-pass filtering of the pitch contour

10 Hz bandwidth:

Attenuates pitch fluctuations faster than roughly 10 per second
Removes rapid jitter (vocal cord irregularity)
Preserves intonation (rising/falling pitch)
Flattens extreme vibrato (deliberate musical choice)
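To illustrate how a smoothing bandwidth trades off against vibrato, here is a Python sketch that uses a Gaussian-weighted average as a stand-in low-pass filter. This is not Praat's Smooth algorithm. The mapping sigma = 1/(2π·bandwidth) and the 6 Hz, ±10 Hz vibrato contour are assumptions chosen for the demonstration.

```python
import math

def gaussian_smooth(values, step_s, bandwidth_hz):
    """Low-pass a contour with a normalized Gaussian kernel (stand-in for Smooth)."""
    sigma = 1.0 / (2 * math.pi * bandwidth_hz)            # kernel width in seconds
    radius = max(1, int(math.ceil(4 * sigma / step_s)))   # kernel half-length in frames
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-0.5 * ((k * step_s) / sigma) ** 2) for k in offsets]
    out = []
    for i in range(len(values)):
        num = den = 0.0
        for k, w in zip(offsets, weights):
            j = i + k
            if 0 <= j < len(values):                      # renormalize at the edges
                num += w * values[j]
                den += w
        out.append(num / den)
    return out

def excursion(values):
    """Peak-to-peak range over the middle half, avoiding edge effects."""
    middle = values[len(values) // 4: 3 * len(values) // 4]
    return max(middle) - min(middle)

step_s = 0.01                                             # 10 ms pitch frames
contour = [200 + 10 * math.sin(2 * math.pi * 6 * i * step_s) for i in range(200)]
raw = excursion(contour)
wide = excursion(gaussian_smooth(contour, step_s, 10.0))
narrow = excursion(gaussian_smooth(contour, step_s, 2.0))
print(f"vibrato extent: raw {raw:.1f} Hz, 10 Hz bw {wide:.1f} Hz, 2 Hz bw {narrow:.1f} Hz")
```

In this sketch a narrower bandwidth removes more of the vibrato, and a wider one leaves more of it in place.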

Effect on different material:

Input	Effect of Smoothing
Speech	Removes natural jitter → cleaner, more synthetic
Singing (vibrato)	Reduces vibrato width → flatter, less expressive
Monotone	Minimal effect (already stable)
Pitched shout	Stabilizes erratic pitch → unnaturally steady

Glottal Source Synthesis

To Sound (phonation) Parameters

The script generates synthetic glottal pulses with specific characteristics:

Sample rate: 44100 Hz

Standard audio sample rate
Ensures high-frequency formants accurately represented

Adaptation factor: 1

Scales the first pulses after a pause to soften abrupt onsets
1.0 = no softening (pulse train starts at full height after silences)
Values below 1 fade the voice in after each pause

Maximum period: 0.05 seconds

Pulse intervals longer than this count as pauses, not glottal periods
Equivalent lowest pitch = 1/0.05 = 20 Hz
Safety margin (minimum pitch from pitch tracking is 75 Hz)
Keeps unvoiced gaps from being synthesized as very low pitch

Open phase: 0.7

Fraction of glottal cycle vocal cords are open
0.7 = 70% open, 30% closed
Typical for modal voice (normal speaking)
Affects pulse shape and spectral content

Collision phase: 0.03

Duration of vocal cord collision (as fraction of period)
0.03 = 3% of cycle
Determines sharpness of glottal closure
Sharp closure = more high-frequency energy

Power 1: 3, Power 2: 4

Shape parameters for glottal flow waveform
Control asymmetry of opening vs closing
Values 3,4 = typical modal phonation
Affect spectral richness and voice quality
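A small Python sketch of how two power parameters can shape a glottal flow pulse. It assumes the flow during the open phase has the form x^power1 - x^power2 for x between 0 and 1. Treat that formula as an assumption about the model, not a statement of the script's internals.

```python
def glottal_flow(x, power1=3.0, power2=4.0):
    """Assumed flow shape over the open phase, x in [0, 1]."""
    return x ** power1 - x ** power2

xs = [i / 1000 for i in range(1001)]
flow = [glottal_flow(x) for x in xs]
peak_index = max(range(len(flow)), key=flow.__getitem__)
peak_x = xs[peak_index]
print(f"Flow peaks at {peak_x:.2f} of the open phase")
```

With powers 3 and 4 the flow rises for three quarters of the open phase and falls in the last quarter. That slow-opening, fast-closing asymmetry is what gives the pulse its harmonic richness.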

Hum: "no"

Final argument of To Sound (phonation) for a PitchTier
"no" = raw glottal source, left unfiltered for the LPC stage
"yes" would add Praat's built-in hum (vowel-like) coloring to the source
No random jitter is added in either case → maximally synthetic/robotic

Resulting Glottal Spectrum

Synthetic glottal source spectrum (sketch): amplitude falls with frequency across the harmonics f₀, 2f₀, 3f₀, 4f₀, 5f₀...

Characteristics:
Fundamental + harmonics (periodic)
Spectral tilt: -12 dB/octave (typical)
Perfectly regular spacing (no jitter)
Clean (no breathiness/noise)
Ready for LPC filtering

LPC Filtering Process

Operation: LPC + Source → Filtered Output

Input: Synthetic glottal pulses (regular harmonics with a smooth spectral tilt)
Filter: LPC coefficients (formant structure from original)
Output: Synthetic voice (robotic but intelligible)

Mathematical operation:
output[n] = source[n] + Σ(aᵢ · output[n-i])

This is an IIR (Infinite Impulse Response) filter:
Recursive: output depends on previous outputs
All-pole: only poles (resonances), no zeros
Poles = formants (peaks in frequency response)
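The recursion output[n] = source[n] + Σ(aᵢ · output[n-i]) translates directly into Python. This sketch uses fixed coefficients. The script's LPC filter changes its coefficients every 5ms, which is not modeled here.

```python
def all_pole_filter(source, a):
    """output[n] = source[n] + sum_i a[i-1] * output[n-i] (fixed coefficients)."""
    out = []
    for n in range(len(source)):
        feedback = sum(a[i - 1] * out[n - i]
                       for i in range(1, len(a) + 1) if n - i >= 0)
        out.append(source[n] + feedback)
    return out

impulse = [1.0, 0.0, 0.0, 0.0]
response = all_pole_filter(impulse, [0.5])   # single coefficient a1 = 0.5
print(response)                              # [1.0, 0.5, 0.25, 0.125]
```

A single impulse keeps producing output through the feedback path, which is the "infinite impulse response" behavior. With more coefficients the decaying response rings at the formant frequencies.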

Why it works:

LPC extracted formant locations from original voice
Formants = linguistic information (vowel/consonant identity)
Applying same formants to clean source → preserves linguistics
But source is synthetic → sounds robotic, not human
Result: intelligible but mechanical speech

Comparison: Natural vs Synthesized

Aspect	Natural Voice	LPC Synthesis
Source	Irregular glottal pulses, jitter, breathiness	Perfect periodic pulses
Filter	Formants from actual vocal tract	Same formants (from LPC)
Pitch	Natural vibrato, micro-variations	Smoothed, stable contour
Timbre	Warm, breathy, human quality	Synthetic, buzzy, clean
Intelligibility	High (if clear recording)	Slightly lower (formants preserved)
Emotional quality	Full emotional expression	Flat, robotic, neutral
Naturalness	Completely natural	Obviously synthetic

Historical Context: LPC was developed in the late 1960s and 1970s for speech compression (telephone, military communications). Pioneering work: Bishnu S. Atal and Manfred R. Schroeder (Bell Labs). First real-time LPC vocoders: 1970s. Stephen Hawking's voice came from a DECtalk-related formant synthesizer (1980s), a different realization of the same source-filter model. Related vocoder sounds in music: Kraftwerk "Autobahn" (1974), Laurie Anderson "O Superman" (1981), Daft Punk's robotic vocal aesthetic. Modern uses: LPC-based cellphone codecs (GSM, CDMA), speech synthesis research, experimental music. This script makes classic LPC resynthesis accessible: what required custom hardware in the 1970s now runs as a Praat script. The source-filter model remains fundamental to speech science and synthesis.

Parameters

No User Parameters: This script runs with hardcoded defaults; no dialog box appears. The values follow common speech-analysis practice. To modify behavior, edit the script source code directly. The tables below document the internal parameters for advanced users.

Pitch Extraction Parameters (To Pitch)

Parameter	Value	Description
Time step	0 (auto)	Automatic step size (typically 0.01s)
Pitch floor	75 Hz	Minimum detectable pitch (low male voice)
Pitch ceiling	600 Hz	Maximum detectable pitch (high female/child)

Pitch Smoothing Parameters

Parameter	Value	Description
Bandwidth	10 Hz	Smoothing bandwidth (removes jitter, reduces vibrato)

Glottal Source Parameters (To Sound phonation)

Parameter	Value	Description
Sample rate	44100 Hz	Audio sample rate
Adaptation factor	1	Pulse scaling after pauses (1 = abrupt starts)
Maximum period	0.05 s	Longer pulse intervals count as pauses (1/0.05 = 20 Hz)
Open phase	0.7	70% of cycle cords open
Collision phase	0.03	3% of cycle in collision
Power 1	3	Flow derivative shape
Power 2	4	Flow derivative shape
Hum	no	No hum filtering of the source

LPC Analysis Parameters (To LPC autocorrelation)

Parameter	Value	Description
Prediction order	44	Number of LPC coefficients
Analysis width	0.025 s	Window length (25ms)
Time step	0.005 s	Hop size between analyses (5ms)
Pre-emphasis	50 Hz	Frequency above which pre-emphasis boosts the spectrum

LPC Filtering Parameters

Parameter	Value	Description
Inverse filter	no	Apply filter (not inverse)

Output Parameters

Parameter	Value	Description
Target intensity	70 dB	Output level normalization
Auto-play	yes	Play result immediately
Cleanup	yes	Remove intermediate objects
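The 70 dB target can be read as a root-mean-square level relative to the standard reference pressure of 2·10⁻⁵ Pa, with sample values interpreted as pascals. The Python sketch below shows that arithmetic. The function names are illustrative and this is not the script's code.

```python
import math

REFERENCE_PRESSURE_PA = 2e-5   # standard reference for dB SPL

def intensity_db(samples):
    """RMS level in dB relative to the reference pressure."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / REFERENCE_PRESSURE_PA)

def scale_to_intensity(samples, target_db=70.0):
    """Apply one gain so the RMS level equals target_db."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    target_rms = REFERENCE_PRESSURE_PA * 10 ** (target_db / 20)
    gain = target_rms / rms
    return [s * gain for s in samples]

tone = [0.3 * math.sin(2 * math.pi * 100 * n / 8000) for n in range(800)]
scaled = scale_to_intensity(tone, 70.0)
print(f"before: {intensity_db(tone):.1f} dB, after: {intensity_db(scaled):.1f} dB")
```

A 70 dB target corresponds to an RMS amplitude of about 0.063, which leaves ample headroom below full scale for most speech.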

Modifying Parameters (Advanced)

To change behavior, edit script lines:

Change Pitch Range

To Pitch: 0, 75, 600
# Change 75 (floor) and 600 (ceiling)
# Example for low male: 50, 300
# Example for child: 150, 800

Change Smoothing

Smooth: 10
# Bandwidth acts as a low-pass limit on the pitch contour
# Lower value = more smoothing (flatter pitch)
# Higher value = less smoothing (more natural jitter)
# Example for more smoothing: 5
# Example for less smoothing: 20

Change LPC Order

To LPC (autocorrelation): 44, 0.025, 0.005, 50
# First number = prediction order
# Higher = more formants tracked
# Example for simpler model: 24
# Example for detailed model: 64

Enable Hum

To Sound (phonation): 44100, 1, 0.05, 0.7, 0.03, 3, 4, "no"
# Last argument is the Hum switch
# Change "no" to "yes"