LPC Voice Morphing — User Guide
Linear Predictive Coding (LPC) vocoder: separates speech into excitation source (pitch/breath) and spectral envelope (vocal tract), then recombines with modified parameters to create robotic, whispered, or demonic voice transformations while preserving linguistic content.
What this does
This script implements an LPC (Linear Predictive Coding) vocoder — a classic speech analysis‑synthesis technique that models the human vocal system. LPC decomposes speech into two components: (1) Excitation source (glottal pulses for voiced sounds, noise for unvoiced/whispered sounds), representing pitch and breath; and (2) Spectral envelope (vocal tract filter), representing formants and timbre. The script extracts these components from an input sound, allows modification of the excitation (e.g., making pitch monotone for robot voice, replacing with noise for whisper), then re‑filters the modified excitation through the original spectral envelope. The result preserves linguistic content (phonemes, words) while transforming vocal quality (pitch, breathiness, timbre).
Key Features:
- 4 Themed Presets — Natural Resynthesis, Robot Voice (Monotone), Whisper (True Noise), Deep Demon
- Dual Excitation Sources — Pulse train (voiced) or Gaussian noise (unvoiced/whisper)
- Auto‑calibrated LPC Order — Calculates optimal filter complexity based on sample rate
- Pitch Manipulation — Force monotone pitch, shift pitch (Deep Demon), or preserve natural pitch
- True Spectral Envelope Extraction — LPC analysis captures formant structure accurately
- Post‑processing Optimization — Intensity scaling, whisper brightness correction
- Non‑destructive Processing — Original sound preserved, output named with "_vocoded" suffix
LPC predicts each speech sample as a weighted sum of the previous p samples: s[n] ≈ a₁·s[n‑1] + a₂·s[n‑2] + ... + aₚ·s[n‑p]. For speech, this corresponds to modeling the vocal tract as an all‑pole filter (representing formants) driven by an excitation source (pitch pulses or noise). LPC's advantages for speech:
- Source‑filter separation: Cleanly separates pitch/breath from vocal tract shape
- Efficient representation: A few coefficients (typically 10‑16) represent the spectral envelope
- High‑quality resynthesis: When re‑excited with proper source, sounds very natural
- Modular manipulation: Can change source while keeping filter, or vice‑versa
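The prediction idea above can be sketched in a few lines of plain Python. This is an illustration of the autocorrelation method with the Levinson-Durbin recursion, not the script's own code (the script delegates this step to Praat). The function names and the two-pole test signal are hypothetical.

```python
def autocorrelation(x, max_lag):
    """Return r[0..max_lag], the raw autocorrelation of the frame x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations for predictor coefficients a1..ap.

    Convention (same as the text): s[n] ~ a1*s[n-1] + ... + ap*s[n-p].
    Returns (coeffs, residual_energy).
    """
    a = [0.0] * (order + 1)   # a[0] is unused padding
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err          # reflection coefficient of this stage
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


# Hypothetical test signal: impulse response of a known 2-pole filter,
# s[n] = 1.3*s[n-1] - 0.6*s[n-2] + impulse[n].
signal = [0.0] * 400
for n in range(400):
    signal[n] = 1.0 if n == 0 else 0.0
    if n >= 1:
        signal[n] += 1.3 * signal[n - 1]
    if n >= 2:
        signal[n] -= 0.6 * signal[n - 2]

coeffs, residual_energy = levinson_durbin(autocorrelation(signal, 2), 2)
```

Because the test signal really is the output of a two-pole filter, the recovered `coeffs` should land very close to 1.3 and -0.6. Real speech frames are windowed and only approximately all-pole, so the fit there is approximate.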
Technical Implementation: The script follows the classic LPC vocoder pipeline:
- Pitch analysis (Praat's Pitch object) or noise source creation.
- Excitation generation: either a pulse train via Praat's "To Sound (phonation)" with LF model parameters (7‑argument version) or Gaussian noise.
- LPC analysis: Praat's "To LPC (autocorrelation)" extracts filter coefficients from the original sound.
- Filtering: the modified excitation is filtered through the LPC filter (all‑pole synthesis).
- Post‑processing: intensity normalization, plus high‑frequency enhancement for whispers.
The script handles the object conversions (Sound → Pitch → PitchTier → Pitch → PointProcess → Sound) required for proper excitation generation.
Quick start
- In Praat, select exactly one Sound object containing speech.
- Run script… → lpc_vocoder_pro.praat.
- Choose a Preset (Natural, Robot, Whisper, Deep Demon) or Custom.
- Adjust source parameters if needed (time_step, pitch range, monotone options).
- Adjust LPC filter parameters if needed (order, window, pre‑emphasis).
- Set target_intensity_db (default 70 dB SPL) and play_after_processing.
- Click OK — script analyzes, generates excitation, filters, normalizes.
- Output appears as: originalName_vocoded.
- Watch Info window for progress (excitation generation, LPC analysis, vocoding).
LPC Theory & Source‑Filter Model
The Source‑Filter Model of Speech
🎤 Human Speech Production
Two‑component model:
Linear Predictive Coding Mathematics
🔢 LPC Equations
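Prediction model: ŝ[n] = a₁·s[n‑1] + a₂·s[n‑2] + ... + aₚ·s[n‑p], with prediction error (residual) e[n] = s[n] − ŝ[n]. The coefficients {aₖ} are chosen to minimize the energy of e[n] within each analysis frame.
Frequency domain interpretation: the synthesis filter is all‑pole, H(z) = G / (1 − a₁·z⁻¹ − a₂·z⁻² − ... − aₚ·z⁻ᵖ); its poles produce the resonance peaks (formants) of the spectral envelope.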
LPC vs. Other Speech Analysis Methods
Complete LPC Vocoder Pipeline
🔄 Processing Steps
INPUT: Original speech sound
STEP 1: PITCH ANALYSIS (for voiced excitation)
• To Pitch (Praat): time_step, min_pitch, max_pitch
• Optionally modify: force_monotone, pitch_shift
• Convert to excitation: Pitch → PitchTier → Pitch → PointProcess → Sound
STEP 2: NOISE GENERATION (for whisper)
• Create Gaussian noise: randomGauss(0, 0.2)
• Same duration as original
STEP 3: LPC ANALYSIS (spectral envelope)
• To LPC (autocorrelation): order, window, pre‑emphasis
• Computes coefficients {aₖ} and gain G per frame
• Captures formant structure
STEP 4: FILTERING (synthesis)
• Filter source through LPC filter
• Praat's "Filter" command with LPC object
• Result: source shaped by original's spectral envelope
STEP 5: POST‑PROCESSING
• Scale intensity to target_db
• Whisper: high‑frequency emphasis
• Output naming
OUTPUT: originalName_vocoded
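Step 4 of the pipeline above is ordinary all-pole synthesis. The sketch below shows that recursion in plain Python for a single fixed set of coefficients. It is an illustration only: the script itself uses Praat's filtering command, which applies coefficients frame by frame. The function name `all_pole_filter` is hypothetical.

```python
def all_pole_filter(excitation, coeffs, gain=1.0):
    """Run y[n] = gain*e[n] + a1*y[n-1] + ... + ap*y[n-p].

    coeffs holds [a1, ..., ap] in the same sign convention as
    s[n] ~ a1*s[n-1] + ... + ap*s[n-p].
    """
    out = []
    for n, e in enumerate(excitation):
        y = gain * e
        for k, a in enumerate(coeffs, start=1):
            if n - k >= 0:          # samples before the start count as zero
                y += a * out[n - k]
        out.append(y)
    return out


# A single impulse excites the filter's resonance.
impulse = [1.0, 0.0, 0.0, 0.0]
response = all_pole_filter(impulse, [1.3, -0.6])
```

Feeding a pulse train or a noise sequence through the same recursion gives the voiced or whispered result, respectively.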
Preset Characters
Preset 1: Custom
🎛️ Full Manual Control
Parameters: All parameters remain at their form values.
Source type: Determined by force_monotone and other settings (defaults to voiced pulse train).
Use case: When you want to experiment with specific parameter combinations or create custom transformations beyond the presets.
Output name: originalName_vocoded
Preset 2: Natural Resynthesis
🗣️ High‑Quality Speech Reconstruction
Settings:
- LPC order: Auto‑calibrated (sr/1000 + 4)
- Analysis window: 0.025 s (25 ms) — standard for speech
- Force monotone: No (preserves natural pitch contour)
- Source type: Pulse train (voiced)
Mechanism: Extracts natural pitch contour, generates corresponding pulse train, filters through LPC envelope. Should sound very similar to original (minor differences due to LPC modeling limitations).
Sonic character: Clean, slightly "processed" but natural‑sounding speech. Useful for verifying LPC quality or as preprocessing step.
Best for: Testing LPC quality, speech enhancement (if original is noisy), voice compression experiments.
Preset 3: Robot Voice (Monotone)
🤖 Classic Synthetic Speech
Settings:
- LPC order: Auto‑calibrated
- Analysis window: 0.030 s (30 ms) — slightly longer for stability
- Force monotone: Yes
- Monotone frequency: 100 Hz
- Source type: Pulse train
Mechanism: Replaces natural pitch contour with fixed 100 Hz pulse train. All prosody (pitch variation) removed, but formant structure (vowel/consonant identity) preserved.
Sonic character: Classic "robot" or "text‑to‑speech" voice. Flat, synthetic, but intelligible. Words are clear but delivery is mechanical.
Best for: Sci‑fi effects, synthetic character voices, demonstrating source‑filter separation.
Preset 4: Whisper (True Noise)
👂 Breath‑Only Speech
Settings:
- LPC order: Auto‑calibrated
- Analysis window: 0.015 s (15 ms) — shorter for better consonant capture
- Source type: Gaussian noise (unvoiced)
- Post‑processing: High‑frequency emphasis
Mechanism: Replaces glottal pulses with Gaussian noise while preserving LPC spectral envelope. The noise is shaped by the original formant structure, creating whispered version.
Sonic character: Breathy, whispered speech. Lacks vocal fold vibration but retains vowel/consonant distinctions. Post‑processing adds high‑frequency brightness for intelligibility.
Best for: Intimate vocal effects, partial privacy masking (speech content preserved and pitch‑based identity cues removed, though formant cues remain), ghost/spectral voices.
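As a rough illustration of the whisper source, the following Python sketch creates zero-mean Gaussian noise with standard deviation 0.2 (the value quoted for randomGauss in the pipeline) and the same duration as the input. The function name, the seed argument, and the example duration and sample rate are assumptions made for this sketch.

```python
import random


def make_noise_excitation(duration_s, sample_rate, sigma=0.2, seed=0):
    """Return Gaussian noise samples covering duration_s seconds."""
    rng = random.Random(seed)   # seeded so that repeated runs match
    n_samples = int(round(duration_s * sample_rate))
    return [rng.gauss(0.0, sigma) for _ in range(n_samples)]


# Hypothetical input: 1 second of audio at 16 kHz.
noise = make_noise_excitation(1.0, 16000)
```

The noise carries no pitch information, so everything the listener hears as vowel or consonant identity comes from the LPC filter applied afterwards.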
Preset 5: Deep Demon
😈 Low‑Pitched, Dark Character
Settings:
- LPC order: Auto‑calibrated
- Analysis window: 0.040 s (40 ms) — long for stability
- Minimum pitch: 50 Hz (very low)
- Force monotone: No (but pitch contour shifted down)
- Pitch shift: Multiply by 0.6 (60% of original)
Mechanism: Lowers pitch contour by 40% while preserving relative pitch variations (not monotone). Uses longer analysis window for stable low‑frequency analysis.
Sonic character: Dark, low‑pitched, imposing. Sounds like giant, demon, or ancient being. Pitch is lower but retains some natural prosody (not completely flat).
Best for: Character voice design (monsters, giants, demons), horror sound design, low‑pitch effects.
Preset Comparison Table
| Preset | Source | Pitch | Window | Post‑Process | Character |
|---|---|---|---|---|---|
| Natural | Pulse train | Original contour | 25 ms | Intensity only | Clean speech |
| Robot | Pulse train | 100 Hz fixed | 30 ms | Intensity only | Synthetic, flat |
| Whisper | Gaussian noise | N/A (noise) | 15 ms | HF emphasis | Breathy, intimate |
| Deep Demon | Pulse train | 60% of original | 40 ms | Intensity only | Dark, low‑pitched |
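The table can also be read as a set of overrides applied on top of the form values. The Python sketch below encodes only the settings stated in this guide. The dictionary layout, the `hf_emphasis` key, and the `apply_preset` helper are hypothetical and are not taken from the script.

```python
# source_type follows the guide's convention: 1 = pulse train, 2 = noise.
PRESETS = {
    "Natural": {"source_type": 1, "analysis_window": 0.025,
                "force_monotone": False, "pitch_shift": 1.0,
                "hf_emphasis": False},
    "Robot": {"source_type": 1, "analysis_window": 0.030,
              "force_monotone": True, "monotone_frequency": 100.0,
              "hf_emphasis": False},
    "Whisper": {"source_type": 2, "analysis_window": 0.015,
                "hf_emphasis": True},
    "Deep Demon": {"source_type": 1, "analysis_window": 0.040,
                   "minimum_pitch": 50.0, "force_monotone": False,
                   "pitch_shift": 0.6, "hf_emphasis": False},
}


def apply_preset(form_values, preset_name):
    """Return form values with the preset's overrides applied.

    Unknown names (including "Custom") leave the form values unchanged.
    """
    settings = dict(form_values)
    settings.update(PRESETS.get(preset_name, {}))
    return settings


form_defaults = {"monotone_frequency": 120.0, "analysis_window": 0.025,
                 "minimum_pitch": 75.0, "force_monotone": False}
robot = apply_preset(form_defaults, "Robot")
custom = apply_preset(form_defaults, "Custom")
```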
Excitation Source Generation
Voiced Source: LF Model Pulse Train
❤️ Glottal Pulse Model (LF Parameters)
The LF (Liljencrants‑Fant) model is a mathematical model of glottal airflow that produces natural‑sounding pulse trains. Praat's implementation uses 7 parameters:
Why LF model matters: Simple impulse trains sound buzzy/artificial. LF model creates natural glottal waveform with correct spectral tilt (‑12 dB/octave).
Pitch Analysis & Manipulation Chain
📈 From Sound to Controlled Excitation
Step‑by‑step conversion:
- Sound → Pitch: Praat's pitch detection algorithm (autocorrelation).
- Pitch → PitchTier: Convert to manipulatable pitch contour.
- Modification: Apply Formula to PitchTier (e.g., multiply by 0.6 for Deep Demon).
- PitchTier → Pitch: Convert back to Pitch object (required for next step).
- Pitch → PointProcess: Create impulse at each glottal closure instant.
- PointProcess → Sound (phonation): Generate LF model pulse train from impulses.
Why this complexity? Praat requires specific object types for each operation. The chain allows pitch manipulation at the PitchTier stage while ensuring proper timing for pulse generation.
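The contour manipulation and pulse placement in the chain above can be approximated outside Praat as follows. This Python sketch is a simplification under stated assumptions: the contour is a list of (time, f0) points with strictly increasing times, unvoiced gaps are ignored, and one pulse is placed per pitch period. All function names are hypothetical.

```python
def modify_contour(points, shift_factor=1.0, monotone_hz=None):
    """Scale a pitch contour, or flatten it to a fixed frequency.

    points is a list of (time_s, f0_hz) pairs with increasing times.
    """
    if monotone_hz is not None:
        return [(t, float(monotone_hz)) for t, _ in points]
    return [(t, f0 * shift_factor) for t, f0 in points]


def f0_at(points, t):
    """Linearly interpolate f0 at time t, holding the end values constant."""
    if t <= points[0][0]:
        return points[0][1]
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if t <= t1:
            return f0 + (f1 - f0) * (t - t0) / (t1 - t0)
    return points[-1][1]


def pulse_times(points, duration_s):
    """Place one pulse per pitch period from time 0 up to duration_s."""
    times = []
    t = 0.0
    while t < duration_s:
        times.append(t)
        t += 1.0 / f0_at(points, t)
    return times


contour = [(0.0, 200.0), (0.5, 250.0)]           # hypothetical contour
demon = modify_contour(contour, shift_factor=0.6)  # Deep Demon style shift
robot = modify_contour(contour, monotone_hz=100)   # Robot style monotone
robot_pulses = pulse_times(robot, 0.045)
```

Multiplying every point by the same factor keeps the relative shape of the contour, which is why the Deep Demon preset retains some prosody while the monotone version does not.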
Unvoiced Source: Gaussian Noise
Source Selection Logic
LPC Filter Analysis
LPC Order Determination
🔢 Auto‑calibration Formula
When lpc_order = 0 in form, the order is computed from the sample rate as sr/1000 + 4 (e.g., 16000 Hz → 20 poles).
Typical values for speech:
- 8‑12 poles: Telephone bandwidth (300‑3400 Hz)
- 10‑16 poles: Standard speech (50‑8000 Hz)
- 16‑20 poles: High‑quality speech (full bandwidth)
- 20‑30 poles: Music or very high‑quality
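A minimal Python sketch of the auto-calibration rule (sr/1000 + 4, as listed in the preset settings) is shown below. Rounding sr/1000 to the nearest integer is an assumption, since the guide does not say how fractional values are handled, and the function name is hypothetical.

```python
def auto_lpc_order(sample_rate_hz, requested_order=0):
    """Return the LPC order; 0 means derive it from the sample rate."""
    if requested_order > 0:
        return requested_order
    # Assumed rounding: nearest integer of sr/1000, then add 4.
    return int(round(sample_rate_hz / 1000.0)) + 4


order_16k = auto_lpc_order(16000)
order_44k = auto_lpc_order(44100)
```

The rule of thumb behind this kind of formula is roughly one pole per kHz of sample rate plus a few extra poles for spectral tilt and source effects.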
Praat's LPC Analysis Parameters
Spectral Envelope Extraction
Post‑processing: Whisper Brightness Correction
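The script's exact brightness correction is not reproduced in this guide. One common way to add high-frequency emphasis is a first-order pre-emphasis filter, sketched here in Python as an assumption rather than as the script's method. The coefficient formula alpha = exp(-2*pi*F/sr) is the usual way to derive the filter coefficient from a corner frequency F. The function name and example values are hypothetical.

```python
import math


def pre_emphasize(samples, from_frequency_hz, sample_rate):
    """Apply y[n] = x[n] - alpha * x[n-1] (first-order high-frequency boost)."""
    alpha = math.exp(-2.0 * math.pi * from_frequency_hz / sample_rate)
    out = []
    previous = 0.0
    for x in samples:
        out.append(x - alpha * previous)
        previous = x
    return out


# Slow (constant) input is attenuated; fast (alternating) input is boosted.
flat = pre_emphasize([1.0, 1.0, 1.0], 50.0, 16000)
fast = pre_emphasize([1.0, -1.0, 1.0, -1.0], 50.0, 16000)
```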
Parameters & Controls
Form Parameters
🎛️ User‑Adjustable Settings
| Parameter | Type | Default | Range | Description |
|---|---|---|---|---|
| preset | optionmenu | Custom | 1‑5 | Preset configuration |
| time_step | positive | 0.01 | 0.001‑0.05 | Pitch analysis time step (seconds) |
| minimum_pitch | positive | 75 | 50‑200 | Lowest expected pitch (Hz) |
| maximum_pitch | positive | 600 | 200‑1000 | Highest expected pitch (Hz) |
| force_monotone | boolean | 0 | 0/1 | Force constant pitch (robot voice) |
| monotone_frequency | positive | 120 | 50‑300 | Fixed pitch when monotone (Hz) |
| lpc_order | integer | 0 | 0‑50 | LPC filter order (0=auto) |
| analysis_window | positive | 0.025 | 0.01‑0.05 | LPC analysis window length (seconds) |
| pre_emphasis_hz | positive | 50 | 0‑200 | Pre‑emphasis frequency (Hz) |
| target_intensity_db | positive | 70 | 50‑90 | Output intensity (dB SPL) |
| play_after_processing | boolean | 1 | 0/1 | Auto‑play result |
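To make target_intensity_db concrete, the Python sketch below rescales a signal so that its mean intensity, measured relative to the 2e-5 Pa reference used for dB SPL, equals the target. It illustrates the arithmetic only. The script performs this step inside Praat, and the function names and test tone are hypothetical.

```python
import math

P_REF = 2e-5  # reference pressure for dB SPL, in pascals


def intensity_db(samples):
    """Mean intensity in dB relative to P_REF."""
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(mean_square / (P_REF ** 2))


def scale_intensity(samples, target_db):
    """Multiply by a constant gain so the mean intensity equals target_db."""
    gain = 10.0 ** ((target_db - intensity_db(samples)) / 20.0)
    return [x * gain for x in samples]


# Hypothetical input: 1 second of a 100 Hz tone sampled at 8 kHz.
tone = [0.1 * math.sin(2.0 * math.pi * 100.0 * n / 8000.0)
        for n in range(8000)]
scaled = scale_intensity(tone, 70.0)
```

Because the gain is a single constant, the waveform shape is unchanged. Only the overall level moves.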
Parameter Guidelines
Internal Variables & Objects
| Variable | Source | Description |
|---|---|---|
| source_type | Preset or default=1 | 1=pulse train, 2=noise |
| sr, dur | Get sampling frequency/duration | Original sound properties |
| pitchID, ptID, modPitchID, ppID | Intermediate objects | Pitch analysis chain (cleaned up) |
| sourceID | Created excitation | "excitation_pulse" or "excitation_noise" |
| lpcID | To LPC (autocorrelation) | LPC filter object |
| vocodedID | Filter output | Final processed sound |
Sonic Applications
Voice Design for Media
🎬 Character Voice Creation
Robot/AI characters: Robot preset (monotone 100 Hz), or experiment with different monotone_frequency values (80 Hz for larger robots, 140 Hz for smaller).
Monsters/demons: Deep Demon preset for low, imposing voices. Combine with other effects (distortion, reverb).
Ghosts/spectres: Whisper preset for breathy, ethereal voices. Add reverb with long decay.
Alien languages: Process speech, then reverse or further process with other effects. LPC preserves phoneme structure while transforming voice quality.
Music Production
Vocal processing:
- Robot backing vocals: Apply Robot preset to harmony parts
- Whispered doubles: Create whisper version to layer under main vocal
- Pitch‑corrected effects: Use Natural preset but with modified pitch contour (custom PitchTier formula)
- Vocal synthesis from instruments: Process instrumental sounds through LPC trained on speech (experimental)
Speech Technology & Education
Speech analysis demonstrations: Show source‑filter separation by listening to excitation and filtered result separately.
Accent/voice modification: Process speech while preserving linguistic content but altering voice characteristics.
Privacy masking: Whisper preset removes pitch cues to speaker identity while preserving words (for interviews, sensitive content); formant structure is retained, so identity is obscured only partially.
Speech compression understanding: Demonstrate how LPC achieves high compression (few coefficients represent speech).
Practical Workflow Examples
🎬 Film: "Synthetic News Anchor"
Scene: Futuristic news broadcast with AI anchor
Processing chain:
- Record: Human voice actor reading news copy
- LPC Robot preset: monotone_frequency=110 Hz (authoritative but synthetic)
- Post‑processing: Add subtle reverb (newsroom ambiance), light compression
- Layering: Mix 70% processed with 30% original for slight human warmth
- Automation: Occasionally switch to Natural preset for "emotional" moments
Result: Believable synthetic news anchor that can toggle between mechanical and emotional delivery.
🎵 Song: "Ghost Choir" Arrangement
Concept: Choral piece with layered natural and processed voices
Voice layers:
- Layer 1 (Main melody): Natural vocal
- Layer 2 (Harmony): Same performance → Whisper preset
- Layer 3 (Bass): Same performance → Deep Demon (pitch lowered)
- Layer 4 (Ethereal): Same performance → Robot preset (monotone) + heavy reverb
Mix: Pan layers, adjust levels, add global reverb for cohesive space.
Result: Rich, spectral choir from single vocal performance.
Advanced Techniques & Customization
Custom Pitch Manipulation
Modify PitchTier formula for custom effects:
Mixed Excitation (Voiced/Unvoiced)
Create more natural synthesis with voicing decisions:
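One possible shape for such a mixed source, sketched in Python under stated assumptions: each analysis frame carries an f0 value, with 0 meaning unvoiced. Voiced frames receive unit impulses spaced one pitch period apart, and unvoiced frames receive Gaussian noise. The function name, frame layout, and noise level are hypothetical and are not taken from the script.

```python
import random


def mixed_excitation(frame_f0, frame_len, sample_rate,
                     noise_sigma=0.2, seed=0):
    """Build an excitation from per-frame voicing decisions.

    frame_f0: one f0 value in Hz per frame, 0 for unvoiced frames.
    frame_len: frame length in samples.
    """
    rng = random.Random(seed)
    out = []
    next_pulse = 0.0   # sample position of the next glottal pulse
    for f0 in frame_f0:
        start = len(out)
        if f0 > 0:
            frame = [0.0] * frame_len
            next_pulse = max(next_pulse, float(start))
            period = sample_rate / f0
            while next_pulse < start + frame_len:
                frame[int(next_pulse) - start] = 1.0
                next_pulse += period
        else:
            frame = [rng.gauss(0.0, noise_sigma) for _ in range(frame_len)]
        out.extend(frame)
    return out


# One voiced frame at 100 Hz followed by one unvoiced frame (8 kHz, 20 ms).
excitation = mixed_excitation([100.0, 0.0], 160, 8000)
```

Carrying `next_pulse` across frames keeps the pulse spacing continuous when consecutive frames are voiced, which avoids an extra pulse at every frame boundary.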
LPC Coefficient Manipulation
Modify formant structure directly: After LPC analysis, you could modify coefficients before filtering:
Cross‑Synthesis (Voice Transfer)
Use one sound's excitation with another's LPC filter:
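In outline, cross-synthesis runs sound A's excitation through the frame-by-frame filter measured from sound B. The Python sketch below illustrates this with a time-varying all-pole recursion. The frame representation (a list of coefficient and gain pairs) and the function name are assumptions made for this sketch, not the script's data structures.

```python
def cross_synthesize(excitation, frames, frame_len):
    """Filter one sound's excitation with another sound's LPC frames.

    frames: list of (coeffs, gain), one entry per analysis frame of the
    sound that supplies the spectral envelope. coeffs is [a1, ..., ap].
    """
    out = []
    for n, e in enumerate(excitation):
        # Use the frame covering sample n; hold the last frame at the end.
        coeffs, gain = frames[min(n // frame_len, len(frames) - 1)]
        y = gain * e
        for k, a in enumerate(coeffs, start=1):
            if n - k >= 0:
                y += a * out[n - k]   # filter memory carries across frames
        out.append(y)
    return out


# Tiny hypothetical example: two one-pole frames of two samples each.
result = cross_synthesize([1.0, 0.0, 0.0, 0.0],
                          [([0.5], 1.0), ([-0.5], 2.0)],
                          frame_len=2)
```

The two sounds should share a sample rate, and the excitation should be at least as long as the region of interest, otherwise the last frame is simply held.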
Troubleshooting
Causes: LPC order too low, poor pitch detection, source‑filter mismatch
Solutions: Increase lpc_order, check pitch range settings, try Natural preset first
Causes: Frame‑to‑frame discontinuities in LPC coefficients or excitation
Solutions: Increase analysis_window, ensure smooth pitch transitions
Causes: Insufficient high‑frequency content, LPC window too long
Solutions: Use shorter analysis_window (0.015), increase pre_emphasis_hz
Causes: Older Praat version or script error
Solutions: Update to latest Praat, ensure script has correct 7‑parameter call