LPC Voice Morphing — User Guide

Linear Predictive Coding (LPC) vocoder: separates speech into excitation source (pitch/breath) and spectral envelope (vocal tract), then recombines with modified parameters to create robotic, whispered, or demonic voice transformations while preserving linguistic content.

Technique: Linear Predictive Coding (LPC)
Implementation: Praat Script
Category: Speech Processing/Vocoding
Version: Pro Edition
License: MIT License

Contents:

What this does
Quick start
LPC Theory
Preset Characters
Excitation Sources
LPC Filter Analysis
Parameters
Sonic Applications

What this does

This script implements an LPC (Linear Predictive Coding) vocoder — a classic speech analysis‑synthesis technique that models the human vocal system. LPC decomposes speech into two components: (1) Excitation source (glottal pulses for voiced sounds, noise for unvoiced/whispered sounds), representing pitch and breath; and (2) Spectral envelope (vocal tract filter), representing formants and timbre. The script extracts these components from an input sound, allows modification of the excitation (e.g., making pitch monotone for robot voice, replacing with noise for whisper), then re‑filters the modified excitation through the original spectral envelope. The result preserves linguistic content (phonemes, words) while transforming vocal quality (pitch, breathiness, timbre).

Key Features:

4 Themed Presets — Natural Resynthesis, Robot Voice (Monotone), Whisper (True Noise), Deep Demon
Dual Excitation Sources — Pulse train (voiced) or Gaussian noise (unvoiced/whisper)
Auto‑calibrated LPC Order — Calculates optimal filter complexity based on sample rate
Pitch Manipulation — Force monotone pitch, shift pitch (Deep Demon), or preserve natural pitch
True Spectral Envelope Extraction — LPC analysis captures formant structure accurately
Post‑processing Optimization — Intensity scaling, whisper brightness correction
Non‑destructive Processing — Original sound preserved, output named with "_vocoded" suffix

What is LPC and why is it special for speech? Linear Predictive Coding is a mathematical method that predicts a sample's value as a linear combination of previous samples: s[n] ≈ a₁·s[n‑1] + a₂·s[n‑2] + ... + aₚ·s[n‑p]. For speech, this corresponds to modeling the vocal tract as an all‑pole filter (representing formants) driven by an excitation source (pitch pulses or noise). LPC's advantages for speech:

Source‑filter separation: Cleanly separates pitch/breath from vocal tract shape
Efficient representation: A few coefficients (typically 10‑16) represent the spectral envelope
High‑quality resynthesis: When re‑excited with proper source, sounds very natural
Modular manipulation: Can change source while keeping filter, or vice‑versa

LPC was used in early speech coding (cell phones, VOIP) and remains valuable for voice transformation.

Technical Implementation: The script follows the classic LPC vocoder pipeline:
(1) Pitch analysis (Praat's Pitch object) or noise source creation.
(2) Excitation generation: either a pulse train via Praat's "To Sound (phonation)" with LF model parameters (7-argument version) or Gaussian noise.
(3) LPC analysis: Praat's "To LPC (autocorrelation)" extracts filter coefficients from the original sound.
(4) Filtering: the modified excitation is filtered through the LPC filter (all-pole synthesis).
(5) Post-processing: intensity normalization, plus high-frequency enhancement for whispers.
The script handles the object conversions (Sound → Pitch → PitchTier → Pitch → PointProcess → Sound) required for excitation generation.

Quick start

In Praat, select exactly one Sound object containing speech.
Run script… → lpc_vocoder_pro.praat.
Choose a Preset (Natural, Robot, Whisper, Deep Demon) or Custom.
Adjust source parameters if needed (time_step, pitch range, monotone options).
Adjust LPC filter parameters if needed (order, window, pre‑emphasis).
Set target_intensity_db (default 70 dB SPL) and play_after_processing.
Click OK — script analyzes, generates excitation, filters, normalizes.
Output appears as: originalName_vocoded.
Watch Info window for progress (excitation generation, LPC analysis, vocoding).

Quick tip: Start with the Natural Resynthesis preset on clear speech; the result should sound very similar to the original, which verifies LPC quality. Then try Robot Voice on the same speech for the classic monotone robot effect. Whisper converts voiced speech to a breathy whisper while preserving the words, and Deep Demon adds a dark, low-pitched character. For best results, use clean recordings (minimal noise and room reverb) of a single speaker. The Info window shows the auto-calibrated LPC order and each processing step. If you get an error about "7 arguments", make sure you are running the latest Praat version.

Important:
SPEECH-SPECIFIC: LPC works best on human speech; music, environmental sounds, or noisy recordings may produce poor results.
CLEAN INPUT NEEDED: Background noise, reverb, or multiple speakers will be encoded into the LPC coefficients and may create artifacts.
MONOPHONIC ONLY: Stereo sounds are processed as mono (Praat averages channels).
PITCH DETECTION LIMITATIONS: Very low-pitched voices (<75 Hz) or creaky voice may cause pitch detection errors.
REAL-TIME LIMITS: Long recordings (>60 s) may take time to process (LPC analysis is computationally intensive).
WHISPER ARTIFACTS: Converting naturally voiced speech to whisper can sound artificial, since real whispering involves different articulation, not just noise excitation.

LPC Theory & Source‑Filter Model

The Source‑Filter Model of Speech

🎤 Human Speech Production

Two‑component model:

Speech = SOURCE ⨂ FILTER

SOURCE (Excitation):
  • Voiced sounds: Glottal pulse train (pitch = f₀)
  • Unvoiced sounds: Turbulent noise (fricatives, whispers)
  • Mixed: Both (e.g., breathy voice)

FILTER (Vocal tract):
  • Resonances = formants (F1, F2, F3...)
  • Shaped by tongue, lips, jaw position
  • Determines vowel/consonant identity

LPC models this as:
  s[n] = e[n] + Σ_{k=1}^{p} aₖ·s[n-k]
  where e[n] = excitation, {aₖ} = filter coefficients
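The recursion behind this model can be tried on a toy signal. The following is a minimal Python sketch, not the script's own code: it uses the sign convention of the prediction model in this guide (s[n] = e[n] + Σ aₖ·s[n-k]), and the single resonance (1000 Hz, pole radius 0.95, 16 kHz sampling rate) and the helper names are assumptions chosen for illustration.

```python
import math

def all_pole_filter(excitation, a):
    """Run s[n] = e[n] + sum_k a[k-1] * s[n-k] over the excitation."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * out[n - k]
        out.append(acc)
    return out

def pulse_train(n_samples, period):
    """Unit impulses every `period` samples (a crude voiced source)."""
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

# Hypothetical vocal tract with one resonance: 1000 Hz, pole radius 0.95.
sr = 16000
radius, formant_hz = 0.95, 1000.0
theta = 2 * math.pi * formant_hz / sr
# Denominator 1 - a1*z^-1 - a2*z^-2 with poles at radius * exp(+/- j*theta)
a = [2 * radius * math.cos(theta), -radius * radius]

# 100 Hz pulse train (period = 160 samples) shaped by the resonance
voiced = all_pole_filter(pulse_train(800, 160), a)
```

Swapping the pulse train for noise while keeping `a` fixed is the same move the Whisper preset makes: the filter stays, only the source changes.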

Linear Predictive Coding Mathematics

🔢 LPC Equations

Prediction model:

Given speech samples s[1], s[2], ..., s[N], predict the current sample from the p previous samples:
  ŝ[n] = a₁·s[n-1] + a₂·s[n-2] + ... + aₚ·s[n-p]
Prediction error:
  e[n] = s[n] - ŝ[n] = s[n] - Σ_{k=1}^{p} aₖ·s[n-k]
Goal: Find coefficients {aₖ} that minimize the mean-square error.
Solution: Autocorrelation method → solve the Yule-Walker equations.
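For readers who want to see the autocorrelation method at work, here is a small Python sketch using the textbook Levinson-Durbin recursion. It illustrates the mathematics above and is not a description of Praat's internal implementation; the test signal and its coefficients (1.2, -0.5) are made up for the demonstration.

```python
def autocorrelation(x, max_lag):
    """r[lag] = sum_n x[n] * x[n + lag] for lag = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for the predictor
    s_hat[n] = sum_k a[k-1] * s[n-k]; returns (a, residual_energy)."""
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        # Reflection coefficient for model order i + 1
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= (1.0 - k * k)
    return a, err

# Demo: impulse response of s[n] = 1.2*s[n-1] - 0.5*s[n-2]
# (hypothetical coefficients, chosen to give a stable filter)
true_a = [1.2, -0.5]
h = [1.0, 1.2]
for n in range(2, 200):
    h.append(true_a[0] * h[n - 1] + true_a[1] * h[n - 2])

est_a, residual = levinson_durbin(autocorrelation(h, 2), 2)
```

Because the test signal is itself the output of an all-pole filter, the recovered `est_a` should match `true_a` up to rounding; real speech frames only approximate this.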

Frequency domain interpretation:

Transfer function of the LPC filter (all-pole):
  H(z) = G / (1 - Σ_{k=1}^{p} aₖ·z^{-k})
where G = gain factor. Each complex-conjugate pole pair of H(z) models one resonance: the pole angle sets the formant frequency and the pole radius sets its bandwidth.
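The link between poles and formants can be made concrete with the standard conversion: frequency = angle·Fs/(2π), bandwidth ≈ -ln(radius)·Fs/π. The Python sketch below applies it to a second-order filter; the function names and the example resonance are illustrative assumptions, not part of the script.

```python
import cmath
import math

def second_order_poles(a1, a2):
    """Poles of 1 / (1 - a1*z^-1 - a2*z^-2), i.e. roots of z^2 - a1*z - a2."""
    disc = cmath.sqrt(a1 * a1 + 4 * a2)
    return (a1 + disc) / 2, (a1 - disc) / 2

def pole_to_formant(pole, sr):
    """Return (frequency_hz, bandwidth_hz) for one complex pole."""
    frequency = abs(cmath.phase(pole)) * sr / (2 * math.pi)
    bandwidth = -math.log(abs(pole)) * sr / math.pi
    return frequency, bandwidth

# Hypothetical resonance: 1000 Hz with pole radius 0.95 at 16 kHz
sr = 16000
theta = 2 * math.pi * 1000.0 / sr
a1, a2 = 2 * 0.95 * math.cos(theta), -(0.95 ** 2)

pole, _conjugate = second_order_poles(a1, a2)
freq_hz, bw_hz = pole_to_formant(pole, sr)
```

Moving the pole closer to the unit circle (radius toward 1) narrows the bandwidth, which is heard as a sharper, more ringing formant.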

LPC vs. Other Speech Analysis Methods

FFT-based methods (spectrogram):
  • Full time-frequency representation
  • Shows harmonics + formants together
  • Hard to separate source from filter
  • Resynthesis requires phase reconstruction

Formant tracking:
  • Extracts only formant frequencies/bandwidths
  • Loses spectral detail
  • Good for vowel analysis, poor for consonants
  • Resynthesis requires formant synthesizer

LPC (this script):
  • Explicit source-filter separation
  • Compact representation (p+1 coefficients/frame)
  • Natural-sounding resynthesis
  • Easy to modify source or filter independently
  • Particularly good for voiced sounds

Limitations of LPC:
  • Assumes minimum-phase vocal tract (not always true for nasals)
  • Struggles with non-speech sounds
  • All-pole model can't represent spectral zeros (anti-resonances)

Complete LPC Vocoder Pipeline

🔄 Processing Steps

INPUT: Original speech sound

STEP 1: PITCH ANALYSIS (for voiced excitation)
  • To Pitch (Praat): time_step, min_pitch, max_pitch
  • Optionally modify: force_monotone, pitch_shift
  • Convert to excitation: Pitch → PitchTier → Pitch → PointProcess → Sound

STEP 2: NOISE GENERATION (for whisper)
  • Create Gaussian noise: randomGauss(0, 0.2)
  • Same duration as original

STEP 3: LPC ANALYSIS (spectral envelope)
  • To LPC (autocorrelation): order, window, pre‑emphasis
  • Computes coefficients {aₖ} and gain G per frame
  • Captures formant structure

STEP 4: FILTERING (synthesis)
  • Filter source through LPC filter
  • Praat's "Filter" command with LPC object
  • Result: source shaped by original's spectral envelope

STEP 5: POST‑PROCESSING
  • Scale intensity to target_db
  • Whisper: high‑frequency emphasis
  • Output naming

OUTPUT: originalName_vocoded
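The intensity scaling in Step 5 amounts to one gain factor. The Python sketch below shows the arithmetic, assuming that sample values are read as sound pressure in pascals against the usual 2·10⁻⁵ Pa reference; the function names are hypothetical and the script itself relies on Praat's built-in scaling.

```python
import math

P_REF = 2e-5  # reference pressure in Pa for dB SPL

def rms(samples):
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def intensity_db(samples):
    """Mean intensity in dB relative to P_REF (signal must not be silent)."""
    return 20 * math.log10(rms(samples) / P_REF)

def scale_intensity(samples, target_db):
    """Return a copy scaled so that its mean intensity equals target_db."""
    level = rms(samples)
    if level == 0:
        return list(samples)  # silence cannot be scaled to a level
    gain = P_REF * 10 ** (target_db / 20) / level
    return [x * gain for x in samples]

# Example: a 200 Hz tone brought to the default 70 dB
tone = [0.1 * math.sin(2 * math.pi * 200 * n / 16000) for n in range(16000)]
scaled = scale_intensity(tone, 70.0)
```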

Preset Characters

Preset 1: Custom

🎛️ Full Manual Control

Parameters: All parameters remain at their form values.

Source type: Determined by force_monotone and other settings (defaults to voiced pulse train).

Use case: When you want to experiment with specific parameter combinations or create custom transformations beyond the presets.

Output name: originalName_vocoded

Preset 2: Natural Resynthesis

🗣️ High‑Quality Speech Reconstruction

Settings:

LPC order: Auto‑calibrated (sr/1000 + 4)
Analysis window: 0.025 s (25 ms) — standard for speech
Force monotone: No (preserves natural pitch contour)
Source type: Pulse train (voiced)

Mechanism: Extracts natural pitch contour, generates corresponding pulse train, filters through LPC envelope. Should sound very similar to original (minor differences due to LPC modeling limitations).

Sonic character: Clean, slightly "processed" but natural‑sounding speech. Useful for verifying LPC quality or as preprocessing step.

Best for: Testing LPC quality, speech enhancement (if original is noisy), voice compression experiments.

Preset 3: Robot Voice (Monotone)

🤖 Classic Synthetic Speech

Settings:

LPC order: Auto‑calibrated
Analysis window: 0.030 s (30 ms) — slightly longer for stability
Force monotone: Yes
Monotone frequency: 100 Hz
Source type: Pulse train

Mechanism: Replaces natural pitch contour with fixed 100 Hz pulse train. All prosody (pitch variation) removed, but formant structure (vowel/consonant identity) preserved.

Sonic character: Classic "robot" or "text‑to‑speech" voice. Flat, synthetic, but intelligible. Words are clear but delivery is mechanical.

Best for: Sci‑fi effects, synthetic character voices, demonstrating source‑filter separation.

Preset 4: Whisper (True Noise)

👂 Breath‑Only Speech

Settings:

LPC order: Auto‑calibrated
Analysis window: 0.015 s (15 ms) — shorter for better consonant capture
Source type: Gaussian noise (unvoiced)
Post‑processing: High‑frequency emphasis

Mechanism: Replaces glottal pulses with Gaussian noise while preserving LPC spectral envelope. The noise is shaped by the original formant structure, creating whispered version.

Sonic character: Breathy, whispered speech. Lacks vocal fold vibration but retains vowel/consonant distinctions. Post‑processing adds high‑frequency brightness for intelligibility.

Best for: Intimate vocal effects, privacy masking (speech content preserved and pitch cues removed, although the retained formant structure can still hint at the speaker), ghost/spectral voices.

Preset 5: Deep Demon

😈 Low‑Pitched, Dark Character

Settings:

LPC order: Auto‑calibrated
Analysis window: 0.040 s (40 ms) — long for stability
Minimum pitch: 50 Hz (very low)
Force monotone: No (but pitch contour shifted down)
Pitch shift: Multiply by 0.6 (60% of original)

Mechanism: Lowers pitch contour by 40% while preserving relative pitch variations (not monotone). Uses longer analysis window for stable low‑frequency analysis.

Sonic character: Dark, low‑pitched, imposing. Sounds like giant, demon, or ancient being. Pitch is lower but retains some natural prosody (not completely flat).

Best for: Character voice design (monsters, giants, demons), horror sound design, low‑pitch effects.

Preset Comparison Table

Preset	Source	Pitch	Window	Post‑Process	Character
Natural	Pulse train	Original contour	25 ms	Intensity only	Clean speech
Robot	Pulse train	100 Hz fixed	30 ms	Intensity only	Synthetic, flat
Whisper	Gaussian noise	N/A (noise)	15 ms	HF emphasis	Breathy, intimate
Deep Demon	Pulse train	60% of original	40 ms	Intensity only	Dark, low‑pitched
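For anyone porting the presets to another tool, the table can be captured as plain data. This Python sketch only restates the values listed in this guide; the class and field names are invented for illustration and do not appear in the script.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Preset:
    source: str                           # "pulse" or "noise"
    analysis_window: float                # LPC window in seconds
    pitch_factor: float = 1.0             # multiplier on the pitch contour
    monotone_hz: Optional[float] = None   # fixed pitch, if forced
    minimum_pitch: Optional[float] = None # None = keep the form value
    hf_emphasis: bool = False             # whisper brightness correction

PRESETS = {
    "Natural": Preset("pulse", 0.025),
    "Robot": Preset("pulse", 0.030, monotone_hz=100.0),
    "Whisper": Preset("noise", 0.015, hf_emphasis=True),
    "Deep Demon": Preset("pulse", 0.040, pitch_factor=0.6, minimum_pitch=50.0),
}

def source_type(name):
    """Map a preset name to the guide's source_type code (1 = pulse, 2 = noise)."""
    return 2 if PRESETS[name].source == "noise" else 1
```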

Excitation Source Generation

Voiced Source: LF Model Pulse Train

❤️ Glottal Pulse Model (LF Parameters)

The LF (Liljencrants‑Fant) model is a mathematical model of glottal airflow that produces natural‑sounding pulse trains. Praat's implementation uses 7 parameters:

To Sound (phonation): samplingFrequency, adaptationFactor (1.0), maximumPeriod (0.05), openPhase (0.7), collisionPhase (0.03), power1 (3.0), power2 (4.0)

Where:
  • openPhase = portion of cycle glottis is open (0-1)
  • collisionPhase = portion where vocal folds collide (creates sharp closure)
  • power1, power2 = shape parameters for opening/closing phases

Why LF model matters: Simple impulse trains sound buzzy/artificial. LF model creates natural glottal waveform with correct spectral tilt (‑12 dB/octave).

Pitch Analysis & Manipulation Chain

📈 From Sound to Controlled Excitation

Step‑by‑step conversion:

Sound → Pitch: Praat's pitch detection algorithm (autocorrelation).
Pitch → PitchTier: Convert to manipulatable pitch contour.
Modification: Apply Formula to PitchTier (e.g., multiply by 0.6 for Deep Demon).
PitchTier → Pitch: Convert back to Pitch object (required for next step).
Pitch → PointProcess: Create impulse at each glottal closure instant.
PointProcess → Sound (phonation): Generate LF model pulse train from impulses.

Why this complexity? Praat requires specific object types for each operation. The chain allows pitch manipulation at the PitchTier stage while ensuring proper timing for pulse generation.
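What the chain computes can be sketched outside Praat: modify a pitch contour, then place one pulse per pitch period. The Python below is an illustration under simplifying assumptions (contour given as strictly increasing (time, Hz) points, linear interpolation, phase accumulation); it does not reproduce Praat's PointProcess algorithm, and all names are hypothetical.

```python
def modify_contour(contour, pitch_factor=1.0, monotone_hz=None):
    """contour: list of (time_s, f0_hz) points, similar to a PitchTier."""
    if monotone_hz is not None:
        return [(t, float(monotone_hz)) for t, _ in contour]
    return [(t, f0 * pitch_factor) for t, f0 in contour]

def f0_at(contour, t):
    """Linear interpolation between points, constant outside them."""
    if t <= contour[0][0]:
        return contour[0][1]
    for (t0, f0), (t1, f1) in zip(contour, contour[1:]):
        if t <= t1:
            return f0 + (f1 - f0) * (t - t0) / (t1 - t0)
    return contour[-1][1]

def pulse_times(contour, duration, sr):
    """Emit a pulse each time the accumulated phase completes one cycle."""
    times, phase = [], 0.0
    for n in range(int(duration * sr)):
        t = n / sr
        phase += f0_at(contour, t) / sr
        if phase >= 1.0:
            phase -= 1.0
            times.append(t)
    return times

natural = [(0.0, 180.0), (1.0, 220.0)]           # made-up rising contour
robot = modify_contour(natural, monotone_hz=100)  # Robot preset behaviour
demon = modify_contour(natural, pitch_factor=0.6) # Deep Demon behaviour
robot_pulses = pulse_times(robot, 1.0, 16000)
```

Each pulse time would then be replaced by one glottal pulse shape, which is the job of "To Sound (phonation)" in the script.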

Unvoiced Source: Gaussian Noise

# Whisper excitation creation:
Create Sound from formula: "excitation_noise", 1, 0, duration, sr, "randomGauss(0, 0.2)"

Why Gaussian noise (not uniform)?
  • Gaussian distribution sounds more natural (like turbulent airflow)
  • Amplitude distribution matches real breath noise statistics
  • 0.2 standard deviation provides appropriate level (can be adjusted)

Why not shaped noise?
  • LPC filter will shape the noise appropriately
  • Starting with flat spectrum allows full control via LPC coefficients
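The same noise source is easy to reproduce for experiments outside Praat. This is a minimal Python sketch using the standard library; the function name and the optional seed are assumptions added for repeatability.

```python
import random

def gaussian_noise(duration, sr, sigma=0.2, seed=None):
    """White Gaussian noise with mean 0 and standard deviation sigma."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(round(duration * sr))]

# Half a second of whisper excitation at 16 kHz
noise = gaussian_noise(0.5, 16000, sigma=0.2, seed=1)
```

With a standard deviation of 0.2, the ±1 full-scale limit sits five standard deviations out, so clipped samples are very rare before filtering.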

Source Selection Logic

Preset-based selection:
  • Natural, Robot, Deep Demon: source_type = 1 (pulse train)
  • Whisper: source_type = 2 (Gaussian noise)

Custom mode: Always uses pulse train unless user manually modifies script.

Why not mixed excitation? Real speech has both voiced and unvoiced segments. This script uses homogeneous excitation throughout for dramatic effect. For natural-sounding synthesis, you'd need voicing-decision logic to switch between pulse and noise per frame.

LPC Filter Analysis

LPC Order Determination

🔢 Auto‑calibration Formula

When lpc_order = 0 in form:

lpc_order = round(sampling_rate / 1000) + 4

Example:
  • 44100 Hz → 44 + 4 = 48 poles
  • 22050 Hz → 22 + 4 = 26 poles
  • 16000 Hz → 16 + 4 = 20 poles

Rationale:
  • ~1 pole per kHz of sampling rate, i.e. one formant (one pole pair) per kHz of bandwidth up to Nyquist
  • +2 for overall spectral shape
  • +2 for safety margin
  • Total: Fs/1000 + 4

Manual override: Set lpc_order > 0 to use a specific value.
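The auto-calibration rule is short enough to check by hand or in a few lines of Python. The sketch below mirrors the formula above; the function name is hypothetical, and rounding half up is used on the assumption that it matches the script's round().

```python
import math

def auto_lpc_order(sampling_rate, lpc_order=0):
    """Return the manual order if given (> 0), else round(Fs / 1000) + 4."""
    if lpc_order > 0:
        return lpc_order
    # floor(x + 0.5) rounds halves up; Python's round() rounds halves to even
    return math.floor(sampling_rate / 1000 + 0.5) + 4

orders = {sr: auto_lpc_order(sr) for sr in (44100, 22050, 16000)}
```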

Typical values for speech:

8‑12 poles: Telephone bandwidth (300‑3400 Hz)
10‑16 poles: Standard speech (50‑8000 Hz)
16‑20 poles: High‑quality speech (full bandwidth)
20‑30 poles: Music or very high‑quality

Praat's LPC Analysis Parameters

# LPC analysis call:
To LPC (autocorrelation): lpc_order, analysis_window, 0.005, pre_emphasis_hz

Parameters:
  1. lpc_order: Number of poles (each formant uses one complex-conjugate pair, so order p models up to p/2 resonances)
  2. analysis_window: Window length in seconds (typically 0.025)
  3. time_step: Analysis frame step (0.005 = 5 ms; with a 25 ms window, consecutive frames overlap by 80%)
  4. pre_emphasis_hz: Pre-emphasis frequency (50 Hz = mild high-pass)

Pre-emphasis: Boosts the spectrum by 6 dB/octave above pre_emphasis_hz. This partly offsets the natural -12 dB/octave glottal source roll-off. Result: flatter spectrum → better LPC modeling.

Window choice: Praat uses a Gaussian-like window (not Hann or Hamming). Appropriate for speech, reduces edge effects.
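Two of these parameters reduce to simple arithmetic. The Python sketch below assumes the common first-order pre-emphasis form y[n] = x[n] - α·x[n-1] with α = exp(-2π·F/Fs), and computes frame overlap from the nominal window length; the function names are illustrative, and Praat's effective window may differ from the nominal value.

```python
import math

def pre_emphasis_coefficient(from_hz, sr):
    """alpha for y[n] = x[n] - alpha * x[n-1], assuming alpha = exp(-2*pi*F/Fs)."""
    return math.exp(-2 * math.pi * from_hz / sr)

def pre_emphasize(samples, from_hz, sr):
    """Apply first-order pre-emphasis, treating the sample before the start as 0."""
    alpha = pre_emphasis_coefficient(from_hz, sr)
    out, prev = [], 0.0
    for x in samples:
        out.append(x - alpha * prev)
        prev = x
    return out

def frame_overlap(window, time_step):
    """Fraction of each analysis window shared with the next frame."""
    return 1 - time_step / window

alpha_50 = pre_emphasis_coefficient(50, 44100)  # default pre_emphasis_hz
overlap = frame_overlap(0.025, 0.005)           # default window and step
```

A constant (0 Hz) input is almost cancelled by this filter while rapid sample-to-sample changes pass through, which is the tilt correction LPC analysis benefits from.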

Spectral Envelope Extraction

What LPC captures:
  • Formant frequencies (spectral peaks)
  • Formant bandwidths (peak widths)
  • Overall spectral tilt
  • Nasal zeros (approximately, via pole positions)

What LPC misses:
  • Fine harmonic structure (handled by excitation)
  • Phase information (LPC is magnitude-oriented)
  • Exact spectral details between formants

Frame-based processing:
  • Speech divided into short frames (analysis_window)
  • Each frame: compute LPC coefficients
  • Coefficients interpolated between frames
  • Creates smooth evolving filter

Filtering during synthesis:
  • Excitation filtered by time-varying LPC filter
  • Filter coefficients change smoothly between frames
  • Creates natural-sounding transitions

Post‑processing: Whisper Brightness Correction

# Applied only for Whisper preset:
Formula: "self + 0.5 * (self - self[col-1])"

This is a simple first-difference emphasis filter:
  y[n] = x[n] + 0.5*(x[n] - x[n-1]) = 1.5*x[n] - 0.5*x[n-1]

Frequency response:
  • DC (0 Hz): gain = 1.0 (unchanged)
  • Nyquist: gain = 2.0 (+6 dB boost)
  • Effect: Gentle high-frequency emphasis

Why needed for whispers?
  • Real whispering involves more high-frequency energy
  • Noise excitation + LPC may sound dull
  • This adds "air" and intelligibility
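The quoted gains follow from evaluating H(z) = 1.5 - 0.5·z⁻¹ on the unit circle. This short Python check treats the filter as the non-recursive difference equation written above; the function names and the `amount` parameter are assumptions for illustration.

```python
import cmath
import math

def emphasis_gain(freq_hz, sr, amount=0.5):
    """Magnitude response of y[n] = x[n] + amount * (x[n] - x[n-1])."""
    w = 2 * math.pi * freq_hz / sr
    h = (1 + amount) - amount * cmath.exp(-1j * w)
    return abs(h)

def to_db(gain):
    return 20 * math.log10(gain)

sr = 44100
dc_gain = emphasis_gain(0, sr)
mid_gain = emphasis_gain(sr / 4, sr)
nyquist_gain = emphasis_gain(sr / 2, sr)
```

Raising `amount` steepens the boost: the Nyquist gain is always 1 + 2·amount while the DC gain stays at 1.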

Parameters & Controls

Form Parameters

🎛️ User‑Adjustable Settings

Parameter	Type	Default	Range	Description
preset	optionmenu	Custom	1‑5	Preset configuration
time_step	positive	0.01	0.001‑0.05	Pitch analysis time step (seconds)
minimum_pitch	positive	75	50‑200	Lowest expected pitch (Hz)
maximum_pitch	positive	600	200‑1000	Highest expected pitch (Hz)
force_monotone	boolean	0	0/1	Force constant pitch (robot voice)
monotone_frequency	positive	120	50‑300	Fixed pitch when monotone (Hz)
lpc_order	integer	0	0‑50	LPC filter order (0=auto)
analysis_window	positive	0.025	0.01‑0.05	LPC analysis window length (seconds)
pre_emphasis_hz	positive	50	0‑200	Pre‑emphasis frequency (Hz)
target_intensity_db	positive	70	50‑90	Output intensity (dB SPL)
play_after_processing	boolean	1	0/1	Auto‑play result

Parameter Guidelines

time_step (pitch analysis):
  • 0.01 s (10 ms): Standard for speech
  • Smaller (0.005): More detailed pitch tracking (slower)
  • Larger (0.02): Smoother, faster

minimum_pitch / maximum_pitch:
  • Male speech: 75-300 Hz
  • Female speech: 150-600 Hz
  • Child speech: 200-800 Hz
  • Set wider than expected range to avoid missed pitches

analysis_window (LPC):
  • 0.025 s (25 ms): Standard; balances time/frequency resolution
  • Shorter (0.015): Better for quick consonants (plosives)
  • Longer (0.040): Better for stable vowels, low pitches

pre_emphasis_hz:
  • 50 Hz: Mild, for general speech
  • 0: No pre-emphasis (may sound bass-heavy)
  • 100-200: Strong high-pass (brightens)

Internal Variables & Objects

Variable	Source	Description
source_type	Preset or default=1	1=pulse train, 2=noise
sr, dur	Get sampling frequency/duration	Original sound properties
pitchID, ptID, modPitchID, ppID	Intermediate objects	Pitch analysis chain (cleaned up)
sourceID	Created excitation	"excitation_pulse" or "excitation_noise"
lpcID	To LPC (autocorrelation)	LPC filter object
vocodedID	Filter output	Final processed sound

Sonic Applications

Voice Design for Media

🎬 Character Voice Creation

Robot/AI characters: Robot preset (monotone 100 Hz), or experiment with other monotone_frequency values (80 Hz for larger robots, 140 Hz for smaller ones).

Monsters/demons: Deep Demon preset for low, imposing voices. Combine with other effects (distortion, reverb).

Ghosts/spectres: Whisper preset for breathy, ethereal voices. Add reverb with long decay.

Alien languages: Process speech, then reverse or further process with other effects. LPC preserves phoneme structure while transforming voice quality.

Music Production

Vocal processing:

Robot backing vocals: Apply Robot preset to harmony parts
Whispered doubles: Create whisper version to layer under main vocal
Pitch‑corrected effects: Use Natural preset but with modified pitch contour (custom PitchTier formula)
Vocal synthesis from instruments: Filter instrumental sounds through an LPC envelope extracted from speech (experimental)

Speech Technology & Education

Speech analysis demonstrations: Show source‑filter separation by listening to excitation and filtered result separately.

Accent/voice modification: Process speech while preserving linguistic content but altering voice characteristics.

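Privacy masking: Whisper preset removes pitch cues to speaker identity while preserving words (for interviews, sensitive content). Formant structure is retained, so treat this as partial masking rather than anonymization.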

Speech compression understanding: Demonstrate how LPC achieves high compression (few coefficients represent speech).

Practical Workflow Examples

🎬 Film: "Synthetic News Anchor"

Scene: Futuristic news broadcast with AI anchor

Processing chain:

Record: Human voice actor reading news copy
LPC Robot preset: monotone_frequency=110 Hz (authoritative but synthetic)
Post‑processing: Add subtle reverb (newsroom ambiance), light compression
Layering: Mix 70% processed with 30% original for slight human warmth
Automation: Occasionally switch to Natural preset for "emotional" moments

Result: Believable synthetic news anchor that can toggle between mechanical and emotional delivery.

🎵 Song: "Ghost Choir" Arrangement

Concept: Choral piece with layered natural and processed voices

Voice layers:

Layer 1 (Main melody): Natural vocal
Layer 2 (Harmony): Same performance → Whisper preset
Layer 3 (Bass): Same performance → Deep Demon (pitch lowered)
Layer 4 (Ethereal): Same performance → Robot preset (monotone) + heavy reverb

Mix: Pan layers, adjust levels, add global reverb for cohesive space.

Result: Rich, spectral choir from single vocal performance.

Advanced Techniques & Customization

Custom Pitch Manipulation

Modify PitchTier formula for custom effects:

# In the script, after creating PitchTier (ptID):
selectObject: ptID

# Example: Sinusoidal pitch wobble (±20% at 2 Hz)
Formula: "self * (1 + 0.2*sin(2*pi*2*x))"

# Example: Pitch ramp up (rises 50% over duration)
Formula: "self * (1 + 0.5*x/duration)"

# Example: Random pitch variations (±30% random)
Formula: "self * (1 + 0.3*randomUniform(-1,1))"

# Example: Octave jumps (one octave up during the second half of every 0.5 s period)
Formula: "if x mod 0.5 < 0.25 then self else self*2 fi"

Mixed Excitation (Voiced/Unvoiced)

Create more natural synthesis with voicing decisions:

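# Sketch of a frame-based voicing decision (would require significant script modification).
# Assumption: unvoiced frames of the Pitch object return undefined, used here in place of a 0.7 voicing threshold.
selectObject: pitchID
numberOfFrames = Get number of frames
for iframe to numberOfFrames
    f0 = Get value in frame: iframe, "Hertz"
    if f0 = undefined
        # Unvoiced: use noise for this frame
    else
        # Voiced: use pulse train for this frame
    endif
endfor
# Still needed: create alternating source segments and ensure smooth transitions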

LPC Coefficient Manipulation

Modify formant structure directly: After LPC analysis, you could modify coefficients before filtering:

# Extract LPC coefficients
selectObject: lpcID
# Coefficients accessible via matrix queries
# Could shift formant frequencies, change bandwidths, etc.
# Requires advanced Praat scripting

Cross‑Synthesis (Voice Transfer)

Use one sound's excitation with another's LPC filter:

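# Cross-synthesis sketch. excitationID and speechID are placeholder object IDs, not script variables:
#   excitationID = excitation derived from sound 1 (e.g., musical instrument), via pitch analysis or noise
#   speechID = sound 2 (e.g., speech), which supplies the spectral envelope
# Both sounds are assumed to share the same sampling frequency.

# 1. Extract excitation from sound 1 (see Excitation Source Generation)
# 2. Extract LPC from sound 2
selectObject: speechID
lpcID = To LPC (autocorrelation): lpc_order, analysis_window, 0.005, pre_emphasis_hz
# 3. Filter excitation through LPC
selectObject: excitationID, lpcID
hybridID = Filter: "no"
# Creates hybrid: instrument playing "speech" or vice-versa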

Troubleshooting

Problem: Output is buzzy/artificial
Causes: LPC order too low, poor pitch detection, source‑filter mismatch
Solutions: Increase lpc_order, check pitch range settings, try Natural preset first

Problem: Output has clicks/pops
Causes: Frame‑to‑frame discontinuities in LPC coefficients or excitation
Solutions: Increase analysis_window, ensure smooth pitch transitions

Problem: Whisper sounds muffled/unintelligible
Causes: Insufficient high‑frequency content, LPC window too long
Solutions: Use shorter analysis_window (0.015), increase pre_emphasis_hz

Problem: Error about "7 arguments"
Causes: Older Praat version or script error
Solutions: Update to latest Praat, ensure script has correct 7‑parameter call