Dynamic Vowel Transition Synthesizer — User Guide
Formant‑based speech synthesis: generates vowel‑like sounds through precise control of three formant frequencies (F1, F2, F3) and creates smooth transitions between different vowel qualities—from natural speech to extreme spectral morphing.
What this does
This script implements dynamic vowel transition synthesis — a formant‑based approach to creating vowel sounds and smoothly morphing between them. Rather than recording actual speech, it synthesizes vowel qualities by controlling three key resonant frequencies (formants F1, F2, F3) that define vowel identity. The script offers 15 preset transitions (from natural vowel progressions like A→I to extreme "alien" formant movements) and 6 spatial processing modes that distribute formants across the stereo field. Each vowel is created by mixing four sine waves: a fundamental pitch (around 120 Hz for male‑like speech) plus three formant‑tracking frequencies that linearly interpolate between start and end values over the sound's duration.
Key Features:
- 15 Vowel Transition Presets — Natural speech vowels (A, I, U), singing, whispering, robotic, alien, and spectral morphs
- Three‑Formant Synthesis — Controls F1 (first formant), F2 (second formant), F3 (third formant) for precise vowel quality
- 6 Spatial Processing Modes — Mono, stereo separation, rotating formants, binaural, wide field, and panning morph
- Real‑time Formant Interpolation — Smooth linear transitions between formant sets
- Built‑in Amplitude Shaping — Natural amplitude envelopes and subtle tremolo
- No External Samples — Pure additive synthesis via Praat's formula interpreter
Formant roles:
- F1 (First formant): Related to mouth opening height — low F1 = closed vowel (like /i/ "ee"), high F1 = open vowel (like /a/ "ah")
- F2 (Second formant): Related to tongue position front‑back — high F2 = front vowel (/i/), low F2 = back vowel (/u/ "oo")
- F3 (Third formant): Adds nuance, important for /r/ sounds and vocal timbre
Technical Implementation: The script builds a four‑oscillator additive synthesis formula: (1) a fixed‑frequency oscillator for the fundamental (≈120 Hz); (2‑4) three oscillators whose frequencies interpolate linearly from start to end formant values: freq(t) = start_f + (end_f - start_f) × (t/duration). The four oscillators are mixed at fixed amplitudes (0.4, 0.6, 0.5, 0.3) that approximate the relative strength of each component; formant bandwidth itself is not modeled. The mix is multiplied by a gentle tremolo envelope (0.8 + 0.2×sin) and an exponential decay. In the complex presets (All_vowels, Vowel_Cycle, Vocal_Journey), low‑frequency oscillators modulate the formants themselves, producing continuous vowel cycling rather than a single linear transition.
Quick start
- In Praat, ensure no objects are selected.
- Run script… → dynamic_vowel_transitions.praat.
- Choose a Preset (A_to_I, Singing_Vowels, Alien_Vowels, etc.) or Custom.
- Select a Spatial_mode (Mono, Stereo_Voice, Rotating_Formants, etc.).
- Adjust Duration (typically 1‑5 seconds for transitions).
- Click OK — script synthesizes vowel transition, applies spatial processing, plays result.
- Output sound appears in Objects list with descriptive name.
Formant Synthesis Theory
The Source‑Filter Model
🎤 Speech Production Model
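Two‑component model: a sound source (the vocal folds, which set the pitch f₀) and a filter (the vocal tract, whose resonances are the formants).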
Formant Frequency Ranges
🗺️ Vowel Quadrant (F1‑F2 Space)
[Diagram: F1‑F2 vowel space. Horizontal axis: F2, from high (front vowels) on the left to low (back vowels) on the right. Vertical axis: F1, from 250 Hz at the top to 1000 Hz at the bottom (increasingly open). Plotted vowels as (F1, F2) in Hz: /i/ "ee" (270, 2290), /u/ "oo" (300, 870), /a/ "ah" (730, 1090).]
Cardinal vowels in script (average male):
- /a/ "ah": F1=730 Hz, F2=1090 Hz, F3=2440 Hz
- /i/ "ee": F1=270 Hz, F2=2290 Hz, F3=3010 Hz
- /u/ "oo": F1=300 Hz, F2=870 Hz, F3=2240 Hz
Vowel identity determined primarily by F1/F2 relationship.
Synthesis Equation
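The four‑oscillator mix described in the technical overview can be sketched in Python. This is an illustration only: the script itself uses Praat's formula interpreter, and the names `glide_phase` and `vowel_sample` and the 44.1 kHz sample rate are assumptions made here. The sketch integrates the phase so that each oscillator's instantaneous frequency follows freq(t) exactly; this guide does not say whether the script does the same or substitutes freq(t) directly into the sine.

```python
import math

F0 = 120.0                    # fundamental in Hz, from the guide
AMPS = (0.4, 0.6, 0.5, 0.3)   # mix levels for f0, F1, F2, F3, from the guide


def glide_phase(start_f, end_f, t, duration):
    """Phase (radians) of a sine whose frequency glides linearly.

    Integrating freq(t) = start_f + (end_f - start_f) * t / duration
    gives 2*pi * (start_f*t + (end_f - start_f) * t^2 / (2*duration)).
    """
    return 2.0 * math.pi * (start_f * t + (end_f - start_f) * t * t / (2.0 * duration))


def vowel_sample(t, duration, start, end):
    """One output sample at time t for start/end formant triples (Hz)."""
    value = AMPS[0] * math.sin(2.0 * math.pi * F0 * t)
    for amp, f_a, f_b in zip(AMPS[1:], start, end):
        value += amp * math.sin(glide_phase(f_a, f_b, t, duration))
    return value


# A_to_I preset values from the guide; sample rate is an ASSUMPTION.
SAMPLE_RATE = 44100
DURATION = 2.0
VOWEL_A = (730.0, 1090.0, 2440.0)
VOWEL_I = (270.0, 2290.0, 3010.0)
samples = [vowel_sample(n / SAMPLE_RATE, DURATION, VOWEL_A, VOWEL_I)
           for n in range(int(SAMPLE_RATE * DURATION))]
```

The raw mix can peak at 1.8 (the sum of the four amplitudes), so some scaling is needed before the samples are written to a file.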
Why Three Formants?
Complete Signal Flow
Preset Transitions
Natural Speech Vowels
Preset 2: A_to_I
🔤 "ah" → "ee" Transition
Formant movement:
- F1: 730 Hz → 270 Hz (decreases — mouth closes)
- F2: 1090 Hz → 2290 Hz (increases — tongue moves forward)
- F3: 2440 Hz → 3010 Hz (increases slightly)
Articulation: Open back vowel to close front vowel. Natural progression found in words like "price" or "time".
Sonic character: Classic vowel transition, recognizable as speech‑like. Good baseline for understanding formant synthesis.
Duration suggestion: 2‑3 seconds for clear transition.
Preset 3: I_to_U
🔤 "ee" → "oo" Transition
Formant movement:
- F1: 270 Hz → 300 Hz (slight increase)
- F2: 2290 Hz → 870 Hz (dramatic decrease — tongue retracts)
- F3: 3010 Hz → 2240 Hz (decreases — darker timbre)
Articulation: Close front vowel to close back vowel. Extreme tongue movement while keeping jaw relatively closed.
Sonic character: Bright → dark transition. Sounds like "we" → "who" without consonants.
Preset 4: U_to_A
🔤 "oo" → "ah" Transition
Formant movement:
- F1: 300 Hz → 730 Hz (increases — jaw opens)
- F2: 870 Hz → 1090 Hz (increases slightly)
- F3: 2240 Hz → 2440 Hz (increases slightly)
Articulation: Close back vowel to open back vowel. Jaw opens while tongue stays back.
Sonic character: Dark → bright transition completing the vowel triangle (A‑I‑U).
Specialized Vowel Types
Preset 5: All_vowels
🎭 Continuous Vowel Cycling
Implementation: Not a linear transition; the formants are modulated by LFOs.
Effect: Formants independently sweep through vowel space, creating ever‑changing vowel quality that visits many points in F1‑F2 space.
Sonic character: "Talking without words" — continuous vowel morphing that never settles. Hypnotic, speech‑like but non‑linguistic.
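A minimal Python sketch of LFO‑driven formants, for illustration only. The centre and depth values are chosen so that each formant covers the All_vowels range given in the summary table (F1 0‑800, F2 250‑1750, F3 1250‑2750 Hz). The LFO rates and the name `modulated_formants` are assumptions; the script's actual rates are not given in this guide.

```python
import math

# (centre Hz, depth Hz) chosen to reproduce the All_vowels ranges in the
# summary table: F1 0-800, F2 250-1750, F3 1250-2750.
CENTER_DEPTH = ((400.0, 400.0), (1000.0, 750.0), (2000.0, 750.0))
# LFO rates in Hz: ASSUMED for illustration, not taken from the script.
LFO_RATES = (0.3, 0.5, 0.7)


def modulated_formants(t):
    """Return (F1, F2, F3) in Hz at time t, each driven by its own sine LFO."""
    return tuple(center + depth * math.sin(2.0 * math.pi * rate * t)
                 for (center, depth), rate in zip(CENTER_DEPTH, LFO_RATES))


# Sample the formant track every 10 ms over 10 s and record the F1 extremes.
track = [modulated_formants(n * 0.01) for n in range(1000)]
f1_values = [f[0] for f in track]
f1_min, f1_max = min(f1_values), max(f1_values)
```

Because the three rates differ, the formants drift in and out of step with each other, which is what keeps the vowel quality from repeating quickly.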
Preset 6: Vowel_Cycle
🔄 Faster, Rhythmic Cycling
Implementation: Similar to All_vowels but with faster LFO rates.
Effect: Faster vowel transitions with pronounced 2 Hz amplitude tremolo. More rhythmic, less ambient than All_vowels.
Sonic character: Pulsing, rhythmic vowel changes. Almost like vowel‑based "sequencer".
Preset 7: Formant_Glissando
🎢 Extreme Formant Sweeps
Formant movement:
- F1: 200 Hz → 1000 Hz (huge increase)
- F2: 800 Hz → 3000 Hz (extreme increase)
- F3: 2000 Hz → 4000 Hz (large increase)
Effect: All formants sweep upward dramatically, far beyond normal speech range.
Sonic character: Sci‑fi "beam" sounds, synthetic sirens, or extreme vocal effects. Not speech‑like.
Preset 8: Whisper_Transition
👂 Breathy, Aspirated Vowels
Formant movement:
- F1: 600 Hz → 400 Hz (moderate change)
- F2: 1200 Hz → 1800 Hz (fronting)
- F3: 2400 Hz → 2800 Hz (brightening)
Note: True whispering requires noise source and different spectral balance. This preset simulates whispered quality through formant choices.
Sonic character: Intimate, breathy vocal quality. Like whispered speech without consonants.
Preset 9: Singing_Vowels
🎶 Musical Vowel Transitions
Formant movement:
- F1: 550 Hz → 350 Hz
- F2: 1100 Hz → 2000 Hz
- F3: 2350 Hz → 3000 Hz
Characteristics: Formant values optimized for singing clarity (slightly different from speech). Smoother transitions, sustained tones.
Sonic character: Choral or operatic vowel transitions. Works well with longer durations (4‑6 s).
Preset 10: Robot_Speech
🤖 Mechanical, Regular Transitions
Formant movement:
- F1: 400 Hz → 500 Hz (small, precise change)
- F2: 1200 Hz → 1500 Hz (regular increase)
- F3: 2400 Hz → 2600 Hz (small increase)
Characteristics: All formants move in same direction with similar slopes. Unnatural but systematic.
Sonic character: Synthetic, robotic vocalization. Like text‑to‑speech without consonants.
Experimental & Extreme
Preset 11: Alien_Vowels
👽 Non‑Human Vocal Tract
Formant movement:
- F1: 150 Hz → 800 Hz (unusually low start)
- F2: 3000 Hz → 1200 Hz (starts extremely high)
- F3: 4000 Hz → 3500 Hz (very high throughout)
Characteristics: Formant relationships far outside normal human speech: F1 starts unusually low, F2 starts at 3000 Hz (a value typical of F3), and F3 stays near 4000 Hz. Creates "alien" or "creature" vocalizations.
Sonic character: Sci‑fi creature sounds, non‑terrestrial life forms. Unsettling yet vocal‑like.
Preset 16: Vocal_Journey
🌌 Slow, Deep Morphing
Implementation: Complex internal modulation rather than a linear transition.
Effect: Extremely slow formant movement (cycles 8‑12 seconds). Deep, evolving vowel space exploration.
Sonic character: Meditative, slowly transforming vocal texture. Good for long‑duration ambient pieces.
All Presets Summary
| Preset | F1 Range | F2 Range | F3 Range | Character | Duration |
|---|---|---|---|---|---|
| A_to_I | 730→270 | 1090→2290 | 2440→3010 | Natural speech | 2‑3 s |
| I_to_U | 270→300 | 2290→870 | 3010→2240 | Bright→dark | 2‑3 s |
| U_to_A | 300→730 | 870→1090 | 2240→2440 | Dark→open | 2‑3 s |
| All_vowels | 0‑800* | 250‑1750* | 1250‑2750* | Continuous cycle | 5‑10 s |
| Vowel_Cycle | 0‑800* | 250‑1750* | 1250‑2750* | Rhythmic cycle | 3‑6 s |
| Formant_Glissando | 200→1000 | 800→3000 | 2000→4000 | Extreme sweeps | 3‑5 s |
| Whisper_Transition | 600→400 | 1200→1800 | 2400→2800 | Breathy | 2‑4 s |
| Singing_Vowels | 550→350 | 1100→2000 | 2350→3000 | Musical | 3‑6 s |
| Robot_Speech | 400→500 | 1200→1500 | 2400→2600 | Mechanical | 2‑3 s |
| Alien_Vowels | 150→800 | 3000→1200 | 4000→3500 | Non‑human | 3‑5 s |
| Choral_Shift | 500→600 | 1000→1400 | 2200→2600 | Choir‑like | 3‑5 s |
| Spectral_Morph | 300→700 | 900→1800 | 2100→2900 | Complex movement | 3‑5 s |
| Harmonic_Transition | 450→350 | 1300→1600 | 2500→2700 | Harmonic focus | 2‑4 s |
| Formant_Sweep | 250→850 | 700→2400 | 1800→3200 | Wide range | 4‑6 s |
| Vocal_Journey | 0‑900* | 0‑2600* | 1700‑2700* | Slow evolution | 10‑30 s |
* Modulated ranges, not linear transitions
Formant Structure & Implementation
Four‑Oscillator Architecture
🎛️ Oscillator Roles and Amplitudes
Oscillator 1: Fundamental (f₀)
- Frequency: Fixed 120 Hz (approximate male pitch)
- Amplitude: 0.4 (relative scale)
- Purpose: Provides pitch perception, harmonic basis
Oscillator 2: First Formant (F1)
- Frequency: Varies by preset (150‑1000 Hz across the linear presets)
- Amplitude: 0.6 (strongest — F1 typically most prominent)
- Purpose: Determines vowel "openness"
Oscillator 3: Second Formant (F2)
- Frequency: Varies by preset (600‑3000 Hz range)
- Amplitude: 0.5
- Purpose: Determines vowel "frontness/backness"
Oscillator 4: Third Formant (F3)
- Frequency: Varies by preset (1800‑4000 Hz range)
- Amplitude: 0.3 (weakest — F3 contributes to timbre)
- Purpose: Adds vocal quality, brightness
Time‑Varying Frequencies
Amplitude Envelopes
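The envelope described in the technical overview, a tremolo term (0.8 + 0.2×sin) multiplied by an exponential decay, can be sketched as follows. The guide gives only the shape, so the tremolo rate and decay constant below are assumptions, as is the function name `envelope`.

```python
import math


def envelope(t, tremolo_hz=2.0, decay_per_s=0.5):
    """Gain at time t: tremolo (0.8 + 0.2*sin) times exponential decay.

    tremolo_hz and decay_per_s are ASSUMED values; the guide gives only
    the 0.8 + 0.2*sin shape and says the decay is exponential.
    """
    tremolo = 0.8 + 0.2 * math.sin(2.0 * math.pi * tremolo_hz * t)
    decay = math.exp(-decay_per_s * t)
    return tremolo * decay


# Gain once per 50 ms over a 3 s sound.
gains = [envelope(n * 0.05) for n in range(60)]
```

The tremolo term stays between 0.6 and 1.0, so the envelope never silences the sound; the decay term alone sets the overall fade.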
Natural Speech vs. Synthesis Simplifications
Aspects of natural speech that this synthesis simplifies or leaves out:
- Formant bandwidths: Real formants have width (Q) — this uses pure sine waves
- Nasal zeros: Nasalized vowels have anti‑resonances
- Aspiration noise: Breathiness, especially in whispers
- Glottal source shape: Real vocal folds produce pulse train with specific spectrum
- Dynamic amplitudes: Formant amplitudes change with frequency
- Higher formants: F4 (≈3500 Hz), F5 (≈4500 Hz) affect voice quality
- Non‑linear interactions: Source‑filter coupling
Why these simplifications work: For vowel perception, F1‑F2 relationship is most critical. Pure sine waves at formant frequencies create clear spectral peaks. The simplifications make the synthesis tractable in Praat's formula language while preserving essential vowel qualities.
Spatial Processing Modes
Mono (Mode 1)
🔈 Single‑Channel Output
Processing: No stereo processing — keeps synthesized mono sound as is.
Effect: Centered, focused vocal sound. Good for further processing or when stereo imaging would be distracting.
Output name: vowel_transition_mono
Use case: Basic synthesis, further effects processing, mono‑compatible applications.
Stereo Voice (Mode 2)
🎧 Formant Frequency Separation
Processing:
- Left channel: Low‑pass filtered (0‑2000 Hz, Hann window, 100 Hz smoothing) + 0.9× gain
- Right channel: Band‑pass filtered (150‑4000 Hz, Hann window, 100 Hz smoothing) + 0.9× gain
Effect: Lower formants (F1, some F2) emphasized in left ear; higher formants (F2, F3) in right ear. Creates natural stereo width.
Output name: vowel_transition_stereo
Use case: Natural‑sounding vocal placement, standard stereo mixing.
Rotating Formants (Mode 3)
🌀 Circular Panning Motion
Processing: Sine/cosine panning at 0.15 Hz (6.67‑second rotation period).
Effect: Vowel sound appears to slowly rotate around listener's head. Creates immersive, 3D sensation.
Output name: vowel_transition_rotating
Use case: Meditative/ambient pieces, spatial audio experiments, VR/AR applications.
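A rough Python sketch of the rotation in this mode, for illustration only. The 0.15 Hz rate comes from the guide; the exact gain mapping (0.5 + 0.5×sin for left, 0.5 + 0.5×cos for right) and the names `rotating_gains` and `pan_stereo` are assumptions.

```python
import math

ROTATION_HZ = 0.15  # from the guide; period = 1 / 0.15, about 6.67 s


def rotating_gains(t):
    """(left, right) gains at time t, both kept within 0..1.

    The 0.5 + 0.5*sin / 0.5 + 0.5*cos mapping is an ASSUMPTION; the guide
    states only that sine and cosine drive the panning.
    """
    angle = 2.0 * math.pi * ROTATION_HZ * t
    return 0.5 + 0.5 * math.sin(angle), 0.5 + 0.5 * math.cos(angle)


def pan_stereo(mono, sample_rate):
    """Apply the rotating gains to a list of mono samples."""
    left, right = [], []
    for n, sample in enumerate(mono):
        gain_l, gain_r = rotating_gains(n / sample_rate)
        left.append(sample * gain_l)
        right.append(sample * gain_r)
    return left, right


period_s = 1.0 / ROTATION_HZ
left, right = pan_stereo([1.0] * 8, sample_rate=4)
```

The quarter‑cycle offset between sine and cosine is what produces the sense of circular motion: one channel peaks while the other is at its midpoint.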
Binaural Vowels (Mode 4)
🧠 Differentiated Ear Processing
Processing:
- Left channel: Band‑pass 80‑3000 Hz + amplitude modulation (0.8 + 0.1*sin(2π*0.2*x))
- Right channel: Band‑pass 100‑3500 Hz + different AM (0.7 + 0.2*cos(2π*0.25*x))
Effect: Each ear receives differently filtered and modulated version, creating complex binaural interaction. Can produce phantom center images.
Output name: vowel_transition_binaural
Use case: Headphone listening, binaural audio research, intimate vocal effects.
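The two amplitude modulators listed under Processing can be evaluated directly. The Python sketch below covers only the modulation gains (the band‑pass filtering is left out), and the name `binaural_gains` is an assumption.

```python
import math


def binaural_gains(t):
    """(left, right) amplitude-modulation gains at time t, per the guide."""
    left = 0.8 + 0.1 * math.sin(2.0 * math.pi * 0.2 * t)
    right = 0.7 + 0.2 * math.cos(2.0 * math.pi * 0.25 * t)
    return left, right


# The modulators cycle every 5 s (0.2 Hz) and 4 s (0.25 Hz), so the
# left/right balance repeats only every 20 s.
gains = [binaural_gains(n * 0.5) for n in range(41)]  # 0 s .. 20 s
left_range = (min(g[0] for g in gains), max(g[0] for g in gains))
right_range = (min(g[1] for g in gains), max(g[1] for g in gains))
```

The left gain stays within 0.7 to 0.9 and the right within 0.5 to 0.9, so the right channel fluctuates more strongly than the left.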
Wide Transition (Mode 5)
🌐 Extreme Frequency Separation
Processing:
- Left channel: Very low‑pass (0‑1500 Hz, 120 Hz smoothing) + 0.8× gain
- Right channel: High‑pass/band‑pass (200‑5000 Hz, 120 Hz smoothing) + 0.8× gain
Effect: Bass frequencies strongly left, treble strongly right. Creates extreme stereo width but can sound unnatural.
Output name: vowel_transition_wide
Use case: Experimental music, special effects, when maximum stereo separation desired.
Panning Morph (Mode 6)
🏓 Panning Follows Formant Movement
Processing: Panning evolves linearly with time.
Effect: Sound moves from left to right as formants transition. Creates correlation between spectral change and spatial movement.
Output name: vowel_transition_panning
Use case: Sound design where spatial movement reinforces spectral transition, moving sound effects.
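To show how pan position and formant movement line up in this mode, here is a small Python sketch. A plain linear crossfade is assumed for the panning law, and the names `morph_pan_gains` and `formant_at` are hypothetical; the guide states only that panning evolves linearly with time.

```python
def morph_pan_gains(t, duration):
    """(left, right) gains for a pan that moves left to right over the sound.

    A plain linear crossfade is ASSUMED here; the guide says only that
    panning evolves linearly with time.
    """
    position = min(max(t / duration, 0.0), 1.0)  # 0 = hard left, 1 = hard right
    return 1.0 - position, position


def formant_at(start_f, end_f, t, duration):
    """Linearly interpolated formant frequency, as in the guide's formula."""
    return start_f + (end_f - start_f) * (t / duration)


# Pan position and F2 of the A_to_I preset at five points of a 2 s sound.
DURATION = 2.0
for step in range(5):
    t = step * DURATION / 4
    left, right = morph_pan_gains(t, DURATION)
    f2 = formant_at(1090.0, 2290.0, t, DURATION)
    print(f"t={t:.1f}s  L={left:.2f}  R={right:.2f}  F2={f2:.0f} Hz")
```

Because both the pan position and the formants depend on the same t/duration ratio, the sound reaches the centre of the stereo field exactly when the formants reach their midpoint values.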
Sonic Applications
Speech Synthesis & Voice Design
🗣️ Building Block for Vocal Synthesis
Creating diphthongs: Chain multiple transitions (A→I then I→U) to create complex vowel sequences.
Voice character design: Adjust formant frequencies to create different "voices":
- Child voice: Multiply all formants by 1.3‑1.5, increase f₀ to 200‑300 Hz
- Female voice: Multiply formants by 1.15‑1.25, f₀=180‑220 Hz
- Male voice: Default settings (f₀=120 Hz)
- Elderly voice: Slightly lower F1, more prominent F3
Adding consonants: Layer with noise bursts (for fricatives), silence gaps (for plosives), nasal resonances.
Sound Design for Media
Sci‑fi interfaces: Robot_Speech preset with short duration (0.5‑1 s) for button presses, status alerts.
Alien creatures: Alien_Vowels with Vocal_Journey envelope, slowed down, layered with animal sounds.
Magical spells: Formant_Glissando with long duration, reversed, with reverb and pitch modulation.
Horror vocals: Whisper_Transition with extreme slowing (duration 10 s), low‑pass filtering, subtle distortion.
Music Production
Vocal pads: All_vowels or Vocal_Journey presets with long duration (10‑30 s), heavy reverb, slow filter sweeps.
Rhythmic vocal hits: A_to_I or U_to_A with short duration (0.2‑0.5 s), percussive envelope, side‑chain compression.
Choral textures: Singing_Vowels with multiple instances at different pitches (harmonized), Rotating_Formants spatial mode.
Glitch vocals: Vowel_Cycle preset chopped into 0.1 s segments, rearranged rhythmically, bit‑crushed.
Experimental & Educational
Formant perception demonstrations: Play A_to_I transition while displaying F1/F2 plot. Show how vowel perception changes continuously.
Vowel space exploration: Create grid of sounds covering F1‑F2 space to demonstrate vowel continuum.
Synthesized language: Create "words" by chaining vowel transitions with different durations and pitches.
Cross‑species communication: Use Alien_Vowels as basis for designing non‑human vocal communication systems.
Practical Workflow Examples
🎬 Film: "AI Companion Voice"
Character: Friendly artificial intelligence
Voice design:
- Base vowel set: Singing_Vowels preset (clear, musical)
- Spatial mode: Binaural_Vowels (intimate, headphone‑friendly)
- Pitch variation: Create multiple instances at different f₀ (120, 180, 240 Hz)
- Consonant simulation: Add filtered noise bursts before vowels
- Prosody: Vary duration (short for quick responses, long for explanations)
Result: Synthetic yet warm vocal quality suitable for AI character.
🎵 Track: "Vocal Ambient" (Music Production)
Structure:
- Layer 1 (Drone): Vocal_Journey, 60 s, Mono, heavily reverbed
- Layer 2 (Rhythm): Vowel_Cycle, 4 s, chopped to 1‑beat segments
- Layer 3 (Melody): A_to_I at different pitches (following chord progression)
- Layer 4 (Texture): Whisper_Transition, reversed, stereo‑widened
Processing: Global side‑chain compression, tape saturation, stereo imaging.
🔬 Research: "Vowel Continuum Perception"
Experiment: Create continuum between /a/ and /i/ with 10 equal steps
Procedure:
- Calculate intermediate formant values by linear interpolation: F(k) = F_start + (F_end - F_start) × k/9 for steps k = 0 to 9
- Generate 10 sounds with formants: (730,1090), (679,1223), ..., (270,2290)
- Present to listeners in random order, ask to identify as /a/ or /i/
- Plot identification curve → find categorical boundary
Application: Speech perception research, categorical perception demonstration.
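The stimulus values for this experiment can be computed with a few lines of Python. The sketch below is illustrative (the function name `continuum` is an assumption) and rounds to whole Hz, which reproduces the (679, 1223) second step quoted in the procedure.

```python
def continuum(start, end, steps=10):
    """Return `steps` equally spaced (F1, F2) pairs from start to end, in Hz."""
    return [tuple(round(a + (b - a) * k / (steps - 1)) for a, b in zip(start, end))
            for k in range(steps)]


# /a/ and /i/ values (F1, F2) from the guide.
stimuli = continuum((730, 1090), (270, 2290))
for index, (f1, f2) in enumerate(stimuli, start=1):
    print(f"step {index:2d}: F1={f1} Hz, F2={f2} Hz")
```

Ten stimuli mean nine intervals, so each step changes F1 by about 51 Hz and F2 by about 133 Hz.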
Advanced Techniques & Customization
Modifying Formant Values
For different voice types: Modify the start/end formant values in the script:
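As a rough illustration in Python rather than Praat (the script's own variable names are not shown in this guide), the scaling factors from the voice character design list can be applied to a preset's start and end values. Picking the midpoint of each listed range is an assumption made for this sketch, as are the names `VOICES` and `scale_preset`.

```python
# Midpoints of the ranges listed under "Voice character design";
# picking the midpoint is an ASSUMPTION made for this sketch.
VOICES = {
    "male":   {"formant_scale": 1.0, "f0": 120.0},
    "female": {"formant_scale": 1.2, "f0": 200.0},
    "child":  {"formant_scale": 1.4, "f0": 250.0},
}


def scale_preset(start, end, voice):
    """Scale start/end formant triples (Hz) for a voice type; return f0 too."""
    settings = VOICES[voice]
    scale = settings["formant_scale"]
    scaled_start = tuple(round(f * scale, 1) for f in start)
    scaled_end = tuple(round(f * scale, 1) for f in end)
    return scaled_start, scaled_end, settings["f0"]


# A_to_I preset re-voiced with the "female" settings.
start, end, f0 = scale_preset((730, 1090, 2440), (270, 2290, 3010), "female")
```

With a factor of 1.2, the A_to_I preset becomes 876→324 Hz for F1, 1308→2748 Hz for F2, and 2928→3612 Hz for F3, with f₀ raised to 200 Hz.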
Non‑Linear Transitions
Replace linear interpolation with curves:
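One way to picture this, sketched in Python rather than the script's Praat formula language: keep the interpolation formula, but pass the progress ratio t/duration through a shaping curve first. The curve functions and the name `formant_at` are assumptions chosen for illustration.

```python
import math


def linear(p):
    """Identity curve: reproduces the original linear transition."""
    return p


def smoothstep(p):
    """S-curve: slow start and end, fastest change in the middle."""
    return p * p * (3.0 - 2.0 * p)


def ease_out(p):
    """Fast start that settles gradually (quarter of a sine cycle)."""
    return math.sin(0.5 * math.pi * p)


def formant_at(start_f, end_f, t, duration, curve=linear):
    """Formant frequency at time t, with progress t/duration shaped by `curve`."""
    return start_f + (end_f - start_f) * curve(t / duration)


# F2 of the A_to_I preset a quarter of the way through a 2 s sound.
f2_linear = formant_at(1090.0, 2290.0, 0.5, 2.0)
f2_scurve = formant_at(1090.0, 2290.0, 0.5, 2.0, curve=smoothstep)
```

At the quarter point the linear version has already reached 1390 Hz, while the S‑curve is still at 1277.5 Hz; it lingers near the starting vowel, then moves faster through the middle of the transition.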
Adding More Formants
Extend to F4 and F5 for richer timbre:
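A minimal Python sketch of a six‑oscillator mix, for illustration only. The F4 and F5 frequencies (about 3500 and 4500 Hz) are the approximate values mentioned elsewhere in this guide; their amplitudes (0.2 and 0.1), the normalisation step, and the name `five_formant_sample` are assumptions. For brevity the vowel is held static rather than interpolated.

```python
import math

F0 = 120.0
BASE_AMPS = (0.4, 0.6, 0.5, 0.3)                  # f0, F1, F2, F3 (from the guide)
EXTRA_FORMANTS = ((3500.0, 0.2), (4500.0, 0.1))   # (Hz, amplitude); amplitudes ASSUMED


def five_formant_sample(t, f1, f2, f3):
    """Static-vowel sample at time t with fixed F4 and F5 added to the mix."""
    partials = list(zip((F0, f1, f2, f3), BASE_AMPS))
    partials += EXTRA_FORMANTS
    total_amp = sum(amp for _, amp in partials)
    value = sum(amp * math.sin(2.0 * math.pi * freq * t) for freq, amp in partials)
    return value / total_amp  # normalise so the peak cannot exceed 1


# 10 ms of a steady /a/ at an assumed 44.1 kHz sample rate.
samples = [five_formant_sample(n / 44100, 730.0, 1090.0, 2440.0) for n in range(441)]
```

Keeping the new amplitudes below that of F3 continues the falling pattern of the existing mix, so the extra partials add brightness without masking the F1‑F2 relationship that carries vowel identity.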
Creating Custom Presets
Add a new preset to the script by defining its start and end values for F1, F2, and F3.
Combining with Other Praat Features
Pitch manipulation: Use Praat's "Manipulation" object to change f₀ contour after synthesis.
Formant tracking & modification: Synthesize vowel, extract formants with "To Formant (burg)", modify, resynthesize.
Layering with natural speech: Mix synthesized vowels with recorded speech for hybrid vocal effects.
Troubleshooting
Cause: Pure sine waves at formant frequencies create overly precise spectral peaks
Solution: Add slight frequency modulation to each oscillator, or use band‑passed noise instead of sine
Cause: Linear interpolation may be too regular
Solution: Use longer duration, or modify script for non‑linear (s‑curve) interpolation
Cause: Different filtering in left/right channels creates phase differences
Solution: Use Mono mode for mono compatibility, or adjust filter slopes to be more phase‑linear
Cause: Complex formula evaluated at every sample
Solution: Reduce duration, or synthesize in segments and concatenate