Dynamic Vowel Transition Synthesizer — User Guide

Formant‑based speech synthesis: generates vowel‑like sounds through precise control of three formant frequencies (F1, F2, F3) and creates smooth transitions between different vowel qualities—from natural speech to extreme spectral morphing.

Technique: Formant Synthesis
Implementation: Praat Script
Category: Speech/Vocal Synthesis
Version: 1.0 (2025)
License: MIT License

Contents:

What this does
Quick start
Formant Theory
Preset Transitions
Formant Structure
Spatial Modes
Sonic Applications

What this does

This script implements dynamic vowel transition synthesis — a formant‑based approach to creating vowel sounds and smoothly morphing between them. Rather than recording actual speech, it synthesizes vowel qualities by controlling three key resonant frequencies (formants F1, F2, F3) that define vowel identity. The script offers 15 preset transitions (from natural vowel progressions like A→I to extreme "alien" formant movements) and 6 spatial processing modes that distribute formants across the stereo field. Each vowel is created by mixing four sine waves: a fundamental pitch (around 120 Hz for male‑like speech) plus three formant‑tracking frequencies that linearly interpolate between start and end values over the sound's duration.

Key Features:

15 Vowel Transition Presets — Natural speech vowels (A, I, U), singing, whispering, robotic, alien, and spectral morphs
Three‑Formant Synthesis — Controls F1 (first formant), F2 (second formant), F3 (third formant) for precise vowel quality
6 Spatial Processing Modes — Mono, stereo separation, rotating formants, binaural, wide field, and panning morph
Time‑Varying Formant Interpolation — Smooth linear transitions between formant sets (computed offline when the sound is created, not in real time)
Built‑in Amplitude Shaping — Natural amplitude envelopes and subtle tremolo
No External Samples — Pure additive synthesis via Praat's formula interpreter

What are formants and why do they matter? Formants are resonant frequencies of the vocal tract that amplify certain harmonics of the vocal fold vibration. Different vowel sounds are distinguished primarily by their first two formant frequencies (F1 and F2):

F1 (First formant): Related to mouth opening height — low F1 = closed vowel (like /i/ "ee"), high F1 = open vowel (like /a/ "ah")
F2 (Second formant): Related to tongue position front‑back — high F2 = front vowel (/i/), low F2 = back vowel (/u/ "oo")
F3 (Third formant): Adds nuance, important for /r/ sounds and vocal timbre

By controlling just these three frequencies (plus a fundamental pitch), we can synthesize recognizable vowel sounds. Transitioning between formant sets creates the illusion of vocal movement without actual articulation.

Technical Implementation: The script builds a four‑oscillator additive synthesis formula: (1) a fixed‑frequency oscillator for the fundamental (≈120 Hz), and (2‑4) three oscillators whose frequency terms linearly interpolate from the start to the end formant values: freq(t) = start_f + (end_f - start_f) × (t/duration). The oscillators are mixed with fixed amplitudes (0.4, 0.6, 0.5, 0.3) that set the relative level of the fundamental and of each formant; they do not model formant bandwidths, because each formant is a single sine wave. The result is multiplied by a gentle tremolo (0.8 + 0.2×sin(2π×0.3×t)) and a slow exponential decay. For the complex presets (All_vowels, Vowel_Cycle, Vocal_Journey), the formants themselves are modulated by low‑frequency oscillators, creating continuous vowel cycling rather than simple linear transitions.

Quick start

In Praat, ensure no objects are selected.
Run script… → dynamic_vowel_transitions.praat.
Choose a Preset (A_to_I, Singing_Vowels, Alien_Vowels, etc.) or Custom.
Select a Spatial_mode (Mono, Stereo_Voice, Rotating_Formants, etc.).
Adjust Duration (typically 1‑5 seconds for transitions).
Click OK — script synthesizes vowel transition, applies spatial processing, plays result.
Output sound appears in Objects list with descriptive name.

Quick tip: Start with A_to_I preset (3 s duration) to hear a classic vowel transition from "ah" to "ee". Try Singing_Vowels with Rotating_Formants spatial mode for a choral effect. Alien_Vowels with Wide_Transition creates extreme, non‑human vocalizations. For continuous vowel cycling, use Vowel_Cycle or All_vowels presets (these use internal modulation rather than linear transitions). Shorter durations (1‑2 s) create quick articulations; longer durations (5 s+) create slow, morphing textures. Watch the Info window for confirmation of preset selection.

Important: SIMPLIFIED SYNTHESIS — Real speech has time‑varying formant bandwidths, nasal zeros, aspiration noise, and more subtle features. This script produces "stylized" vowels good for sound design but not naturalistic speech synthesis. Formant frequencies are approximate — based on average male vocal tract; for female/child voices, multiply by ~1.2‑1.5. Linear interpolation between formants may sound artificial compared to natural articulatory trajectories. No consonant transitions — this is pure vowel‑to‑vowel movement without plosives, fricatives, or nasals. Spatial processing filters may introduce phase issues if summed to mono. The script uses fixed amplitudes for formants — in real speech, formant amplitudes vary with frequency.

Formant Synthesis Theory

The Source‑Filter Model

🎤 Speech Production Model

Two‑component model:

Speech sound = SOURCE × FILTER

SOURCE: Glottal vibration (vocal folds)
• Fundamental frequency f₀ (pitch)
• Harmonic series: f₀, 2f₀, 3f₀, ...

FILTER: Vocal tract resonances (formants)
• Transfer function with peaks at F1, F2, F3, ...
• Shapes which harmonics are amplified/attenuated

In this script:
Source = 120 Hz sine wave (simplified)
Filter = three resonant peaks at F1(t), F2(t), F3(t)
Implementation: Add sine waves at f₀, F1, F2, F3
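To make the "filter shapes the harmonics" idea concrete, here is a small Python sketch (illustrative only, not part of the Praat script; the function name is an assumption). It finds which harmonic of a 120 Hz source lies closest to each /a/ formant. In a full source‑filter synthesizer those harmonics would be the ones boosted; this script instead places a sine wave exactly at each formant frequency, so the formant oscillators are generally not harmonics of f₀.

```python
F0 = 120.0  # fundamental used in this guide (Hz)


def nearest_harmonic(formant_hz, f0=F0):
    """Return (harmonic number, harmonic frequency in Hz) closest to a formant."""
    n = max(1, round(formant_hz / f0))
    return n, n * f0


# Formants of /a/ "ah" as listed in this guide
for name, formant in [("F1", 730.0), ("F2", 1090.0), ("F3", 2440.0)]:
    n, freq = nearest_harmonic(formant)
    print(f"{name} = {formant:.0f} Hz is closest to harmonic {n} ({freq:.0f} Hz)")
```

For /a/, the nearest harmonics are the 6th (720 Hz), 9th (1080 Hz), and 20th (2400 Hz).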

Formant Frequency Ranges

🗺️ Vowel Quadrant (F1‑F2 Space)

          High F2 (Front)               Low F2 (Back)
F1  250 ┌────────────────────────────────────────────┐  close
        │  /i/ "ee"                    /u/ "oo"      │
    500 │  (270, 2290)                 (300, 870)    │
        │                                            │
    750 │                          /a/ "ah"          │
        │                          (730, 1090)       │
   1000 └────────────────────────────────────────────┘  open
          Front vowels                  Back vowels

Values in parentheses are (F1, F2) in Hz.

Cardinal vowels in script (average male):

/a/ "ah": F1=730 Hz, F2=1090 Hz, F3=2440 Hz
/i/ "ee": F1=270 Hz, F2=2290 Hz, F3=3010 Hz
/u/ "oo": F1=300 Hz, F2=870 Hz, F3=2240 Hz

Vowel identity determined primarily by F1/F2 relationship.
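As a rough illustration of how an F1/F2 pair maps to a vowel label, the following Python sketch picks the closest of the three cardinal vowels listed above. The plain Euclidean distance in Hz is an assumption made for simplicity (perceptual work normally uses a warped frequency scale), and the function name is hypothetical.

```python
import math

# Average male values quoted in this guide: (F1, F2) in Hz.
CARDINAL_VOWELS = {
    "a": (730.0, 1090.0),
    "i": (270.0, 2290.0),
    "u": (300.0, 870.0),
}


def nearest_vowel(f1, f2):
    """Return the cardinal vowel whose (F1, F2) point is closest in Hz."""
    return min(
        CARDINAL_VOWELS,
        key=lambda v: math.dist((f1, f2), CARDINAL_VOWELS[v]),
    )


print(nearest_vowel(700, 1100))  # close to /a/
print(nearest_vowel(280, 2200))  # close to /i/
print(nearest_vowel(320, 900))   # close to /u/
```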

Synthesis Equation

# General formula for linear formant transition:
y(t) = A₀·sin(2π·f₀·t)
     + A₁·sin(2π·[F1_start + (F1_end-F1_start)·(t/duration)]·t)
     + A₂·sin(2π·[F2_start + (F2_end-F2_start)·(t/duration)]·t)
     + A₃·sin(2π·[F3_start + (F3_end-F3_start)·(t/duration)]·t)

# With amplitude shaping:
envelope(t) = (0.8 + 0.2·sin(2π·0.3·t)) · exp(-0.2·t/duration)

# Final output:
output(t) = y(t) · envelope(t)

Where:
f₀ = 120 Hz (fundamental pitch, approximate male speech)
A₀=0.4, A₁=0.6, A₂=0.5, A₃=0.3 (relative amplitudes of the fundamental and the three formants)
F1_start, F2_start, F3_start = starting formant frequencies
F1_end, F2_end, F3_end = ending formant frequencies
duration = total sound length in seconds
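The equation can be evaluated outside Praat to inspect the raw numbers. The Python sketch below is a direct transcription of the formula as written in this guide, not the script itself; the 44100 Hz sample rate and the function names are assumptions.

```python
import math


def vowel_sample(t, duration, start, end, f0=120.0):
    """One output sample at time t (seconds), following the guide's formula."""
    amps = (0.6, 0.5, 0.3)  # A1..A3 from the guide
    y = 0.4 * math.sin(2 * math.pi * f0 * t)  # A0 * fundamental
    for a, s, e in zip(amps, start, end):
        freq_term = s + (e - s) * (t / duration)
        y += a * math.sin(2 * math.pi * freq_term * t)
    envelope = (0.8 + 0.2 * math.sin(2 * math.pi * 0.3 * t)) * math.exp(
        -0.2 * t / duration
    )
    return y * envelope


def synthesize(duration, start, end, sample_rate=44100):
    """Return a list of samples; sample_rate is an assumed value."""
    n = int(duration * sample_rate)
    return [vowel_sample(i / sample_rate, duration, start, end) for i in range(n)]


# A_to_I formant values, shortened to 0.5 s to keep the example quick
samples = synthesize(0.5, (730, 1090, 2440), (270, 2290, 3010))
print(len(samples), max(abs(s) for s in samples))
```

Because the four amplitudes sum to 1.8 and the envelope never exceeds 1.0, the output magnitude is bounded by 1.8, so peaks above 1.0 are possible in principle.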

Why Three Formants?

Formant roles in vowel perception:

F1 (First formant):
• Frequency range: 200‑1000 Hz
• Perception: "Openness" or "height"
• Low F1 → closed vowel (/i/, /u/)
• High F1 → open vowel (/a/, /æ/)

F2 (Second formant):
• Frequency range: 600‑2500 Hz
• Perception: "Frontness" or "backness"
• High F2 → front vowel (/i/, /e/)
• Low F2 → back vowel (/u/, /o/)

F3 (Third formant):
• Frequency range: 1800‑3500 Hz
• Perception: Timbre, /r/ sounds, vocal quality
• Lower F3 → "darker" or "r‑colored" sounds
• Higher F3 → "brighter" sounds

Higher formants (F4, F5):
• Affect personal voice quality ("speaker identity")
• Not essential for vowel identification
• This script uses only F1‑F3 for simplicity

Complete Signal Flow

STEP 1: PRESET SELECTION
  Determine start_f1, start_f2, start_f3, end_f1, end_f2, end_f3
  Special presets (5, 6, 16) use internal modulation formulas

STEP 2: FORMULA CONSTRUCTION
  IF preset is 5, 6, or 16:
    Use hard‑coded complex formula with modulating formants
  ELSE:
    Build linear interpolation formula:
      f1(t) = start_f1 + (end_f1-start_f1)*(t/duration)
      f2(t) = start_f2 + (end_f2-start_f2)*(t/duration)
      f3(t) = start_f3 + (end_f3-start_f3)*(t/duration)
    Combine with fundamental (120 Hz) and amplitudes

STEP 3: AMPLITUDE ENVELOPE
  Multiply by:
  • Gentle tremolo: 0.8 + 0.2*sin(2π*0.3*t)
  • Exponential decay: exp(-0.2*t/duration)
  (Special presets use different envelope shapes)

STEP 4: CREATE SOUND FROM FORMULA
  Praat evaluates formula sample‑by‑sample

STEP 5: SPATIAL PROCESSING
  Based on spatial_mode:
    Mono: keep as is
    Stereo: split formants across channels
    Rotating: apply panning LFOs
    etc.

STEP 6: PLAY AND RENAME
  Play result, rename descriptively
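Step 2 amounts to assembling a formula string from six numbers. The Python sketch below shows one way such a string could be built for the linear presets. It is an illustration under assumptions: the function name is hypothetical, the exact string the script produces is not shown in this guide, and x is used as the time variable because that is how Praat formulas refer to the current sample time.

```python
def linear_formula(start, end, duration, f0=120):
    """Build a Praat-style formula string for a linear formant transition.

    start and end are (F1, F2, F3) tuples in Hz; x is time in seconds.
    """
    amps = (0.6, 0.5, 0.3)  # formant amplitudes from the guide
    terms = [f"0.4*sin(2*pi*{f0}*x)"]
    for a, s, e in zip(amps, start, end):
        terms.append(f"{a}*sin(2*pi*({s} + ({e}-{s})*(x/{duration}))*x)")
    envelope = f"(0.8 + 0.2*sin(2*pi*0.3*x)) * exp(-0.2*x/{duration})"
    return f"({' + '.join(terms)}) * {envelope}"


# A_to_I over 3 seconds
print(linear_formula((730, 1090, 2440), (270, 2290, 3010), 3))
```

Keeping the oscillator sum in parentheses ensures the envelope multiplies all four oscillators rather than only the last term.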

Preset Transitions

Natural Speech Vowels

Preset 2: A_to_I

🔤 "ah" → "ee" Transition

Formant movement:

F1: 730 Hz → 270 Hz (decreases — mouth closes)
F2: 1090 Hz → 2290 Hz (increases — tongue moves forward)
F3: 2440 Hz → 3010 Hz (increases slightly)

Articulation: Open back vowel to close front vowel. Natural progression found in words like "price" or "time".

Sonic character: Classic vowel transition, recognizable as speech‑like. Good baseline for understanding formant synthesis.

Duration suggestion: 2‑3 seconds for clear transition.

Preset 3: I_to_U

🔤 "ee" → "oo" Transition

Formant movement:

F1: 270 Hz → 300 Hz (slight increase)
F2: 2290 Hz → 870 Hz (dramatic decrease — tongue retracts)
F3: 3010 Hz → 2240 Hz (decreases — darker timbre)

Articulation: Close front vowel to close back vowel. Extreme tongue movement while keeping jaw relatively closed.

Sonic character: Bright → dark transition. Sounds like "we" → "who" without consonants.

Preset 4: U_to_A

🔤 "oo" → "ah" Transition

Formant movement:

F1: 300 Hz → 730 Hz (increases — jaw opens)
F2: 870 Hz → 1090 Hz (increases slightly)
F3: 2240 Hz → 2440 Hz (increases slightly)

Articulation: Close back vowel to open back vowel. Jaw opens while tongue stays back.

Sonic character: Dark → bright transition completing the vowel triangle (A‑I‑U).

Specialized Vowel Types

Preset 5: All_vowels

🎭 Continuous Vowel Cycling

Implementation: Not linear transition — formants modulated by LFOs:

F1: 400 + 400*sin(2π*0.25*t)                # Cycles 0‑800 Hz at 0.25 Hz
F2: 1000 + 1500*(0.5+0.5*sin(2π*0.2*t))     # 1000‑2500 Hz at 0.2 Hz
F3: 2000 + 1500*(0.5+0.5*sin(2π*0.15*t))    # 2000‑3500 Hz at 0.15 Hz

Effect: Formants independently sweep through vowel space, creating ever‑changing vowel quality that visits many points in F1‑F2 space.

Sonic character: "Talking without words" — continuous vowel morphing that never settles. Hypnotic, speech‑like but non‑linguistic.
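The range each modulated formant term covers follows directly from the All_vowels expressions. The Python sketch below (illustrative; the helper names are assumptions) samples the three expressions as written in this guide over 20 seconds, which spans at least one full cycle of each LFO, and reports the minimum and maximum.

```python
import math


def f1(t):
    return 400 + 400 * math.sin(2 * math.pi * 0.25 * t)


def f2(t):
    return 1000 + 1500 * (0.5 + 0.5 * math.sin(2 * math.pi * 0.2 * t))


def f3(t):
    return 2000 + 1500 * (0.5 + 0.5 * math.sin(2 * math.pi * 0.15 * t))


def swept_range(func, seconds=20, steps_per_second=100):
    """Return (min, max) of func over the interval, rounded to whole Hz."""
    values = [func(i / steps_per_second) for i in range(seconds * steps_per_second + 1)]
    return round(min(values)), round(max(values))


for name, func in [("F1", f1), ("F2", f2), ("F3", f3)]:
    print(name, swept_range(func))
```

The (0.5+0.5*sin) form only ever adds to the base value, so the F2 and F3 terms sweep upward from 1000 Hz and 2000 Hz rather than symmetrically around them.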

Preset 6: Vowel_Cycle

🔄 Faster, Rhythmic Cycling

Implementation: Similar to All_vowels but faster LFO rates:

F1: 400 + 400*sin(2π*0.5*t)                 # 0.5 Hz (2‑second cycle)
F2: 1000 + 1500*(0.5+0.5*sin(2π*0.4*t))     # 0.4 Hz
F3: 2000 + 1500*(0.5+0.5*sin(2π*0.3*t))     # 0.3 Hz

Effect: Faster vowel transitions with pronounced 2 Hz amplitude tremolo. More rhythmic, less ambient than All_vowels.

Sonic character: Pulsing, rhythmic vowel changes. Almost like vowel‑based "sequencer".

Preset 7: Formant_Glissando

🎢 Extreme Formant Sweeps

Formant movement:

F1: 200 Hz → 1000 Hz (huge increase)
F2: 800 Hz → 3000 Hz (extreme increase)
F3: 2000 Hz → 4000 Hz (large increase)

Effect: All formants sweep upward dramatically, far beyond normal speech range.

Sonic character: Sci‑fi "beam" sounds, synthetic sirens, or extreme vocal effects. Not speech‑like.

Preset 8: Whisper_Transition

👂 Breathy, Aspirated Vowels

Formant movement:

F1: 600 Hz → 400 Hz (moderate change)
F2: 1200 Hz → 1800 Hz (fronting)
F3: 2400 Hz → 2800 Hz (brightening)

Note: True whispering requires noise source and different spectral balance. This preset simulates whispered quality through formant choices.

Sonic character: Intimate, breathy vocal quality. Like whispered speech without consonants.

Preset 9: Singing_Vowels

🎶 Musical Vowel Transitions

Formant movement:

F1: 550 Hz → 350 Hz
F2: 1100 Hz → 2000 Hz
F3: 2350 Hz → 3000 Hz

Characteristics: Formant values optimized for singing clarity (slightly different from speech). Smoother transitions, sustained tones.

Sonic character: Choral or operatic vowel transitions. Works well with longer durations (4‑6 s).

Preset 10: Robot_Speech

🤖 Mechanical, Regular Transitions

Formant movement:

F1: 400 Hz → 500 Hz (small, precise change)
F2: 1200 Hz → 1500 Hz (regular increase)
F3: 2400 Hz → 2600 Hz (small increase)

Characteristics: All formants move in same direction with similar slopes. Unnatural but systematic.

Sonic character: Synthetic, robotic vocalization. Like text‑to‑speech without consonants.

Experimental & Extreme

Preset 11: Alien_Vowels

👽 Non‑Human Vocal Tract

Formant movement:

F1: 150 Hz → 800 Hz (unusually low start)
F2: 3000 Hz → 1200 Hz (starts extremely high)
F3: 4000 Hz → 3500 Hz (very high throughout)

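Characteristics: Formant values well outside the typical human range: F1 starts at 150 Hz, lower than any of the vowels listed earlier, while F2 starts at 3000 Hz and F3 at 4000 Hz, far above them. F2 stays below F3 throughout (3000 vs. 4000 Hz at the start, 1200 vs. 3500 Hz at the end). Creates "alien" or "creature" vocalizations.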

Sonic character: Sci‑fi creature sounds, non‑terrestrial life forms. Unsettling yet vocal‑like.
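Using the start and end values listed for this preset, the usual F1 < F2 < F3 ordering can be checked along the whole glide. This Python sketch is illustrative only; the helper names are hypothetical and it assumes the simple linear interpolation described in this guide.

```python
def formants_at(start, end, progress):
    """Linearly interpolated (F1, F2, F3) at progress in [0, 1]."""
    return tuple(s + (e - s) * progress for s, e in zip(start, end))


def ordering_holds(start, end, steps=100):
    """True if F1 < F2 < F3 at every sampled point of the transition."""
    for i in range(steps + 1):
        f1, f2, f3 = formants_at(start, end, i / steps)
        if not (f1 < f2 < f3):
            return False
    return True


ALIEN_START = (150, 3000, 4000)
ALIEN_END = (800, 1200, 3500)

print(formants_at(ALIEN_START, ALIEN_END, 0.5))  # midpoint values
print(ordering_holds(ALIEN_START, ALIEN_END))
```

The ordering holds for the whole transition, so the unusual quality comes from the extreme values themselves rather than from crossed formants.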

Preset 16: Vocal_Journey

🌌 Slow, Deep Morphing

Implementation: Complex internal modulation (not linear):

F1: 300 + 600*sin(2π*0.1*t)                 # Very slow (10‑second cycle)
F2: 800 + 1800*(0.5+0.5*sin(2π*0.08*t))     # Even slower
F3: 2200 + 1000*(0.5+0.5*sin(2π*0.12*t))    # Different rate

Effect: Extremely slow formant movement (cycles 8‑12 seconds). Deep, evolving vowel space exploration.

Sonic character: Meditative, slowly transforming vocal texture. Good for long‑duration ambient pieces.

All Presets Summary

Preset	F1 Range (Hz)	F2 Range (Hz)	F3 Range (Hz)	Character	Suggested Duration
A_to_I	730→270	1090→2290	2440→3010	Natural speech	2‑3 s
I_to_U	270→300	2290→870	3010→2240	Bright→dark	2‑3 s
U_to_A	300→730	870→1090	2240→2440	Dark→open	2‑3 s
All_vowels	0‑800*	1000‑2500*	2000‑3500*	Continuous cycle	5‑10 s
Vowel_Cycle	0‑800*	1000‑2500*	2000‑3500*	Rhythmic cycle	3‑6 s
Formant_Glissando	200→1000	800→3000	2000→4000	Extreme sweeps	3‑5 s
Whisper_Transition	600→400	1200→1800	2400→2800	Breathy	2‑4 s
Singing_Vowels	550→350	1100→2000	2350→3000	Musical	3‑6 s
Robot_Speech	400→500	1200→1500	2400→2600	Mechanical	2‑3 s
Alien_Vowels	150→800	3000→1200	4000→3500	Non‑human	3‑5 s
Choral_Shift	500→600	1000→1400	2200→2600	Choir‑like	3‑5 s
Spectral_Morph	300→700	900→1800	2100→2900	Complex movement	3‑5 s
Harmonic_Transition	450→350	1300→1600	2500→2700	Harmonic focus	2‑4 s
Formant_Sweep	250→850	700→2400	1800→3200	Wide range	4‑6 s
Vocal_Journey	0‑900*	800‑2600*	2200‑3200*	Slow evolution	10‑30 s

* Modulated ranges, not linear transitions

Formant Structure & Implementation

Four‑Oscillator Architecture

🎛️ Oscillator Roles and Amplitudes

Oscillator 1: Fundamental (f₀)

Frequency: Fixed 120 Hz (approximate male pitch)
Amplitude: 0.4 (relative scale)
Purpose: Provides pitch perception, harmonic basis

Oscillator 2: First Formant (F1)

Frequency: Varies by preset (200‑1000 Hz range)
Amplitude: 0.6 (strongest — F1 typically most prominent)
Purpose: Determines vowel "openness"

Oscillator 3: Second Formant (F2)

Frequency: Varies by preset (600‑3000 Hz range)
Amplitude: 0.5
Purpose: Determines vowel "frontness/backness"

Oscillator 4: Third Formant (F3)

Frequency: Varies by preset (1800‑4000 Hz range)
Amplitude: 0.3 (weakest — F3 contributes to timbre)
Purpose: Adds vocal quality, brightness

Time‑Varying Frequencies

# Linear interpolation (most presets):
f1(t) = start_f1 + (end_f1 - start_f1) * (t / duration)
f2(t) = start_f2 + (end_f2 - start_f2) * (t / duration)
f3(t) = start_f3 + (end_f3 - start_f3) * (t / duration)

# Example: A_to_I, duration=3 s, at t=1.5 s:
f1(1.5) = 730 + (270-730) * (1.5/3) = 730 + (-460)*0.5 = 500 Hz
f2(1.5) = 1090 + (2290-1090) * 0.5 = 1090 + 600 = 1690 Hz
f3(1.5) = 2440 + (3010-2440) * 0.5 = 2440 + 285 = 2725 Hz

# This creates smooth, linear formant glides
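One property of an oscillator written as sin(2π·f(t)·t) is worth keeping in mind when reading these values: the frequency actually heard is the rate of change of the phase, not f(t) itself. For a linear f(t) that rate works out to start + 2·(end - start)·(t/duration). The Python sketch below (illustrative only; the function names are not from the script) checks this numerically for F2 of A_to_I.

```python
import math


def phase(t, start, end, duration):
    """Phase in radians of one oscillator written as sin(2*pi*f(t)*t)."""
    return 2 * math.pi * (start + (end - start) * (t / duration)) * t


def instantaneous_hz(t, start, end, duration, dt=1e-6):
    """Central-difference derivative of the phase, converted to Hz."""
    d_phase = phase(t + dt, start, end, duration) - phase(t - dt, start, end, duration)
    return d_phase / (2 * dt) / (2 * math.pi)


# F2 of A_to_I: 1090 -> 2290 Hz over 3 s
for t in (0.0, 1.5, 3.0):
    print(t, round(instantaneous_hz(t, 1090, 2290, 3.0)))
```

With these numbers the heard frequency already reaches 2290 Hz at the midpoint and 3490 Hz at the end. If the heard frequency is meant to land on the end value, the usual approach is to integrate the frequency instead: phase(t) = 2π·(start·t + (end - start)·t²/(2·duration)). Which form the script uses cannot be confirmed from this guide alone.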

Amplitude Envelopes

Default envelope (linear transition presets):
envelope(t) = (0.8 + 0.2*sin(2π*0.3*t)) * exp(-0.2*t/duration)

Components:
1. Gentle tremolo: 0.8 + 0.2*sin(2π*0.3*t)
   • 0.3 Hz (3.33‑second cycle) sine wave
   • Gain swings between 0.6 and 1.0 (±0.2 around 0.8)
   • Makes static synthesis more "alive"
2. Exponential decay: exp(-0.2*t/duration)
   • Very slow decay over entire duration
   • At t=duration: exp(-0.2) ≈ 0.82 (18% reduction)
   • Gives a slight downward drift rather than a full fade‑out; the decay factor is still about 0.82 when the sound ends

Special preset envelopes:
• All_vowels: exp(-0.1*t/duration) (slower decay)
• Vowel_Cycle: (0.8 + 0.2*sin(2π*2.0*t)) (fast 2 Hz tremolo)
• Vocal_Journey: (0.6 + 0.4*sin(2π*0.1*t)) (slow, deep tremolo)
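The default envelope is easy to tabulate. This Python sketch (illustrative; the function name is an assumption) evaluates the two factors from the formula above and confirms the size of the decay at the end of the sound.

```python
import math


def default_envelope(t, duration):
    """Tremolo times slow exponential decay, as given in the guide."""
    tremolo = 0.8 + 0.2 * math.sin(2 * math.pi * 0.3 * t)
    decay = math.exp(-0.2 * t / duration)
    return tremolo * decay


duration = 3.0
print(round(default_envelope(0.0, duration), 4))  # starts at 0.8
print(round(math.exp(-0.2), 4))                   # decay factor at t = duration
for i in range(7):
    t = i * 0.5
    print(t, round(default_envelope(t, duration), 3))
```

Since the decay factor only falls to about 0.82, a separate fade would be needed if the sound should end in silence.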

Natural Speech vs. Synthesis Simplifications

What's missing from real speech:

Formant bandwidths: Real formants have width (Q) — this uses pure sine waves
Nasal zeros: Nasalized vowels have anti‑resonances
Aspiration noise: Breathiness, especially in whispers
Glottal source shape: Real vocal folds produce pulse train with specific spectrum
Dynamic amplitudes: Formant amplitudes change with frequency
Higher formants: F4 (≈3500 Hz), F5 (≈4500 Hz) affect voice quality
Non‑linear interactions: Source‑filter coupling

Why these simplifications work: For vowel perception, F1‑F2 relationship is most critical. Pure sine waves at formant frequencies create clear spectral peaks. The simplifications make the synthesis tractable in Praat's formula language while preserving essential vowel qualities.

Spatial Processing Modes

Mono (Mode 1)

🔈 Single‑Channel Output

Processing: No stereo processing — keeps synthesized mono sound as is.

Effect: Centered, focused vocal sound. Good for further processing or when stereo imaging would be distracting.

Output name: vowel_transition_mono

Use case: Basic synthesis, further effects processing, mono‑compatible applications.

Stereo Voice (Mode 2)

🎧 Formant Frequency Separation

Processing:

Left channel: Low‑pass filtered (0‑2000 Hz, Hann window, 100 Hz smoothing) + 0.9× gain
Right channel: Band‑pass filtered (150‑4000 Hz, Hann window, 100 Hz smoothing) + 0.9× gain

Effect: Lower formants (F1, some F2) emphasized in left ear; higher formants (F2, F3) in right ear. Creates natural stereo width.

Output name: vowel_transition_stereo

Use case: Natural‑sounding vocal placement, standard stereo mixing.

Rotating Formants (Mode 3)

🌀 Circular Panning Motion

Processing: Sine/cosine panning at 0.15 Hz (6.67‑second rotation period):

Left:  self * (0.5 + 0.4 * cos(2π * 0.15 * x))
Right: self * (0.5 + 0.4 * sin(2π * 0.15 * x))

Effect: Vowel sound appears to slowly rotate around listener's head. Creates immersive, 3D sensation.

Output name: vowel_transition_rotating

Use case: Meditative/ambient pieces, spatial audio experiments, VR/AR applications.
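The two gain expressions for this mode can be tabulated to see how the channels trade off over one rotation. The Python sketch below is illustrative only (the function name is hypothetical); it uses the 0.15 Hz rate and the 0.5 ± 0.4 gain range given above.

```python
import math

ROTATION_HZ = 0.15  # rotation rate from the guide


def rotating_gains(x):
    """Left/right gain factors at time x (seconds)."""
    left = 0.5 + 0.4 * math.cos(2 * math.pi * ROTATION_HZ * x)
    right = 0.5 + 0.4 * math.sin(2 * math.pi * ROTATION_HZ * x)
    return left, right


period = 1 / ROTATION_HZ  # about 6.67 s
for fraction in (0.0, 0.25, 0.5, 0.75):
    left, right = rotating_gains(fraction * period)
    print(f"{fraction:.2f} of a rotation: left={left:.2f}, right={right:.2f}")
```

Each channel's gain stays between 0.1 and 0.9, so neither side is ever fully silent, and the right channel peaks a quarter rotation after the left.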

Binaural Vowels (Mode 4)

🧠 Differentiated Ear Processing

Processing:

Left channel: Band‑pass 80‑3000 Hz + amplitude modulation (0.8 + 0.1*sin(2π*0.2*x))
Right channel: Band‑pass 100‑3500 Hz + different AM (0.7 + 0.2*cos(2π*0.25*x))

Effect: Each ear receives differently filtered and modulated version, creating complex binaural interaction. Can produce phantom center images.

Output name: vowel_transition_binaural

Use case: Headphone listening, binaural audio research, intimate vocal effects.

Wide Transition (Mode 5)

🌐 Extreme Frequency Separation

Processing:

Left channel: Very low‑pass (0‑1500 Hz, 120 Hz smoothing) + 0.8× gain
Right channel: High‑pass/band‑pass (200‑5000 Hz, 120 Hz smoothing) + 0.8× gain

Effect: Bass frequencies strongly left, treble strongly right. Creates extreme stereo width but can sound unnatural.

Output name: vowel_transition_wide

Use case: Experimental music, special effects, when maximum stereo separation desired.

Panning Morph (Mode 6)

🏓 Panning Follows Formant Movement

Processing: Panning evolves linearly with time:

Left:  self * (0.6 + 0.3 * (1 - x/duration))    # Starts strong, fades
Right: self * (0.6 + 0.3 * (x/duration))        # Starts weak, grows

Effect: Sound moves from left to right as formants transition. Creates correlation between spectral change and spatial movement.

Output name: vowel_transition_panning

Use case: Sound design where spatial movement reinforces spectral transition, moving sound effects.
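To get a feel for how far the image actually moves, the gain expressions for this mode can be evaluated at the start, middle, and end of the sound. This Python sketch is illustrative and the function names are assumptions.

```python
import math


def panning_morph_gains(x, duration):
    """Left/right gain factors at time x for the Panning Morph formulas."""
    progress = x / duration
    left = 0.6 + 0.3 * (1 - progress)
    right = 0.6 + 0.3 * progress
    return left, right


def level_difference_db(left, right):
    """Level of the left channel relative to the right, in dB."""
    return 20 * math.log10(left / right)


duration = 3.0
for x in (0.0, 1.5, 3.0):
    left, right = panning_morph_gains(x, duration)
    print(f"x={x}: left={left:.2f}, right={right:.2f}, "
          f"diff={level_difference_db(left, right):+.2f} dB")
```

The gains run from 0.9/0.6 to 0.6/0.9, a level difference of roughly 3.5 dB at either extreme, so the result is a gentle drift across the stereo field rather than a hard left‑to‑right pan.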

Sonic Applications

Speech Synthesis & Voice Design

🗣️ Building Block for Vocal Synthesis

Creating diphthongs: Chain multiple transitions (A→I then I→U) to create complex vowel sequences.

Voice character design: Adjust formant frequencies to create different "voices":

Child voice: Multiply all formants by 1.3‑1.5, increase f₀ to 200‑300 Hz
Female voice: Multiply formants by 1.15‑1.25, f₀=180‑220 Hz
Male voice: Default settings (f₀=120 Hz)
Elderly voice: Slightly lower F1, more prominent F3

Adding consonants: Layer with noise bursts (for fricatives), silence gaps (for plosives), nasal resonances.

Sound Design for Media

Sci‑fi interfaces: Robot_Speech preset with short duration (0.5‑1 s) for button presses, status alerts.

Alien creatures: Alien_Vowels with Vocal_Journey envelope, slowed down, layered with animal sounds.

Magical spells: Formant_Glissando with long duration, reversed, with reverb and pitch modulation.

Horror vocals: Whisper_Transition with extreme slowing (duration 10 s), low‑pass filtering, subtle distortion.

Music Production

Vocal pads: All_vowels or Vocal_Journey presets with long duration (10‑30 s), heavy reverb, slow filter sweeps.

Rhythmic vocal hits: A_to_I or U_to_A with short duration (0.2‑0.5 s), percussive envelope, side‑chain compression.

Choral textures: Singing_Vowels with multiple instances at different pitches (harmonized), Rotating_Formants spatial mode.

Glitch vocals: Vowel_Cycle preset chopped into 0.1 s segments, rearranged rhythmically, bit‑crushed.

Experimental & Educational

Formant perception demonstrations: Play A_to_I transition while displaying F1/F2 plot. Show how vowel perception changes continuously.

Vowel space exploration: Create grid of sounds covering F1‑F2 space to demonstrate vowel continuum.

Synthesized language: Create "words" by chaining vowel transitions with different durations and pitches.

Cross‑species communication: Use Alien_Vowels as basis for designing non‑human vocal communication systems.

Practical Workflow Examples

🎬 Film: "AI Companion Voice"

Character: Friendly artificial intelligence

Voice design:

Base vowel set: Singing_Vowels preset (clear, musical)
Spatial mode: Binaural_Vowels (intimate, headphone‑friendly)
Pitch variation: Create multiple instances at different f₀ (120, 180, 240 Hz)
Consonant simulation: Add filtered noise bursts before vowels
Prosody: Vary duration (short for quick responses, long for explanations)

Result: Synthetic yet warm vocal quality suitable for AI character.

🎵 Track: "Vocal Ambient" (Music Production)

Structure:

Layer 1 (Drone): Vocal_Journey, 60 s, Mono, heavily reverbed
Layer 2 (Rhythm): Vowel_Cycle, 4 s, chopped to 1‑beat segments
Layer 3 (Melody): A_to_I at different pitches (following chord progression)
Layer 4 (Texture): Whisper_Transition, reversed, stereo‑widened

Processing: Global side‑chain compression, tape saturation, stereo imaging.

🔬 Research: "Vowel Continuum Perception"

Experiment: Create a continuum between /a/ and /i/ with 10 stimuli (9 equal steps)

Procedure:

Calculate intermediate formant values:

F1_step = (730-270)/9 = 51.1 Hz
F2_step = (2290-1090)/9 = 133.3 Hz

Generate 10 sounds with formants: (730,1090), (679,1223), ..., (270,2290)
Present to listeners in random order, ask to identify as /a/ or /i/
Plot identification curve → find categorical boundary

Application: Speech perception research, categorical perception demonstration.
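The stimulus values for the continuum experiment can be generated rather than computed by hand. A minimal Python sketch, assuming evenly spaced steps in Hz as in the procedure above (the function name is hypothetical):

```python
def continuum(start, end, n_stimuli=10):
    """Evenly spaced (F1, F2) pairs from start to end, endpoints included."""
    steps = n_stimuli - 1
    return [
        tuple(round(s + (e - s) * i / steps, 1) for s, e in zip(start, end))
        for i in range(n_stimuli)
    ]


# /a/ (730, 1090) to /i/ (270, 2290)
stimuli = continuum((730, 1090), (270, 2290))
for number, (f1, f2) in enumerate(stimuli, start=1):
    print(f"Stimulus {number}: F1={f1} Hz, F2={f2} Hz")
```

Each pair would then be used as both the start and end formant values of a Custom run, so that every stimulus is a steady vowel; F3 can be interpolated the same way by passing three‑element tuples.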

Advanced Techniques & Customization

Modifying Formant Values

For different voice types: Modify the start/end formant values in the script:

# Female voice (approx. 20% higher formants):
# 730 * 1.2 = 876 Hz, 1090 * 1.2 = 1308 Hz
start_f1 = 730 * 1.2
start_f2 = 1090 * 1.2
# etc.

# Child voice (approx. 40% higher):
# 730 * 1.4 = 1022 Hz
start_f1 = 730 * 1.4
# etc.

# Note: Also adjust fundamental frequency (f₀ in formula)
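When several presets need rescaling, it can be convenient to compute the new values in one go and paste them into the script. The Python sketch below is illustrative (the function name is an assumption); the factors 1.2 and 1.4 are the ones used in the example above.

```python
def scale_formants(formants, factor):
    """Scale a tuple of formant frequencies (Hz) and round to whole Hz."""
    return tuple(round(f * factor) for f in formants)


A_MALE = (730, 1090, 2440)  # /a/ "ah" from this guide

print("female /a/:", scale_formants(A_MALE, 1.2))
print("child /a/: ", scale_formants(A_MALE, 1.4))
```

The same factor should be applied to both the start and end values of a transition so that the shape of the glide is preserved.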

Non‑Linear Transitions

Replace linear interpolation with curves:

# Exponential transition (slow start, fast end):
progress = (exp(3*x/duration) - 1) / (exp(3) - 1)
f1(t) = start_f1 + (end_f1 - start_f1) * progress

# Sigmoid transition (slow start and end, fast middle):
progress = 1/(1 + exp(-10*(x/duration - 0.5)))
f1(t) = start_f1 + (end_f1 - start_f1) * progress

# These require modifying the formula construction section
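The shapes of these progress curves are easier to judge from a few sampled values. The following Python sketch (illustrative only; names are hypothetical) implements the same two expressions with p = x/duration and compares them with the linear case.

```python
import math


def linear(p):
    return p


def exponential(p, k=3.0):
    """Slow start, fast end; exactly 0 at p=0 and 1 at p=1."""
    return (math.exp(k * p) - 1) / (math.exp(k) - 1)


def sigmoid(p, k=10.0):
    """Slow start and end, fast middle; only approaches 0 and 1."""
    return 1 / (1 + math.exp(-k * (p - 0.5)))


def interpolate(start, end, p, curve=linear):
    """Formant value at progress p in [0, 1] using the given curve."""
    return start + (end - start) * curve(p)


# F1 of A_to_I (730 -> 270 Hz) at five points along the transition
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    row = [round(interpolate(730, 270, p, c), 1) for c in (linear, exponential, sigmoid)]
    print(p, row)
```

Note that with a steepness of 10 the sigmoid starts at about 0.0067 rather than 0, so the formant begins roughly 3 Hz away from its nominal start value; a larger steepness or a rescaled curve removes that offset.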

Adding More Formants

Extend to F4 and F5 for richer timbre:

# Add to formula (F4 term, then F5 term):
+ 0.2*sin(2*pi*(3500 + (4000-3500)*(x/duration))*x)
+ 0.1*sin(2*pi*(4500 + (5000-4500)*(x/duration))*x)

# Typical values:
# F4: 3500‑4000 Hz (voice quality, "twang")
# F5: 4500‑5000 Hz (brightness; the "singer's formant" is usually placed lower, near 3 kHz)

Creating Custom Presets

Add new preset to script:

elsif preset = 17
    # My Custom Transition
    start_f1 = 400
    start_f2 = 1200
    start_f3 = 2600
    end_f1 = 600
    end_f2 = 1800
    end_f3 = 3200
    transition_name$ = "My_Custom"

# Also add to optionmenu in form:
#     option My_Custom

Combining with Other Praat Features

Pitch manipulation: Use Praat's "Manipulation" object to change f₀ contour after synthesis.

Formant tracking & modification: Synthesize vowel, extract formants with "To Formant (burg)", modify, resynthesize.

Layering with natural speech: Mix synthesized vowels with recorded speech for hybrid vocal effects.

Troubleshooting

Problem: Vowels sound artificial or "beepy"
Cause: Pure sine waves at formant frequencies create overly precise spectral peaks
Solution: Add slight frequency modulation to each oscillator, or use band‑passed noise instead of sine

Problem: Transition sounds abrupt or mechanical
Cause: Linear interpolation may be too regular
Solution: Use longer duration, or modify script for non‑linear (s‑curve) interpolation

Problem: Spatial modes cause phase cancellation in mono
Cause: Different filtering in left/right channels creates phase differences
Solution: Use Mono mode for mono compatibility, or adjust filter slopes to be more phase‑linear

Problem: Very long durations cause Praat slowdown
Cause: Complex formula evaluated at every sample
Solution: Reduce duration, or synthesize in segments and concatenate