Voice Quality Sonification — Jitter-Shimmer Formant Mapper — User Guide

Data sonification tool: maps voice perturbation measurements (jitter and shimmer) to formant frequencies, creating auditory representations of vocal quality changes over time.

Author: Shai Cohen
Version: 1.0 (2025)
Technique: Voice Analysis → Formant Mapping
Application: Praat scripting language

Contents:

What this does
Quick start
Voice Quality Theory
Preset Strategies
Parameters & Controls
Visualization & Analysis
Applications

What this does

This script implements a Jitter-Shimmer Formant Mapper — a voice quality sonification tool that analyzes perturbation measurements (jitter and shimmer) from a voice recording and uses these measurements to dynamically control formant filtering. The result is an auditory representation of vocal quality variations: as jitter and shimmer change over time, the perceived vowel quality shifts accordingly.

Key Features:

5 Preset Mapping Strategies — Custom, Subtle, Moderate, Extreme, Reverse
Multi-Window Analysis — Divides recording into analysis windows for time-varying measurement
Jitter Analysis — Measures cycle-to-cycle variations in fundamental frequency
Shimmer Analysis — Measures cycle-to-cycle variations in amplitude
Formant Mapping — Jitter controls F1 (first formant), Shimmer controls F2 (second formant)
Formant Range Control — User-specified min/max frequencies for F1 and F2
Visualization — Plots normalized jitter/shimmer curves over time
Segmented Processing — Applies time-varying formant filters to each analysis window

🗣️ What are Jitter and Shimmer?

Jitter (frequency perturbation) — the cycle-to-cycle variation in fundamental frequency. Measured as a percentage: 0.5-1.0% is typical for healthy voices; higher values indicate roughness, hoarseness, or pathology.

Shimmer (amplitude perturbation) — the cycle-to-cycle variation in amplitude. Typical values: 2.0-4.0% for healthy voices; higher values indicate breathiness, weakness, or pathology.

Together, jitter and shimmer are key acoustic parameters for assessing vocal quality. This tool sonifies these measurements by mapping them to formant frequencies — turning quantitative voice analysis into audible vowel changes.

Technical Implementation:

(1) Window Analysis: Divide the recording into N analysis windows.
(2) Jitter/Shimmer Extraction: For each window, extract pitch, create a point process, generate a voice report, and parse the jitter and shimmer percentages.
(3) Normalization: Scale jitter/shimmer values to the 0-1 range across all windows.
(4) Formant Calculation: For each window, compute shifted formant frequencies: F1 = base_F1 × (1 + (jitter_norm - 0.5) × scale × 2), F2 = base_F2 × (1 + (shimmer_norm - 0.5) × scale × 2).
(5) Segment Processing: Extract each segment, apply bandpass filters at the shifted formant frequencies, mix the F1 and F2 bands, and add the result to the output.
(6) Visualization: Plot normalized jitter (blue) and shimmer (red) curves over time.
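Step (2) ends by reading two percentages out of a text report. The Python sketch below shows one way to do that parse, falling back to the defaults given in the Important note under Quick start (0.5% jitter, 3.0% shimmer) when a value is missing. It assumes report lines in the style of Praat's voice report (for example "Jitter (local): 0.451%"); parse_voice_report and the sample text are hypothetical, not code or output from the script.

```python
import re

# Fallbacks the guide lists for windows without usable pitch
DEFAULT_JITTER_PCT = 0.5
DEFAULT_SHIMMER_PCT = 3.0


def parse_voice_report(report: str) -> tuple[float, float]:
    """Return (jitter %, shimmer %) from a voice-report-style text.

    Assumes lines such as "Jitter (local): 0.451%"; a missing or
    non-numeric value (e.g. "--undefined--") falls back to the default.
    """
    def grab(label: str, default: float) -> float:
        # The label must be followed directly by ':' so that
        # "Jitter (local, absolute)" is not matched by "Jitter (local)"
        match = re.search(re.escape(label) + r":\s*([0-9.]+)\s*%", report)
        return float(match.group(1)) if match else default

    return (grab("Jitter (local)", DEFAULT_JITTER_PCT),
            grab("Shimmer (local)", DEFAULT_SHIMMER_PCT))


sample = """Jitter:
   Jitter (local): 0.451%
   Jitter (local, absolute): 35.2E-6 seconds
Shimmer:
   Shimmer (local): 3.654%
   Shimmer (local, dB): 0.327 dB"""
print(parse_voice_report(sample))  # (0.451, 3.654)
```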

Quick start

In Praat, select exactly one Sound object (speech/voice recording, mono recommended).
Run script… → select Jitter_Shimmer_Formant_Mapper.praat.
Choose Preset (2-5 for specific mapping strategies, 1 for custom).
Set analysis parameters: Num_windows and Window_size.
Adjust mapping scales: Jitter_to_F1_scale and Shimmer_to_F2_scale.
Set formant frequency ranges (F1_low/high, F2_low/high).
Enable Draw_analysis for visualization of jitter/shimmer curves.
Click OK — processor analyzes voice, maps to formants, creates "original_jitshim" sound object.

Quick tip: Start with Moderate Effect preset on a sustained vowel recording (2-5 seconds). Enable Draw_analysis to see how jitter (blue) and shimmer (red) vary over time. Listen to the output — you'll hear vowel quality shifting as jitter and shimmer change. For clinical voice assessment, use Subtle preset to highlight pathological variations. For sound design, try Extreme or Reverse mapping for dramatic formant shifts.

Important:

VOICE REQUIREMENTS: Input must contain voiced segments with detectable pitch. Unvoiced material (whispers, noise, silence) will yield the default jitter/shimmer values (0.5%, 3.0%).
WINDOW SIZE: Must be large enough to contain multiple pitch periods. For low voices (80-150 Hz), use window_size ≥ 0.2 s; for high voices (200-400 Hz), 0.1 s may suffice.
MINIMUM DURATION: The sound must be longer than 2 × window_size.
FORMANT RANGES: Should suit the speaker (male: F1 270-800 Hz, F2 800-2500 Hz; female: F1 310-1000 Hz, F2 1000-3200 Hz).
FILTER SMOOTHING: Fixed at 100 Hz in the script; narrow formant ranges may require adjusting this value.
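The window-size and duration rules reduce to simple arithmetic: a window of window_size seconds holds about window_size × F0 pitch periods. A Python sketch of a pre-flight check follows; check_settings and its min_periods threshold of 10 are hypothetical, since the guide only asks for "multiple" periods per window.

```python
def periods_per_window(window_size_s: float, f0_hz: float) -> float:
    """Approximate number of pitch periods inside one analysis window."""
    return window_size_s * f0_hz


def check_settings(duration_s: float, window_size_s: float, lowest_f0_hz: float,
                   min_periods: float = 10.0) -> list[str]:
    """Return warnings for settings that break the requirements above.

    min_periods is a hypothetical threshold; the guide only asks for
    "multiple pitch periods" per window.
    """
    warnings = []
    if duration_s <= 2 * window_size_s:
        warnings.append("sound must be longer than 2 x window_size")
    if periods_per_window(window_size_s, lowest_f0_hz) < min_periods:
        warnings.append("window_size too short for the lowest expected F0")
    return warnings


print(check_settings(duration_s=4.0, window_size_s=0.2, lowest_f0_hz=80))  # []
print(check_settings(duration_s=0.3, window_size_s=0.2, lowest_f0_hz=80))  # duration warning only
```

With the guide's own figures, 0.2 s at 80 Hz holds about 16 periods and 0.1 s at 200 Hz about 20.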

Voice Quality Theory

Jitter and Shimmer

Jitter (local): percentage of cycle-to-cycle F0 variation:

$$\text{Jitter}_{\text{local}}\,(\%) = \frac{\frac{1}{N-1}\sum_{i=2}^{N}\left|T_i - T_{i-1}\right|}{\frac{1}{N}\sum_{i=1}^{N} T_i} \times 100$$

where T_i = period of cycle i and N = number of cycles.

Typical ranges:
• 0.2-0.8%: Very stable voice (ideal)
• 0.8-1.5%: Typical healthy voice
• 1.5-3.0%: Slight roughness
• 3.0%+: Pathological (hoarse, rough)

Shimmer (local): percentage of cycle-to-cycle amplitude variation:

$$\text{Shimmer}_{\text{local}}\,(\%) = \frac{\frac{1}{N-1}\sum_{i=2}^{N}\left|A_i - A_{i-1}\right|}{\frac{1}{N}\sum_{i=1}^{N} A_i} \times 100$$

where A_i = amplitude of cycle i.

Typical ranges:
• 1.0-2.5%: Very stable voice (ideal)
• 2.5-4.0%: Typical healthy voice
• 4.0-7.0%: Slight breathiness
• 7.0%+: Pathological (breathy, weak)
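Both formulas share one shape: the mean absolute difference between consecutive cycles divided by the mean over all cycles. The Python sketch below computes that quantity from a list of periods or amplitudes; the helper name and the sample numbers are made up for illustration. It is a simplified version: Praat's own jitter and shimmer commands also ignore periods outside configurable limits, which this sketch does not do.

```python
def local_perturbation_pct(values: list[float]) -> float:
    """Mean absolute difference of consecutive values over the mean value, in %.

    Fed with periods T_i this is jitter (local); fed with cycle
    amplitudes A_i it is shimmer (local).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two cycles")
    mean_abs_diff = sum(abs(values[i] - values[i - 1]) for i in range(1, n)) / (n - 1)
    mean_value = sum(values) / n
    return 100.0 * mean_abs_diff / mean_value


periods = [0.0100, 0.0101, 0.0099, 0.0100]   # seconds (about 100 Hz)
amplitudes = [1.00, 0.96, 1.00, 0.98]        # arbitrary units
print(round(local_perturbation_pct(periods), 3))     # jitter (local): 1.333
print(round(local_perturbation_pct(amplitudes), 3))  # shimmer (local): 3.384
```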

Formant Frequencies

🎵 What are Formants?

Formants are resonant frequencies of the vocal tract that shape vowel quality. The first two formants (F1 and F2) primarily determine vowel identity:

Vowel	F1 (Hz) - Male	F2 (Hz) - Male	F1 (Hz) - Female	F2 (Hz) - Female
/i/ (heed)	270	2300	310	2800
/ɪ/ (hid)	400	2000	430	2500
/ɛ/ (head)	550	1800	600	2200
/æ/ (had)	700	1700	800	2000
/ɑ/ (father)	750	1200	850	1400
/ɔ/ (bought)	500	900	550	1100
/ʊ/ (hood)	400	1000	450	1200
/u/ (who'd)	300	900	350	1100

F1 is inversely related to tongue height: high vowels (i, u) have a low F1; low vowels (æ, ɑ) have a high F1.

F2 is related to tongue frontness: front vowels (i, ɪ) have a high F2; back vowels (u, ɔ) have a low F2.
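To make the table concrete, the Python sketch below picks the table vowel closest to a given (F1, F2) pair, using the male values. nearest_vowel is a hypothetical helper, and plain Euclidean distance in hertz is a crude choice: it weighs a hertz of F1 the same as a hertz of F2, which perceptual scales such as Bark do not.

```python
import math

# Male F1/F2 values (Hz) copied from the vowel table above
MALE_VOWELS = {
    "/i/": (270, 2300), "/ɪ/": (400, 2000), "/ɛ/": (550, 1800), "/æ/": (700, 1700),
    "/ɑ/": (750, 1200), "/ɔ/": (500, 900), "/ʊ/": (400, 1000), "/u/": (300, 900),
}


def nearest_vowel(f1: float, f2: float, table: dict = MALE_VOWELS) -> str:
    """Return the table vowel with the smallest Euclidean distance in the (F1, F2) plane."""
    return min(table, key=lambda v: math.hypot(f1 - table[v][0], f2 - table[v][1]))


print(nearest_vowel(280, 2250))  # /i/  (low F1, high F2: high front)
print(nearest_vowel(720, 1250))  # /ɑ/  (high F1, low-mid F2: low back)
```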

Mapping Function

For each analysis window:

1. Normalize jitter and shimmer to the 0-1 range:
   jitter_norm = (jitter - min_jitter) / (max_jitter - min_jitter)
   shimmer_norm = (shimmer - min_shimmer) / (max_shimmer - min_shimmer)

2. Calculate the formant shift factors:
   f1_shift = 1.0 + (jitter_norm - 0.5) × jitter_scale × 2
   f2_shift = 1.0 + (shimmer_norm - 0.5) × shimmer_scale × 2
   The term (x - 0.5) × 2 maps the 0-1 range to -1 to +1, so jitter_scale sets the maximum deviation from unity.

3. Apply the shifts to the formant ranges:
   f1_low_shifted = f1_low × f1_shift
   f1_high_shifted = f1_high × f1_shift
   f2_low_shifted = f2_low × f2_shift
   f2_high_shifted = f2_high × f2_shift

Example: f1_low = 280, f1_high = 900, jitter_scale = 0.15
   jitter_norm = 0.0 → shift = 1.0 + (0 - 0.5) × 0.15 × 2 = 1.0 - 0.15 = 0.85 → F1 range = 238-765 Hz (lowered)
   jitter_norm = 1.0 → shift = 1.0 + (1 - 0.5) × 0.15 × 2 = 1.0 + 0.15 = 1.15 → F1 range = 322-1035 Hz (raised)
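The three steps above can be sketched in Python as follows. The function names are illustrative, and the handling of the zero-range case (every window with the same value, which would otherwise divide by zero) is an assumption: the guide does not say what the script does there.

```python
def normalize(values: list[float]) -> list[float]:
    """Step 1: min-max scale values to 0-1 across all windows."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: identical values get 0.5, i.e. no formant shift
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def shift_factor(norm: float, scale: float) -> float:
    """Step 2: map a 0-1 value to a multiplier between 1 - scale and 1 + scale."""
    return 1.0 + (norm - 0.5) * scale * 2


def shifted_range(low_hz: float, high_hz: float, norm: float, scale: float) -> tuple[float, float]:
    """Step 3: multiply both band edges by the same shift factor."""
    s = shift_factor(norm, scale)
    return low_hz * s, high_hz * s


# Worked example: F1 range 280-900 Hz, jitter_scale = 0.15
for jitter_norm in (0.0, 1.0):
    f1_lo, f1_hi = shifted_range(280, 900, jitter_norm, 0.15)
    print(jitter_norm, round(f1_lo, 1), round(f1_hi, 1))
# prints "0.0 238.0 765.0" then "1.0 322.0 1035.0"
```

The printed values reproduce the worked example: 238-765 Hz at jitter_norm = 0.0 and 322-1035 Hz at jitter_norm = 1.0.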

Reverse Mapping

🔄 Cross-Mapping Concept

Reverse Mapping preset swaps the mapping relationship:

Jitter → F2 (instead of F1)
Shimmer → F1 (instead of F2)
Scale values negative (-0.15, -0.12) — higher jitter/shimmer lowers formants

Perceptual effect: Instead of jitter affecting vowel height and shimmer affecting front/back position, the mapping is both crossed (jitter moves front/back position, shimmer moves height) and inverted. This creates an "opposite" perceptual mapping, useful for exploring alternative sonification strategies or creating unusual timbral shifts.

Negative scale: When jitter_norm > 0.5, f2_shift < 1.0 (frequencies decrease) — high jitter maps to lower, darker formants.
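A Python sketch of the Reverse preset's arithmetic, assuming it reuses the same shift formula as the standard mapping with the inputs crossed (jitter drives F2, shimmer drives F1) and the preset's negative scales. reverse_shifts is a hypothetical helper.

```python
def shift_factor(norm: float, scale: float) -> float:
    """Same shift formula as the standard mapping."""
    return 1.0 + (norm - 0.5) * scale * 2


def reverse_shifts(jitter_norm: float, shimmer_norm: float,
                   jitter_scale: float = -0.15, shimmer_scale: float = -0.12) -> dict:
    """Crossed mapping: jitter drives F2 and shimmer drives F1, with negative scales."""
    return {
        "f1_shift": shift_factor(shimmer_norm, shimmer_scale),
        "f2_shift": shift_factor(jitter_norm, jitter_scale),
    }


# High jitter, mid-range shimmer: the F2 band drops by 15%, the F1 band is unchanged
print(reverse_shifts(jitter_norm=1.0, shimmer_norm=0.5))
```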

Preset Strategies

Preset 2: Subtle Variation

🌱 Gentle Voice Quality Sonification

Windows: 6

Window size: 0.25 seconds

Jitter scale: 0.08 (8% max shift)

Shimmer scale: 0.06 (6% max shift)

Character: Subtle formant variations, barely perceptible under normal listening

Use on: Clinical voice assessment where minimal coloration is desired, research applications

Preset 3: Moderate Effect

🎵 Balanced Voice Sonification

Windows: 8

Window size: 0.2 seconds

Jitter scale: 0.15 (15% max shift)

Shimmer scale: 0.12 (12% max shift)

Character: Clearly audible formant shifts corresponding to voice quality changes

Use on: General purpose, education, demonstration of jitter/shimmer effects

Preset 4: Extreme Mapping

⚡ Dramatic Formant Shifts

Windows: 12

Window size: 0.15 seconds

Jitter scale: 0.30 (30% max shift)

Shimmer scale: 0.25 (25% max shift)

Character: Extreme vowel changes — jitter variations can shift F1 by ±30%, creating dramatic timbral transformations

Use on: Sound design, experimental music, extreme voice transformations

Preset 5: Reverse Mapping

🔄 Crossed + Inverted Mapping

Windows: 8

Window size: 0.2 seconds

Jitter scale: -0.15 (maps to F2, inverted)

Shimmer scale: -0.12 (maps to F1, inverted)

Character: Jitter controls F2 (front/back), shimmer controls F1 (height), both with inverse relationship

Use on: Exploring alternative sonification mappings, unusual timbral effects
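The presets differ only in window settings and scale values, so their reach can be computed directly. The Python sketch below stores the four presets as listed above and reports the widest F1 and F2 bands each can produce with the default formant ranges (F1 280-900 Hz, F2 900-2500 Hz). The table layout and extreme_ranges are hypothetical, not the script's internals.

```python
# (name, windows, window_size_s, jitter_scale, shimmer_scale, crossed)
# crossed=True means jitter drives F2 and shimmer drives F1 (Reverse preset)
PRESETS = {
    2: ("Subtle", 6, 0.25, 0.08, 0.06, False),
    3: ("Moderate", 8, 0.20, 0.15, 0.12, False),
    4: ("Extreme", 12, 0.15, 0.30, 0.25, False),
    5: ("Reverse", 8, 0.20, -0.15, -0.12, True),
}


def extreme_ranges(preset: int, f1=(280, 900), f2=(900, 2500)):
    """Widest F1 and F2 bands a preset can reach with the given formant ranges."""
    name, _windows, _size, j_scale, s_scale, crossed = PRESETS[preset]
    f1_scale, f2_scale = (s_scale, j_scale) if crossed else (j_scale, s_scale)
    # The shift factor spans 1 - |scale| to 1 + |scale| whatever the sign of scale
    f1_band = (f1[0] * (1 - abs(f1_scale)), f1[1] * (1 + abs(f1_scale)))
    f2_band = (f2[0] * (1 - abs(f2_scale)), f2[1] * (1 + abs(f2_scale)))
    return name, f1_band, f2_band


for p in PRESETS:
    name, f1_band, f2_band = extreme_ranges(p)
    print(f"{name:9s} F1 {f1_band[0]:.0f}-{f1_band[1]:.0f} Hz, "
          f"F2 {f2_band[0]:.0f}-{f2_band[1]:.0f} Hz")
```

For example, Extreme can push the F1 band from 280-900 Hz out to roughly 196-1170 Hz.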

Parameters & Controls

Analysis Parameters

Parameter	Type	Default	Description
Num_windows	integer	8	Number of analysis windows (2-20 typical)
Window_size (s)	positive	0.2	Duration of each analysis window (seconds)

Formant Mapping Parameters

Parameter	Type	Default	Description
Jitter_to_F1_scale	real	0.15	Mapping strength: jitter → F1 (±0.3 max typical)
Shimmer_to_F2_scale	real	0.12	Mapping strength: shimmer → F2 (±0.25 max typical)

Formant Range Parameters

Parameter	Type	Default	Description
F1_low (Hz)	positive	280	Minimum F1 frequency (lowest possible first formant)
F1_high (Hz)	positive	900	Maximum F1 frequency (highest possible first formant)
F2_low (Hz)	positive	900	Minimum F2 frequency (lowest possible second formant)
F2_high (Hz)	positive	2500	Maximum F2 frequency (highest possible second formant)

Output Parameters

Parameter	Type	Default	Description
Draw_analysis	boolean	1	Generate normalized jitter/shimmer plot
Play_result	boolean	1	Audition after processing

Parameter Interaction Guide

Because both edges of the F1 band are multiplied by the same shift factor, the shift moves the band's center and scales its width together:
F1_center = (F1_low + F1_high) / 2, F1_span = (F1_high - F1_low) / 2
Shifted F1_range = [F1_center × f1_shift - F1_span × f1_shift, F1_center × f1_shift + F1_span × f1_shift]
where f1_shift = 1 + (jitter_norm - 0.5) × jitter_scale × 2
Example: F1_low = 280, F1_high = 900 → F1_center = 590, F1_span = 310
jitter_scale = 0.15, jitter_norm = 0.0 → f1_shift = 0.85 → shifted center = 501.5, shifted span = 263.5 → F1_range = 238 to 765 Hz
Note: The bandpass filter uses the shifted range directly, so the ratio F1_high / F1_low (the band's width in octaves) is the same at every shift.

Visualization & Analysis

Analysis Plot

Jitter-Shimmer Analysis Plot:

X-axis: Time (seconds)
Y-axis: Normalized value (0.0 to 1.2)

Elements:
• Blue line with circles: normalized jitter values
• Red line with circles: normalized shimmer values
• Gray horizontal line: 0.5 reference (midpoint of the normalized range)
• Grid at 0.5 intervals
• Legend: "Jitter -> F1" (blue), "Shimmer -> F2" (red)

Interpretation:
• Values above 0.5: jitter/shimmer above the midpoint between the lowest and highest windows (not necessarily above the average)
• Values below 0.5: jitter/shimmer below that midpoint
• Circles: one per analysis window, placed at the window positions
• Slope between circles: rate of change in voice quality

Reading the Plot

What to look for:

Parallel curves: Jitter and shimmer often correlate — both increase with vocal effort or pathology
Diverging curves: Jitter high but shimmer low may indicate specific voice disorders (rough but stable amplitude)
Steep slopes: Rapid changes in voice quality at specific moments (onset of vocal fatigue, emotional shifts)
Flat lines: Stable voice quality throughout
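Unvoiced regions: windows without detectable pitch receive the default raw values (0.5% jitter, 3.0% shimmer); these are normalized along with the other windows, so they need not plot at 0.5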

Auditory Interpretation

What to listen for in the output:

Blue curve (jitter) → F1: When jitter increases (blue above 0.5), F1 rises → vowels become lower (e.g., /i/ → /ɪ/ → /ɛ/ → /æ/). Listen for vowel "opening" or lowering.
Red curve (shimmer) → F2: When shimmer increases (red above 0.5), F2 rises → vowels become more front (e.g., /u/ → /i/, or /ʊ/ → /ɪ/, at similar tongue height). Listen for vowel "fronting".
Combined effect: Jitter and shimmer together trace a path through vowel space — the output "maps" voice quality changes onto vowel quality changes.

Applications

Clinical Voice Assessment

Use case: Auditory display of voice quality changes for clinical feedback

Technique: Subtle preset with patient's voice sample

Workflow:

Record patient sustaining vowel /a/ for 3-5 seconds
Apply Subtle preset (minimal coloration)
Listen to output while viewing analysis plot
Hear how jitter/shimmer variations affect perceived vowel — provides auditory biofeedback
Compare pre/post therapy recordings

Voice Science Education

Use case: Demonstrating jitter and shimmer concepts to students

Technique: Moderate preset with synthesized or recorded voice samples

Learning outcomes:

Hear how increased jitter (roughness) maps to vowel lowering
Hear how increased shimmer (breathiness) maps to vowel fronting
See the analysis plot correlate with audible changes
Understand voice perturbation as continuous parameter

Sound Design & Vocal Transformation

Use case: Creating evolving vocal textures for music/sound art

Technique: Extreme or Reverse preset with expressive vocal recordings

Applications:

Transform spoken word into vowel-space trajectories
Create "voice quality" controlled synthesis
Generate evolving pads from vocal samples
Experimental voice processing

Research Data Sonification

Use case: Converting voice perturbation measurements into audible form for pattern recognition

Technique: Custom preset with research-appropriate parameters

Advantages:

Humans can perceive temporal patterns in audio that may be missed in visual analysis
Multiple parameters (jitter, shimmer) mapped to perceptually distinct dimensions (F1, F2)
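Supports "auditory display" of voice quality; as a batch process on a recorded Sound object, the script suits reviewing recordings rather than true real-time monitoring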

Practical Workflow Examples

🔬 Clinical: Sustained Vowel Analysis

Goal: Sonify jitter/shimmer variations in patient's /a/ vowel

Settings:

Preset: Subtle (minimal coloration)
Num_windows: 6-8 (depending on duration)
Window_size: 0.25-0.3s (stable analysis)
F1 range: 280-900 Hz (male) or 310-1000 Hz (female)
F2 range: 900-2500 Hz (male) or 1000-3200 Hz (female)

Result: Output preserves original voice quality while making perturbation patterns audible as subtle vowel shifts

🎓 Educational: Demonstrating Jitter Effects

Goal: Show how jitter affects perceived vowel height

Settings:

Preset: Custom (1), starting from the Moderate window settings (8 windows, 0.2 s)
Jitter_to_F1_scale: 0.3 (exaggerated for demonstration)
Shimmer_to_F2_scale: 0.0 (disable shimmer mapping)
Source: Sustained /i/ vowel (high, front)

Result: As jitter varies, vowel shifts between /i/ (low jitter) and /ɛ/ (high jitter) — clearly audible height change

🎚️ Sound Design: Spoken Word Trajectory

Goal: Transform spoken phrase into evolving vowel space trajectory

Settings:

Preset: Extreme
Num_windows: 12-20 (high resolution)
Window_size: 0.15s (capture rapid changes)
F1 range: 200-1000 Hz (wide range for drama)
F2 range: 800-3000 Hz (wide range)

Result: Spoken words transformed into abstract vowel trajectory — each phoneme's voice quality mapped to formant shifts, creating evolving texture

Troubleshooting Common Issues

Problem: No output or silent sections
Cause: Unvoiced regions with default jitter/shimmer values producing inaudible formants
Solution: Ensure source has sustained voicing, or adjust formant ranges to be more audible

Problem: Jitter/shimmer values all identical
Cause: Pitch analysis failed (unvoiced, noise, or wrong pitch settings)
Solution: Check that the recording is clearly voiced, and increase window_size

Problem: Output sounds like noise, not vowel
Cause: Formant ranges too narrow or shifted outside audible spectrum
Solution: Reset to default ranges, ensure jitter/shimmer scales aren't extreme

Problem: Abrupt transitions between segments
Cause: Too few windows causing large jumps in formant frequencies
Solution: Increase num_windows for smoother transitions

Problem: Processing very slow
Cause: Many windows, each requiring multiple filter operations
Solution: Reduce num_windows (6-10 typical), reduce window_size

Advanced Techniques

Custom formant ranges for specific speakers:

Male adult: F1: 270-800 Hz, F2: 800-2500 Hz
Female adult: F1: 310-1000 Hz, F2: 1000-3200 Hz
Child (8-10): F1: 400-1200 Hz, F2: 1200-4000 Hz
Soprano singer: F1: 350-1100 Hz, F2: 1100-3500 Hz
Bass singer: F1: 250-700 Hz, F2: 700-2200 Hz

Alternative mapping strategies (Custom preset parameters, or edit the script defaults):

Jitter only: Set shimmer_to_F2_scale = 0
Shimmer only: Set jitter_to_F1_scale = 0
Same scale for both: Equal values for balanced effect
Negative correlation: Use negative scales for inverse mapping
Extreme ranges: Set F1_low=200, F1_high=1200 for dramatic shifts