Voice Quality Sonification — Jitter-Shimmer Formant Mapper — User Guide
Data sonification tool: maps voice perturbation measurements (jitter and shimmer) to formant frequencies, creating auditory representations of vocal quality changes over time.
What this does
This script implements a Jitter-Shimmer Formant Mapper — a voice quality sonification tool that analyzes perturbation measurements (jitter and shimmer) from a voice recording and uses these measurements to dynamically control formant filtering. The result is an auditory representation of vocal quality variations: as jitter and shimmer change over time, the perceived vowel quality shifts accordingly.
Key Features:
- 5 Preset Mapping Strategies — Custom, Subtle, Moderate, Extreme, Reverse
- Multi-Window Analysis — Divides recording into analysis windows for time-varying measurement
- Jitter Analysis — Measures cycle-to-cycle variations in fundamental frequency
- Shimmer Analysis — Measures cycle-to-cycle variations in amplitude
- Formant Mapping — Jitter controls F1 (first formant), Shimmer controls F2 (second formant)
- Formant Range Control — User-specified min/max frequencies for F1 and F2
- Visualization — Plots normalized jitter/shimmer curves over time
- Segmented Processing — Applies time-varying formant filters to each analysis window
🗣️ What are Jitter and Shimmer?
Jitter (frequency perturbation): the cycle-to-cycle variation in fundamental frequency. Measured as a percentage: 0.5-1.0% is typical for healthy voices; higher values may indicate roughness, hoarseness, or pathology.
Shimmer (amplitude perturbation): the cycle-to-cycle variation in amplitude, also measured as a percentage. Typical values are 2.0-4.0% for healthy voices; higher values may indicate breathiness, weakness, or pathology.
Together, jitter and shimmer are key acoustic parameters for assessing vocal quality. This tool sonifies these measurements by mapping them to formant frequencies — turning quantitative voice analysis into audible vowel changes.
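As a rough illustration of the two definitions above, the following Python sketch computes a local perturbation percentage (the mean absolute difference between consecutive cycles, divided by the mean value) for a short list of cycle measurements. It is not taken from the Praat script: the function name and the sample numbers are hypothetical, and Praat's voice report applies additional period and amplitude checks that are omitted here.

```python
from statistics import mean


def local_perturbation_percent(values):
    """Mean absolute difference between consecutive cycles, relative to the
    mean cycle value, expressed in percent."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * mean(diffs) / mean(values)


# Hypothetical glottal-cycle measurements (not from a real recording).
periods_s = [0.0100, 0.0101, 0.0099, 0.0100, 0.0102]  # cycle durations
peak_amps = [0.50, 0.52, 0.49, 0.51, 0.50]            # cycle peak amplitudes

jitter_pct = local_perturbation_percent(periods_s)   # frequency perturbation
shimmer_pct = local_perturbation_percent(peak_amps)  # amplitude perturbation
print(f"jitter  = {jitter_pct:.2f} %")
print(f"shimmer = {shimmer_pct:.2f} %")
```

The same function serves both measures because jitter and shimmer differ only in what is measured per cycle: period length for jitter, amplitude for shimmer.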
Technical Implementation:
- (1) Window Analysis: Divide the recording into N analysis windows.
- (2) Jitter/Shimmer Extraction: For each window, extract pitch, create a point process, generate a voice report, and parse the jitter and shimmer percentages.
- (3) Normalization: Scale jitter/shimmer values to the 0-1 range across all windows.
- (4) Formant Calculation: For each window, compute shifted formant frequencies: F1 = base_F1 × (1 + (jitter_norm - 0.5) × scale × 2), F2 = base_F2 × (1 + (shimmer_norm - 0.5) × scale × 2).
- (5) Segment Processing: Extract each segment, apply bandpass filters at the shifted formant frequencies, mix the F1 and F2 bands, and add the result to the output.
- (6) Visualization: Plot normalized jitter (blue) and shimmer (red) curves over time.
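Steps 3 and 4 above can be sketched in a few lines of Python. This is an illustration of the stated formulas, not the script's Praat code: the per-window measurements and base formant values are hypothetical, and mapping a constant series to 0.5 is an assumption.

```python
def normalize(values):
    """Min-max scale a list to the 0-1 range across all windows."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant series sits at the midpoint (no shift).
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def shifted_formant(base_hz, norm, scale):
    """Step 4 formula: base * (1 + (norm - 0.5) * scale * 2)."""
    return base_hz * (1 + (norm - 0.5) * scale * 2)


# Hypothetical per-window measurements in percent (not real data).
jitter_pct = [0.4, 0.7, 1.0, 1.6]
shimmer_pct = [2.0, 3.0, 5.0, 4.0]
base_f1, base_f2 = 500.0, 1500.0  # hypothetical base formants in Hz

jitter_norm = normalize(jitter_pct)
shimmer_norm = normalize(shimmer_pct)

for i, (j, s) in enumerate(zip(jitter_norm, shimmer_norm), start=1):
    f1 = shifted_formant(base_f1, j, scale=0.15)
    f2 = shifted_formant(base_f2, s, scale=0.12)
    print(f"window {i}: F1 = {f1:.0f} Hz, F2 = {f2:.0f} Hz")
```

With a scale of 0.15 the multiplier runs from 0.85 (normalized value 0) to 1.15 (normalized value 1), which is why a scale of 0.15 corresponds to a 15% maximum shift.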
Quick start
- In Praat, select exactly one Sound object (speech/voice recording, mono recommended).
- Run script… → select Jitter_Shimmer_Formant_Mapper.praat.
- Choose Preset (2-5 for specific mapping strategies, 1 for custom).
- Set analysis parameters: Num_windows and Window_size.
- Adjust mapping scales: Jitter_to_F1_scale and Shimmer_to_F2_scale.
- Set formant frequency ranges (F1_low/high, F2_low/high).
- Enable Draw_analysis for visualization of jitter/shimmer curves.
- Click OK — processor analyzes voice, maps to formants, creates "original_jitshim" sound object.
Voice Quality Theory
Jitter and Shimmer
Formant Frequencies
🎵 What are Formants?
Formants are resonant frequencies of the vocal tract that shape vowel quality. The first two formants (F1 and F2) primarily determine vowel identity:
| Vowel | F1 (Hz) - Male | F2 (Hz) - Male | F1 (Hz) - Female | F2 (Hz) - Female |
|---|---|---|---|---|
| /i/ (heed) | 270 | 2300 | 310 | 2800 |
| /ɪ/ (hid) | 400 | 2000 | 430 | 2500 |
| /ɛ/ (head) | 550 | 1800 | 600 | 2200 |
| /æ/ (had) | 700 | 1700 | 800 | 2000 |
| /ɑ/ (father) | 750 | 1200 | 850 | 1400 |
| /ɔ/ (bought) | 500 | 900 | 550 | 1100 |
| /ʊ/ (hood) | 400 | 1000 | 450 | 1200 |
| /u/ (who'd) | 300 | 900 | 350 | 1100 |
F1 is inversely related to tongue height: high vowels (/i/, /u/) have low F1; low vowels (/æ/, /ɑ/) have high F1.
F2 is related to tongue frontness: front vowels (/i/, /ɪ/) have high F2; back vowels (/u/, /ɔ/) have low F2.
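To get a feel for where a given F1/F2 pair lands in this vowel space, the table can be used as a lookup. The Python sketch below uses the male column and returns the closest vowel; the function name and the choice of a log-frequency distance are assumptions made for illustration, not part of the script.

```python
import math

# (F1, F2) in Hz, taken from the male columns of the vowel table.
MALE_VOWELS = {
    "i": (270, 2300),
    "ɪ": (400, 2000),
    "ɛ": (550, 1800),
    "æ": (700, 1700),
    "ɑ": (750, 1200),
    "ɔ": (500, 900),
    "ʊ": (400, 1000),
    "u": (300, 900),
}


def nearest_vowel(f1_hz, f2_hz, table=MALE_VOWELS):
    """Return the table vowel closest to (f1_hz, f2_hz)."""
    def distance(ref):
        ref_f1, ref_f2 = ref
        # Assumption: compare on a log-frequency scale so that the larger
        # F2 values do not dominate the distance.
        return math.hypot(math.log(f1_hz / ref_f1), math.log(f2_hz / ref_f2))

    return min(table, key=lambda vowel: distance(table[vowel]))


print(nearest_vowel(270, 2300))  # exact table entry for /i/
print(nearest_vowel(720, 1250))  # a point near /ɑ/
```

Such a lookup is only a coarse guide: real vowels vary by speaker and context, and the table values are averages.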
Mapping Function
Reverse Mapping
🔄 Cross-Mapping Concept
The Reverse Mapping preset swaps the mapping relationship:
- Jitter → F2 (instead of F1)
- Shimmer → F1 (instead of F2)
- Scale values are negative (-0.15, -0.12), so higher jitter/shimmer lowers the formants
Perceptual effect: Instead of jitter affecting vowel height and shimmer affecting front/back position, the relationship is both crossed and inverted. This creates an "opposite" perceptual mapping, useful for exploring alternative sonification strategies or creating unusual timbral shifts.
Negative scale: When jitter_norm > 0.5, f2_shift < 1.0 (frequencies decrease) — high jitter maps to lower, darker formants.
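The crossed and inverted relationship can be written out as a small Python sketch. The function names and base formant values are hypothetical; the scale defaults are the Reverse preset values quoted above.

```python
def shift_factor(norm, scale):
    """Multiplier applied to a base formant: 1 + (norm - 0.5) * scale * 2."""
    return 1 + (norm - 0.5) * scale * 2


def reverse_mapping(jitter_norm, shimmer_norm, base_f1, base_f2,
                    jitter_scale=-0.15, shimmer_scale=-0.12):
    """Crossed mapping: jitter drives F2 and shimmer drives F1."""
    f2 = base_f2 * shift_factor(jitter_norm, jitter_scale)    # jitter -> F2
    f1 = base_f1 * shift_factor(shimmer_norm, shimmer_scale)  # shimmer -> F1
    return f1, f2


# Hypothetical window: highest jitter, lowest shimmer; base values assumed.
f1, f2 = reverse_mapping(jitter_norm=1.0, shimmer_norm=0.0,
                         base_f1=500.0, base_f2=1500.0)
print(f"F1 = {f1:.0f} Hz, F2 = {f2:.0f} Hz")
```

Because both scales are negative, a normalized value above 0.5 gives a factor below 1.0 and a value below 0.5 gives a factor above 1.0, so in this example F2 falls while F1 rises.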
Preset Strategies
Preset 2: Subtle Variation
🌱 Gentle Voice Quality Sonification
Windows: 6
Window size: 0.25 seconds
Jitter scale: 0.08 (8% max shift)
Shimmer scale: 0.06 (6% max shift)
Character: Subtle formant variations, barely perceptible under normal listening
Use on: Clinical voice assessment where minimal coloration is desired, research applications
Preset 3: Moderate Effect
🎵 Balanced Voice Sonification
Windows: 8
Window size: 0.2 seconds
Jitter scale: 0.15 (15% max shift)
Shimmer scale: 0.12 (12% max shift)
Character: Clearly audible formant shifts corresponding to voice quality changes
Use on: General purpose, education, demonstration of jitter/shimmer effects
Preset 4: Extreme Mapping
⚡ Dramatic Formant Shifts
Windows: 12
Window size: 0.15 seconds
Jitter scale: 0.30 (30% max shift)
Shimmer scale: 0.25 (25% max shift)
Character: Extreme vowel changes — jitter variations can shift F1 by ±30%, creating dramatic timbral transformations
Use on: Sound design, experimental music, extreme voice transformations
Preset 5: Reverse Mapping
🔄 Crossed + Inverted Mapping
Windows: 8
Window size: 0.2 seconds
Jitter scale: -0.15 (maps to F2, inverted)
Shimmer scale: -0.12 (maps to F1, inverted)
Character: Jitter controls F2 (front/back), shimmer controls F1 (height), both with inverse relationship
Use on: Exploring alternative sonification mappings, unusual timbral effects
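For side-by-side comparison, the four fixed presets can be collected into one table. The Python sketch below only restates the values listed above; the `Preset` class, its field names, and the helper function are hypothetical and do not mirror the script's internal variables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    name: str
    num_windows: int
    window_size_s: float
    jitter_scale: float
    shimmer_scale: float
    crossed: bool = False  # True when jitter drives F2 and shimmer drives F1


# Keys follow the preset numbers used in this guide (1 = Custom, not listed).
PRESETS = {
    2: Preset("Subtle", 6, 0.25, 0.08, 0.06),
    3: Preset("Moderate", 8, 0.20, 0.15, 0.12),
    4: Preset("Extreme", 12, 0.15, 0.30, 0.25),
    5: Preset("Reverse", 8, 0.20, -0.15, -0.12, crossed=True),
}


def max_shift_percent(preset):
    """Largest formant shift (jitter-driven, shimmer-driven) in percent."""
    return abs(preset.jitter_scale) * 100, abs(preset.shimmer_scale) * 100


for number, preset in PRESETS.items():
    j_shift, s_shift = max_shift_percent(preset)
    print(f"{number} {preset.name:<8} windows={preset.num_windows:<2} "
          f"size={preset.window_size_s}s max shift={j_shift:.0f}%/{s_shift:.0f}%")
```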
Parameters & Controls
Analysis Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| Num_windows | integer | 8 | Number of analysis windows (2-20 typical) |
| Window_size (s) | positive | 0.2 | Duration of each analysis window (seconds) |
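This guide does not state how the windows are positioned within the recording. Assuming they start at evenly spaced points and are clipped to the end of the sound, the interaction between Num_windows, Window_size, and the recording duration can be explored with the Python sketch below. The function name and the placement rule are assumptions, not the script's confirmed behavior.

```python
def window_bounds(duration_s, num_windows, window_size_s):
    """Return (start, end) times for evenly spaced analysis windows.

    Assumption: window i starts at i * duration / num_windows and is
    clipped so that it never runs past the end of the recording.
    """
    if num_windows < 1 or window_size_s <= 0 or duration_s <= 0:
        raise ValueError("duration, window count and window size must be positive")
    hop = duration_s / num_windows
    bounds = []
    for i in range(num_windows):
        start = i * hop
        end = min(start + window_size_s, duration_s)
        bounds.append((start, end))
    return bounds


# Hypothetical 2-second recording with the default analysis parameters.
bounds = window_bounds(duration_s=2.0, num_windows=8, window_size_s=0.2)
gap = 2.0 / 8 - 0.2  # positive: unanalysed stretches; negative: overlap
for start, end in bounds:
    print(f"{start:.2f} s to {end:.2f} s")
print(f"gap between windows: {gap:.2f} s")
```

Under this assumption, windows overlap whenever Num_windows × Window_size exceeds the recording duration, and leave unanalysed stretches when it is smaller.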
Formant Mapping Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| Jitter_to_F1_scale | real | 0.15 | Mapping strength: jitter → F1 (±0.3 max typical) |
| Shimmer_to_F2_scale | real | 0.12 | Mapping strength: shimmer → F2 (±0.25 max typical) |
Formant Range Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| F1_low (Hz) | positive | 280 | Minimum F1 frequency (lowest possible first formant) |
| F1_high (Hz) | positive | 900 | Maximum F1 frequency (highest possible first formant) |
| F2_low (Hz) | positive | 900 | Minimum F2 frequency (lowest possible second formant) |
| F2_high (Hz) | positive | 2500 | Maximum F2 frequency (highest possible second formant) |
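One plausible reading of these parameters is that a shifted formant is never allowed to leave its [low, high] range. The Python sketch below shows that interpretation with the default ranges; the clamping itself is an assumption about the script's behavior, and the function names are hypothetical.

```python
# Default ranges from the table above, in Hz.
RANGES = {"F1": (280.0, 900.0), "F2": (900.0, 2500.0)}


def clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(high, value))


def limit_formants(f1_hz, f2_hz, ranges=RANGES):
    """Keep a shifted (F1, F2) pair inside the user-specified ranges."""
    return clamp(f1_hz, *ranges["F1"]), clamp(f2_hz, *ranges["F2"])


print(limit_formants(250.0, 2600.0))  # both values fall outside the ranges
print(limit_formants(500.0, 1500.0))  # both values are already inside
```

Note that the defaults make F1_high equal to F2_low (900 Hz), so under this reading the two bands can meet but F1 can never exceed F2.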
Output Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| Draw_analysis | boolean | 1 | Generate normalized jitter/shimmer plot |
| Play_result | boolean | 1 | Audition after processing |
Parameter Interaction Guide
Visualization & Analysis
Analysis Plot
Reading the Plot
- Parallel curves: Jitter and shimmer often correlate — both increase with vocal effort or pathology
- Diverging curves: Jitter high but shimmer low may indicate specific voice disorders (rough but stable amplitude)
- Steep slopes: Rapid changes in voice quality at specific moments (onset of vocal fatigue, emotional shifts)
- Flat lines: Stable voice quality throughout
- Gaps: Unvoiced regions where default values (0.5 jitter, 0.5 shimmer) are assigned
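The handling of gaps can be illustrated with a short Python sketch in which unvoiced windows are represented as `None` and receive the default value of 0.5 after normalization. The function name and the use of `None` for undefined measurements are assumptions; only the 0.5 default comes from this guide.

```python
def normalize_with_gaps(values, default=0.5):
    """Min-max normalize, assigning `default` to undefined (None) windows."""
    voiced = [v for v in values if v is not None]
    if not voiced:
        return [default] * len(values)
    lo, hi = min(voiced), max(voiced)
    span = hi - lo
    result = []
    for v in values:
        if v is None:
            result.append(default)          # unvoiced window: neutral value
        elif span == 0:
            result.append(0.5)              # assumption: constant series -> midpoint
        else:
            result.append((v - lo) / span)
    return result


# Hypothetical jitter values in percent; the second window is unvoiced.
print(normalize_with_gaps([0.5, None, 1.5, 1.0]))
```

A normalized value of 0.5 gives a shift factor of exactly 1, so an unvoiced window leaves the formants at their base values. It also means a gap cannot be told apart from a genuinely mid-range measurement on the plot alone.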
Auditory Interpretation
- Blue curve (jitter) → F1: When jitter increases (blue above 0.5), F1 rises → vowels become lower (e.g., /i/ → /ɪ/ → /ɛ/ → /æ/). Listen for vowel "opening" or lowering.
- Red curve (shimmer) → F2: When shimmer increases (red above 0.5), F2 rises → vowels become more front (e.g., in order of rising F2 in the vowel table: /u/ → /ʊ/ → /ɑ/ → /æ/ → /ɛ/ → /ɪ/ → /i/). Listen for vowel "fronting".
- Combined effect: Jitter and shimmer together trace a path through vowel space — the output "maps" voice quality changes onto vowel quality changes.
Applications
Clinical Voice Assessment
Use case: Auditory display of voice quality changes for clinical feedback
Technique: Subtle preset with patient's voice sample
Workflow:
- Record patient sustaining vowel /a/ for 3-5 seconds
- Apply Subtle preset (minimal coloration)
- Listen to output while viewing analysis plot
- Hear how jitter/shimmer variations affect perceived vowel — provides auditory biofeedback
- Compare pre/post therapy recordings
Voice Science Education
Use case: Demonstrating jitter and shimmer concepts to students
Technique: Moderate preset with synthesized or recorded voice samples
Learning outcomes:
- Hear how increased jitter (roughness) maps to vowel lowering
- Hear how increased shimmer (breathiness) maps to vowel fronting
- See the analysis plot correlate with audible changes
- Understand voice perturbation as continuous parameter
Sound Design & Vocal Transformation
Use case: Creating evolving vocal textures for music/sound art
Technique: Extreme or Reverse preset with expressive vocal recordings
Applications:
- Transform spoken word into vowel-space trajectories
- Create "voice quality" controlled synthesis
- Generate evolving pads from vocal samples
- Experimental voice processing
Research Data Sonification
Use case: Converting voice perturbation measurements into audible form for pattern recognition
Technique: Custom preset with research-appropriate parameters
Advantages:
- Humans can perceive temporal patterns in audio that may be missed in visual analysis
- Multiple parameters (jitter, shimmer) mapped to perceptually distinct dimensions (F1, F2)
- Enables "auditory display" of voice quality in real-time monitoring
Practical Workflow Examples
🔬 Clinical: Sustained Vowel Analysis
Goal: Sonify jitter/shimmer variations in patient's /a/ vowel
Settings:
- Preset: Subtle (minimal coloration)
- Num_windows: 6-8 (depending on duration)
- Window_size: 0.25-0.3s (stable analysis)
- F1 range: 280-900 Hz (male) or 310-1000 Hz (female)
- F2 range: 900-2500 Hz (male) or 1000-3200 Hz (female)
Result: Output preserves original voice quality while making perturbation patterns audible as subtle vowel shifts
🎓 Educational: Demonstrating Jitter Effects
Goal: Show how jitter affects perceived vowel height
Settings:
- Preset: Moderate
- Jitter_to_F1_scale: 0.3 (exaggerated for demonstration)
- Shimmer_to_F2_scale: 0.0 (disable shimmer mapping)
- Source: Sustained /i/ vowel (high, front)
Result: As jitter varies, vowel shifts between /i/ (low jitter) and /ɛ/ (high jitter) — clearly audible height change
🎚️ Sound Design: Spoken Word Trajectory
Goal: Transform spoken phrase into evolving vowel space trajectory
Settings:
- Preset: Extreme
- Num_windows: 12-20 (high resolution)
- Window_size: 0.15s (capture rapid changes)
- F1 range: 200-1000 Hz (wide range for drama)
- F2 range: 800-3000 Hz (wide range)
Result: Spoken words transformed into abstract vowel trajectory — each phoneme's voice quality mapped to formant shifts, creating evolving texture
Troubleshooting Common Issues
Cause: Unvoiced regions with default jitter/shimmer values producing inaudible formants
Solution: Ensure source has sustained voicing, or adjust formant ranges to be more audible
Cause: Pitch analysis failed (unvoiced, noise, or wrong pitch settings)
Solution: Check that the recording is clearly voiced, or increase window_size
Cause: Formant ranges too narrow or shifted outside audible spectrum
Solution: Reset to default ranges, ensure jitter/shimmer scales aren't extreme
Cause: Too few windows causing large jumps in formant frequencies
Solution: Increase num_windows for smoother transitions
Cause: Many windows, each requiring multiple filter operations
Solution: Reduce num_windows (6-10 typical), reduce window_size
Advanced Techniques
- Male adult: F1: 270-800 Hz, F2: 800-2500 Hz
- Female adult: F1: 310-1000 Hz, F2: 1000-3200 Hz
- Child (8-10): F1: 400-1200 Hz, F2: 1200-4000 Hz
- Soprano singer: F1: 350-1100 Hz, F2: 1100-3500 Hz
- Bass singer: F1: 250-700 Hz, F2: 700-2200 Hz
- Jitter only: Set shimmer_to_F2_scale = 0
- Shimmer only: Set jitter_to_F1_scale = 0
- Same scale for both: Equal values for balanced effect
- Negative correlation: Use negative scales for inverse mapping
- Extreme ranges: Set F1_low=200, F1_high=1200 for dramatic shifts