Voice Quality Sonification — Jitter-Shimmer Formant Mapper — User Guide

Data sonification tool: maps voice perturbation measurements (jitter and shimmer) to formant frequencies, creating auditory representations of vocal quality changes over time.

Author: Shai Cohen
Version: 1.0 (2025)
Technique: Voice Analysis → Formant Mapping
Application: Praat scripting language

What this does

This script implements a Jitter-Shimmer Formant Mapper — a voice quality sonification tool that analyzes perturbation measurements (jitter and shimmer) from a voice recording and uses these measurements to dynamically control formant filtering. The result is an auditory representation of vocal quality variations: as jitter and shimmer change over time, the perceived vowel quality shifts accordingly.

🗣️ What are Jitter and Shimmer?

Jitter (frequency perturbation) — the cycle-to-cycle variation in fundamental frequency. Measured as a percentage: 0.5-1.0% is typical for healthy voices; higher values indicate roughness, hoarseness, or pathology.

Shimmer (amplitude perturbation) — the cycle-to-cycle variation in amplitude. Typical values: 2.0-4.0% for healthy voices; higher values indicate breathiness, weakness, or pathology.

Together, jitter and shimmer are key acoustic parameters for assessing vocal quality. This tool sonifies these measurements by mapping them to formant frequencies — turning quantitative voice analysis into audible vowel changes.

Technical Implementation:

  1. Window analysis: divide the recording into N analysis windows.
  2. Jitter/shimmer extraction: for each window, extract pitch, create a point process, generate a voice report, and parse the jitter and shimmer percentages.
  3. Normalization: scale the jitter and shimmer values to the 0-1 range across all windows.
  4. Formant calculation: for each window, compute shifted formant frequencies:
     F1 = base_F1 × (1 + (jitter_norm - 0.5) × jitter_scale × 2)
     F2 = base_F2 × (1 + (shimmer_norm - 0.5) × shimmer_scale × 2)
  5. Segment processing: extract each segment, apply bandpass filters at the shifted formant frequencies, mix the F1 and F2 bands, and add the result to the output.
  6. Visualization: plot the normalized jitter (blue) and shimmer (red) curves over time.
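Step 1 does not say how the N windows are placed in time. As a minimal sketch, assuming each window is centred on an evenly spaced point and clipped to the sound's duration (the actual script may place them differently), the window boundaries could be computed like this in Python. `window_bounds` is a hypothetical helper, not part of the script.

```python
def window_bounds(duration, num_windows, window_size):
    """Return (start, end) times in seconds for each analysis window.

    ASSUMPTION: window i is centred at the midpoint of the i-th of
    num_windows equal slices of the sound and clipped to [0, duration].
    """
    if num_windows < 1:
        raise ValueError("num_windows must be at least 1")
    half = window_size / 2
    bounds = []
    for i in range(num_windows):
        centre = duration * (i + 0.5) / num_windows
        start = max(0.0, centre - half)      # clip at the start of the sound
        end = min(duration, centre + half)   # clip at the end of the sound
        bounds.append((start, end))
    return bounds


if __name__ == "__main__":
    # 2-second recording with the Moderate preset settings (8 windows of 0.2 s)
    for start, end in window_bounds(2.0, 8, 0.2):
        print(f"{start:.3f}-{end:.3f} s")
```

With these settings the window centres are 0.25 s apart, so consecutive 0.2 s windows do not overlap.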

Quick start

  1. In Praat, select exactly one Sound object (speech/voice recording, mono recommended).
  2. Run script… → select Jitter_Shimmer_Formant_Mapper.praat.
  3. Choose Preset (2-5 for specific mapping strategies, 1 for custom).
  4. Set analysis parameters: Num_windows and Window_size.
  5. Adjust mapping scales: Jitter_to_F1_scale and Shimmer_to_F2_scale.
  6. Set formant frequency ranges (F1_low/high, F2_low/high).
  7. Enable Draw_analysis for visualization of jitter/shimmer curves.
  8. Click OK — processor analyzes voice, maps to formants, creates "original_jitshim" sound object.
Quick tip: Start with Moderate Effect preset on a sustained vowel recording (2-5 seconds). Enable Draw_analysis to see how jitter (blue) and shimmer (red) vary over time. Listen to the output — you'll hear vowel quality shifting as jitter and shimmer change. For clinical voice assessment, use Subtle preset to highlight pathological variations. For sound design, try Extreme or Reverse mapping for dramatic formant shifts.
Important:
  • VOICE REQUIREMENTS: Input must contain voiced segments with detectable pitch. Unvoiced material (whispers, noise, silence) yields the default values (0.5% jitter, 3.0% shimmer).
  • WINDOW SIZE: Each window must be long enough to contain multiple pitch periods. For low voices (80-150 Hz), use window_size ≥ 0.2 s; for high voices (200-400 Hz), 0.1 s may suffice.
  • MINIMUM DURATION: The sound must be longer than 2 × window_size.
  • FORMANT RANGES: These should suit the speaker (male: F1 270-800 Hz, F2 800-2500 Hz; female: F1 310-1000 Hz, F2 1000-3200 Hz).
  • FILTER SMOOTHING: Fixed at 100 Hz; narrow formant ranges may require adjustment.
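The duration and window-size requirements can be checked before running the script. The following is a minimal Python sketch. `check_settings` is a hypothetical helper, and the threshold of 15 periods per window is an assumption chosen to agree with the recommendations above (80 Hz × 0.2 s = 16 periods, 200 Hz × 0.1 s = 20 periods). It is not a value taken from the script.

```python
MIN_PERIODS = 15  # ASSUMPTION: what "multiple pitch periods" means here


def check_settings(duration, window_size, lowest_f0):
    """Return a list of warnings for the requirements listed above.

    duration and window_size are in seconds; lowest_f0 is the lowest
    expected fundamental frequency of the speaker in Hz.
    """
    warnings = []
    # MINIMUM DURATION: the sound must be longer than 2 x window_size
    if duration <= 2 * window_size:
        warnings.append(f"duration {duration} s is not longer than 2 x window_size")
    # WINDOW SIZE: pitch periods of the lowest expected F0 that fit in one window
    periods = window_size * lowest_f0
    if periods < MIN_PERIODS:
        warnings.append(
            f"window holds only {periods:.1f} periods at {lowest_f0} Hz; "
            "consider a larger window_size"
        )
    return warnings


if __name__ == "__main__":
    print(check_settings(3.0, 0.2, 80))  # low voice, recommended window
    print(check_settings(3.0, 0.1, 80))  # low voice, window too short
```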

Voice Quality Theory

Jitter and Shimmer

Jitter (local): the cycle-to-cycle F0 variation, expressed as a percentage:

$$\text{Jitter}_{\text{local}} = \frac{\frac{1}{N-1}\sum_{i=2}^{N}\lvert T_i - T_{i-1}\rvert}{\frac{1}{N}\sum_{i=1}^{N} T_i} \times 100\%$$

where T_i is the period of cycle i and N is the number of cycles.

Typical ranges:
  • 0.2-0.8%: very stable voice (ideal)
  • 0.8-1.5%: typical healthy voice
  • 1.5-3.0%: slight roughness
  • 3.0%+: pathological (hoarse, rough)

Shimmer (local): the cycle-to-cycle amplitude variation, expressed as a percentage:

$$\text{Shimmer}_{\text{local}} = \frac{\frac{1}{N-1}\sum_{i=2}^{N}\lvert A_i - A_{i-1}\rvert}{\frac{1}{N}\sum_{i=1}^{N} A_i} \times 100\%$$

where A_i is the peak amplitude of cycle i.

Typical ranges:
  • 1.0-2.5%: very stable voice (ideal)
  • 2.5-4.0%: typical healthy voice
  • 4.0-7.0%: slight breathiness
  • 7.0%+: pathological (breathy, weak)
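Both formulas share the same shape: the mean absolute difference between consecutive cycles, divided by the mean value. The following is a simplified Python sketch of that shared computation. Praat's own measurements (the Voice report and PointProcess jitter queries) additionally discard periods outside their period floor/ceiling and maximum period factor limits, so results on real recordings can differ. The function names are illustrative only.

```python
def local_perturbation(values):
    """Mean absolute consecutive difference divided by the mean, in percent."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two cycles")
    mean_abs_diff = sum(abs(values[i] - values[i - 1]) for i in range(1, n)) / (n - 1)
    mean_value = sum(values) / n
    return 100.0 * mean_abs_diff / mean_value


def jitter_local(periods):
    """Jitter (local) in percent from consecutive periods T_i (seconds)."""
    return local_perturbation(periods)


def shimmer_local(amplitudes):
    """Shimmer (local) in percent from consecutive peak amplitudes A_i."""
    return local_perturbation(amplitudes)


if __name__ == "__main__":
    # Periods alternating between 10.0 ms and 10.1 ms, amplitudes between 1.0 and 0.97
    print(f"{jitter_local([0.0100, 0.0101, 0.0100, 0.0101]):.3f} %")
    print(f"{shimmer_local([1.0, 0.97, 1.0, 0.97]):.3f} %")
```

These made-up cycles give roughly 1.0% jitter and 3.0% shimmer, both inside the "typical healthy voice" ranges above.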

Formant Frequencies

🎵 What are Formants?

Formants are resonant frequencies of the vocal tract that shape vowel quality. The first two formants (F1 and F2) primarily determine vowel identity:

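| Vowel | F1 (Hz) - Male | F2 (Hz) - Male | F1 (Hz) - Female | F2 (Hz) - Female |
|---|---|---|---|---|
| /i/ (heed) | 270 | 2300 | 310 | 2800 |
| /ɪ/ (hid) | 400 | 2000 | 430 | 2500 |
| /ɛ/ (head) | 550 | 1800 | 600 | 2200 |
| /æ/ (had) | 700 | 1700 | 800 | 2000 |
| /ɑ/ (father) | 750 | 1200 | 850 | 1400 |
| /ɔ/ (bought) | 500 | 900 | 550 | 1100 |
| /ʊ/ (hood) | 400 | 1000 | 450 | 1200 |
| /u/ (who'd) | 300 | 900 | 350 | 1100 |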

F1 is inversely related to tongue height: high vowels (i, u) have low F1; low vowels (æ, ɑ) have high F1.

F2 is related to tongue frontness: front vowels (i, ɪ) have high F2; back vowels (u, ɔ) have low F2.
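To see how an (F1, F2) pair points to a vowel, here is a rough nearest-neighbour lookup against the male columns of the table, sketched in Python. It only illustrates how to read the table and is not part of the mapper script. Plain Euclidean distance in Hz is a simplifying assumption, since perceptual vowel spaces usually use a warped frequency scale.

```python
import math

# Male F1/F2 values (Hz) copied from the vowel table above
MALE_VOWELS = {
    "i": (270, 2300), "ɪ": (400, 2000), "ɛ": (550, 1800), "æ": (700, 1700),
    "ɑ": (750, 1200), "ɔ": (500, 900), "ʊ": (400, 1000), "u": (300, 900),
}


def nearest_vowel(f1, f2, table=MALE_VOWELS):
    """Return the table vowel whose (F1, F2) point is closest in plain Hz.

    Unweighted Euclidean distance is a simplification: F2 spans a wider
    range than F1, so F2 differences dominate the distance.
    """
    return min(table, key=lambda v: math.dist((f1, f2), table[v]))


if __name__ == "__main__":
    print(nearest_vowel(280, 2250))  # low F1, high F2: high front vowel
    print(nearest_vowel(720, 1250))  # high F1, lower F2: low back vowel
```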

Mapping Function

For each analysis window:

  1. Normalize jitter and shimmer to the 0-1 range (min and max taken across all windows):
     jitter_norm = (jitter - min_jitter) / (max_jitter - min_jitter)
     shimmer_norm = (shimmer - min_shimmer) / (max_shimmer - min_shimmer)
  2. Calculate the formant shift factors:
     f1_shift = 1.0 + (jitter_norm - 0.5) × jitter_scale × 2
     f2_shift = 1.0 + (shimmer_norm - 0.5) × shimmer_scale × 2
     The (x - 0.5) × 2 term maps the 0-1 range to -1 to +1, so jitter_scale (or shimmer_scale) sets the maximum deviation of the shift factor from 1.0.
  3. Apply the shifts to the formant ranges:
     f1_low_shifted = f1_low × f1_shift
     f1_high_shifted = f1_high × f1_shift
     f2_low_shifted = f2_low × f2_shift
     f2_high_shifted = f2_high × f2_shift

Example: f1_low = 280, f1_high = 900, jitter_scale = 0.15
  • jitter_norm = 0.0 → f1_shift = 1.0 + (0 - 0.5) × 0.15 × 2 = 1.0 - 0.15 = 0.85 → F1 range = 238-765 Hz (lowered)
  • jitter_norm = 1.0 → f1_shift = 1.0 + (1 - 0.5) × 0.15 × 2 = 1.0 + 0.15 = 1.15 → F1 range = 322-1035 Hz (raised)
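The three steps translate directly into a few small functions. The following is a minimal Python sketch of the arithmetic only (no audio processing). The function names and the example jitter values are hypothetical. The sketch raises an error when every window has the same value, because the normalization formula would then divide by zero.

```python
def normalize(values):
    """Step 1: min-max normalise a list of per-window values to 0-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All windows identical: the formula above would divide by zero
        raise ValueError("values are identical; normalisation undefined")
    return [(v - lo) / (hi - lo) for v in values]


def shift_factor(norm, scale):
    """Step 2: map norm in 0-1 to a factor between 1 - scale and 1 + scale."""
    return 1.0 + (norm - 0.5) * scale * 2


def shifted_range(low, high, factor):
    """Step 3: multiply both ends of a formant range by the same factor."""
    return low * factor, high * factor


if __name__ == "__main__":
    jitter = [0.6, 1.2, 0.9]  # hypothetical per-window jitter values (%)
    for j_norm in normalize(jitter):
        f1_low, f1_high = shifted_range(280, 900, shift_factor(j_norm, 0.15))
        print(f"jitter_norm={j_norm:.2f}: F1 band {f1_low:.0f}-{f1_high:.0f} Hz")
```

The three windows print bands of roughly 238-765, 322-1035 and 280-900 Hz, matching the worked example.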

Reverse Mapping

🔄 Cross-Mapping Concept

Reverse Mapping preset swaps the mapping relationship:

  • Jitter → F2 (instead of F1)
  • Shimmer → F1 (instead of F2)
  • Scale values negative (-0.15, -0.12) — higher jitter/shimmer lowers formants

Perceptual effect: Instead of jitter affecting vowel height and shimmer affecting front/back position, the relationship is inverted and crossed. Creates "opposite" perceptual mapping, useful for exploring alternative sonification strategies or creating unusual timbral shifts.

Negative scale: When jitter_norm > 0.5, f2_shift < 1.0 (frequencies decrease) — high jitter maps to lower, darker formants.
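The following is a minimal Python sketch of the crossed, inverted mapping. It assumes the Reverse preset reuses the shift formula from the Mapping Function section and simply routes each factor to the other formant. The `reverse` flag and the `formant_factors` name are hypothetical.

```python
def formant_factors(jitter_norm, shimmer_norm, jitter_scale, shimmer_scale,
                    reverse=False):
    """Return (f1_factor, f2_factor) for one analysis window.

    reverse=False: jitter -> F1, shimmer -> F2 (Presets 2-4).
    reverse=True:  jitter -> F2, shimmer -> F1 (Preset 5); combine with
    negative scales so that higher perturbation lowers the formants.
    """
    jit = 1.0 + (jitter_norm - 0.5) * jitter_scale * 2
    shim = 1.0 + (shimmer_norm - 0.5) * shimmer_scale * 2
    return (shim, jit) if reverse else (jit, shim)


if __name__ == "__main__":
    # Maximum jitter, mid-range shimmer
    print(formant_factors(1.0, 0.5, 0.15, 0.12))                  # F1 raised
    print(formant_factors(1.0, 0.5, -0.15, -0.12, reverse=True))  # F2 lowered
```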

Preset Strategies

Preset 2: Subtle Variation

🌱 Gentle Voice Quality Sonification

Windows: 6

Window size: 0.25 seconds

Jitter scale: 0.08 (8% max shift)

Shimmer scale: 0.06 (6% max shift)

Character: Subtle formant variations, barely perceptible under normal listening

Use on: Clinical voice assessment where minimal coloration is desired, research applications

Preset 3: Moderate Effect

🎵 Balanced Voice Sonification

Windows: 8

Window size: 0.2 seconds

Jitter scale: 0.15 (15% max shift)

Shimmer scale: 0.12 (12% max shift)

Character: Clearly audible formant shifts corresponding to voice quality changes

Use on: General purpose, education, demonstration of jitter/shimmer effects

Preset 4: Extreme Mapping

⚡ Dramatic Formant Shifts

Windows: 12

Window size: 0.15 seconds

Jitter scale: 0.30 (30% max shift)

Shimmer scale: 0.25 (25% max shift)

Character: Extreme vowel changes — jitter variations can shift F1 by ±30%, creating dramatic timbral transformations

Use on: Sound design, experimental music, extreme voice transformations

Preset 5: Reverse Mapping

🔄 Crossed + Inverted Mapping

Windows: 8

Window size: 0.2 seconds

Jitter scale: -0.15 (maps to F2, inverted)

Shimmer scale: -0.12 (maps to F1, inverted)

Character: Jitter controls F2 (front/back), shimmer controls F1 (height), both with inverse relationship

Use on: Exploring alternative sonification mappings, unusual timbral effects
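To compare the presets numerically, the following Python sketch encodes the scale values listed above. It computes the widest F1 and F2 bands each preset can reach with the default ranges (F1 280-900 Hz, F2 900-2500 Hz). It assumes the multiplicative shift from the Mapping Function section. Names such as `PRESETS` and `preset_extremes` are illustrative only.

```python
# Scale values as listed above; "reverse" marks Preset 5's crossed mapping
PRESETS = {
    "Subtle": {"jitter_scale": 0.08, "shimmer_scale": 0.06, "reverse": False},
    "Moderate": {"jitter_scale": 0.15, "shimmer_scale": 0.12, "reverse": False},
    "Extreme": {"jitter_scale": 0.30, "shimmer_scale": 0.25, "reverse": False},
    "Reverse": {"jitter_scale": -0.15, "shimmer_scale": -0.12, "reverse": True},
}


def band_extremes(low, high, scale):
    """Envelope of all reachable bands: lowest lower edge, highest upper edge.

    The shift factor runs from 1 - |scale| to 1 + |scale|; a negative scale
    only reverses the direction, so the envelope is the same.
    """
    s = abs(scale)
    return low * (1 - s), high * (1 + s)


def preset_extremes(name, f1=(280, 900), f2=(900, 2500)):
    """Widest F1 and F2 bands a preset can reach with the given ranges."""
    p = PRESETS[name]
    if p["reverse"]:
        # Crossed mapping: shimmer drives F1, jitter drives F2
        f1_scale, f2_scale = p["shimmer_scale"], p["jitter_scale"]
    else:
        f1_scale, f2_scale = p["jitter_scale"], p["shimmer_scale"]
    return {"F1": band_extremes(*f1, f1_scale), "F2": band_extremes(*f2, f2_scale)}


if __name__ == "__main__":
    for name in PRESETS:
        print(name, preset_extremes(name))
```

For example, the Extreme preset lets the F1 band reach from about 196 Hz up to 1170 Hz. Reverse reaches the same ±15% and ±12% limits as Moderate, but with the percentages swapped between F1 and F2.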

Parameters & Controls

Analysis Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| Num_windows | integer | 8 | Number of analysis windows (2-20 typical) |
| Window_size (s) | positive | 0.2 | Duration of each analysis window (seconds) |

Formant Mapping Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| Jitter_to_F1_scale | real | 0.15 | Mapping strength: jitter → F1 (±0.3 max typical) |
| Shimmer_to_F2_scale | real | 0.12 | Mapping strength: shimmer → F2 (±0.25 max typical) |

Formant Range Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| F1_low (Hz) | positive | 280 | Minimum F1 frequency (lowest possible first formant) |
| F1_high (Hz) | positive | 900 | Maximum F1 frequency (highest possible first formant) |
| F2_low (Hz) | positive | 900 | Minimum F2 frequency (lowest possible second formant) |
| F2_high (Hz) | positive | 2500 | Maximum F2 frequency (highest possible second formant) |

Output Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| Draw_analysis | boolean | 1 | Generate normalized jitter/shimmer plot |
| Play_result | boolean | 1 | Audition after processing |

Parameter Interaction Guide

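Jitter_norm affects F1 as follows. Because f1_shift multiplies both ends of the F1 range, the center and the half-width of the band scale together:
  F1_center = (F1_low + F1_high) / 2
  F1_span = (F1_high - F1_low) / 2
  Shifted F1 range = [F1_center × f1_shift - F1_span × f1_shift, F1_center × f1_shift + F1_span × f1_shift]
  where f1_shift = 1 + (jitter_norm - 0.5) × jitter_scale × 2

Example: F1_low = 280, F1_high = 900 → F1_center = 590, F1_span = 310
  jitter_scale = 0.15, jitter_norm = 0.0 → f1_shift = 0.85
  → shifted F1_center = 590 × 0.85 = 501.5, shifted F1_span = 310 × 0.85 = 263.5
  → F1 range = 501.5 - 263.5 to 501.5 + 263.5 = 238 to 765 Hz

The band therefore moves down and narrows in absolute terms while keeping the same F1_high / F1_low ratio, which agrees with the Mapping Function example. Note: the bandpass filter uses the shifted range directly.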

Visualization & Analysis

Analysis Plot

Jitter-Shimmer Analysis Plot:
  • X-axis: Time (seconds)
  • Y-axis: Normalized value (0.0 to 1.2)
Elements:
  • Blue line with circles: normalized jitter values
  • Red line with circles: normalized shimmer values
  • Gray horizontal line: 0.5 reference (midpoint of the normalized range)
  • Grid at 0.5 intervals
  • Legend: "Jitter -> F1" (blue), "Shimmer -> F2" (red)
Interpretation:
  • Values above 0.5: jitter/shimmer in the upper half of the range measured across all windows
  • Values below 0.5: jitter/shimmer in the lower half of that range
  • Circle positions: analysis window positions
  • Slope between windows: rate of change in voice quality

Reading the Plot

What to look for:
  • Parallel curves: Jitter and shimmer often correlate — both increase with vocal effort or pathology
  • Diverging curves: Jitter high but shimmer low may indicate specific voice disorders (rough but stable amplitude)
  • Steep slopes: Rapid changes in voice quality at specific moments (onset of vocal fatigue, emotional shifts)
  • Flat lines: Stable voice quality throughout
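  • Unvoiced windows: windows without detectable pitch do not appear as gaps; they are assigned the default values (0.5% jitter, 3.0% shimmer) and normalized along with the other windows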
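The "parallel" and "diverging" patterns can be quantified with a correlation coefficient. The following is a minimal Python sketch. It assumes a Pearson correlation over the per-window values and an illustrative ±0.5 threshold; neither comes from the script, and neither is a clinical criterion. Min-max normalization is a positive linear rescaling, so it gives the same r on raw or normalized values.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant (flat) curve has no defined correlation")
    return cov / (sx * sy)


def describe_curves(jitter, shimmer, threshold=0.5):
    """Label the curves 'parallel', 'diverging' or 'unrelated'.

    ASSUMPTION: the threshold is illustrative only.
    """
    r = pearson_r(jitter, shimmer)
    if r >= threshold:
        return "parallel"
    if r <= -threshold:
        return "diverging"
    return "unrelated"


if __name__ == "__main__":
    print(describe_curves([0.0, 0.5, 1.0], [0.1, 0.6, 0.9]))
    print(describe_curves([0.0, 0.5, 1.0], [1.0, 0.5, 0.0]))
```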

Auditory Interpretation

What to listen for in the output:
  • Blue curve (jitter) → F1: When jitter increases (blue above 0.5), F1 rises → vowels become lower (e.g., /i/ → /ɪ/ → /ɛ/ → /æ/). Listen for vowel "opening" or lowering.
  • Red curve (shimmer) → F2: When shimmer increases (red above 0.5), F2 rises → vowels become more front (e.g., /u/ → /i/ at similar height, or /ɑ/ → /æ/). Listen for vowel "fronting".
  • Combined effect: Jitter and shimmer together trace a path through vowel space — the output "maps" voice quality changes onto vowel quality changes.

Applications

Clinical Voice Assessment

Use case: Auditory display of voice quality changes for clinical feedback

Technique: Subtle preset with patient's voice sample

Voice Science Education

Use case: Demonstrating jitter and shimmer concepts to students

Technique: Moderate preset with synthesized or recorded voice samples

Sound Design & Vocal Transformation

Use case: Creating evolving vocal textures for music/sound art

Technique: Extreme or Reverse preset with expressive vocal recordings

Research Data Sonification

Use case: Converting voice perturbation measurements into audible form for pattern recognition

Technique: Custom preset with research-appropriate parameters

Practical Workflow Examples

🔬 Clinical: Sustained Vowel Analysis

Goal: Sonify jitter/shimmer variations in patient's /a/ vowel

Settings:

  • Preset: Subtle (minimal coloration)
  • Num_windows: 6-8 (depending on duration)
  • Window_size: 0.25-0.3s (stable analysis)
  • F1 range: 270-800 Hz (male) or 310-1000 Hz (female)
  • F2 range: 800-2500 Hz (male) or 1000-3200 Hz (female)

Result: Output preserves original voice quality while making perturbation patterns audible as subtle vowel shifts

🎓 Educational: Demonstrating Jitter Effects

Goal: Show how jitter affects perceived vowel height

Settings:

  • Preset: Moderate
  • Jitter_to_F1_scale: 0.3 (exaggerated for demonstration)
  • Shimmer_to_F2_scale: 0.0 (disable shimmer mapping)
  • Source: Sustained /i/ vowel (high, front)

Result: As jitter varies, vowel shifts between /i/ (low jitter) and /ɛ/ (high jitter) — clearly audible height change

🎚️ Sound Design: Spoken Word Trajectory

Goal: Transform spoken phrase into evolving vowel space trajectory

Settings:

  • Preset: Extreme
  • Num_windows: 12-20 (high resolution)
  • Window_size: 0.15s (capture rapid changes)
  • F1 range: 200-1000 Hz (wide range for drama)
  • F2 range: 800-3000 Hz (wide range)

Result: Spoken words transformed into abstract vowel trajectory — each phoneme's voice quality mapped to formant shifts, creating evolving texture

Troubleshooting Common Issues

Problem: No output or silent sections
Cause: Unvoiced regions with default jitter/shimmer values producing inaudible formants
Solution: Ensure source has sustained voicing, or adjust formant ranges to be more audible
Problem: Jitter/shimmer values all identical
Cause: Pitch analysis failed (unvoiced, noise, or wrong pitch settings)
Solution: Check that the recording is clearly voiced, and increase window_size
Problem: Output sounds like noise, not vowel
Cause: Formant ranges too narrow or shifted outside audible spectrum
Solution: Reset to default ranges, ensure jitter/shimmer scales aren't extreme
Problem: Abrupt transitions between segments
Cause: Too few windows causing large jumps in formant frequencies
Solution: Increase num_windows for smoother transitions
Problem: Processing very slow
Cause: Many windows, each requiring multiple filter operations
Solution: Reduce num_windows (6-10 typical), reduce window_size
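Two of the problems above (unvoiced windows and identical values) can be spotted from the per-window measurements before listening. The following is a minimal Python sketch. It assumes you have the raw jitter and shimmer percentages for each window. `diagnose` is a hypothetical helper that uses the default values this guide gives for unvoiced material (0.5% jitter, 3.0% shimmer). A voiced window could in principle measure exactly those values, so treat a match as a hint, not proof.

```python
DEFAULT_JITTER = 0.5   # % assigned when no pitch is detected (per this guide)
DEFAULT_SHIMMER = 3.0  # % assigned when no pitch is detected (per this guide)


def diagnose(jitters, shimmers, tol=1e-9):
    """Return a list of likely problems in per-window jitter/shimmer values."""
    issues = []
    defaults = [
        i for i, (j, s) in enumerate(zip(jitters, shimmers))
        if abs(j - DEFAULT_JITTER) < tol and abs(s - DEFAULT_SHIMMER) < tol
    ]
    if defaults:
        issues.append(f"windows {defaults} hold the unvoiced defaults")
    # Min-max normalisation divides by (max - min), so a zero range breaks it
    if max(jitters) - min(jitters) < tol:
        issues.append("jitter is identical in every window")
    if max(shimmers) - min(shimmers) < tol:
        issues.append("shimmer is identical in every window")
    return issues


if __name__ == "__main__":
    print(diagnose([0.5, 0.9, 1.1], [3.0, 2.6, 3.8]))
```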

Advanced Techniques

Custom formant ranges for specific speakers:
  • Male adult: F1: 270-800 Hz, F2: 800-2500 Hz
  • Female adult: F1: 310-1000 Hz, F2: 1000-3200 Hz
  • Child (8-10): F1: 400-1200 Hz, F2: 1200-4000 Hz
  • Soprano singer: F1: 350-1100 Hz, F2: 1100-3500 Hz
  • Bass singer: F1: 250-700 Hz, F2: 700-2200 Hz
Alternative mapping strategies (edit script):
  • Jitter only: Set shimmer_to_F2_scale = 0
  • Shimmer only: Set jitter_to_F1_scale = 0
  • Same scale for both: Equal values for balanced effect
  • Negative correlation: Use negative scales for inverse mapping
  • Extreme ranges: Set F1_low=200, F1_high=1200 for dramatic shifts