Pitch & Loudness Comparison Tool — User Guide

Dual-track analysis: compares pitch and loudness contours between two audio files frame-by-frame, designed for teacher-student comparison in vocal training and language learning.

Author: Shai Cohen
Version: 2.0 (Fixed)
License: MIT License
Category: Educational & Comparative Analysis

What this does

This script implements frame-by-frame audio comparison: a specialized tool for comparing two audio recordings along pitch and loudness dimensions. The tool is designed for educational contexts where a "teacher" recording (model) is compared with a "student" recording (attempt). It performs:

  1. Temporal Alignment: Aligns the two recordings by time frame based on the analysis step size.
  2. Loudness Analysis: Computes intensity contours, compares decibel levels frame-by-frame, and calculates statistical differences.
  3. Pitch Analysis: Extracts fundamental frequency contours, compares them in semitone units, and assesses pitch matching accuracy.
  4. Comprehensive Metrics: Generates multiple difference measures, including average difference, RMS difference, maximum difference, dynamic range comparison, and consistency measures.

Key Features:

Why frame-by-frame comparison?

Traditional audio comparison reports overall statistics (mean, range). Frame-by-frame comparison checks how closely the two contours match as they evolve over time.

Advantages:

  1. Temporal precision: Differences are measured at each point in time rather than over the whole recording.
  2. Pattern recognition: Detects matching of contours, not just averages.
  3. Educational value: Shows students how far they deviate from the model.
  4. Detailed feedback: Multiple metrics provide a comprehensive assessment.
  5. Tolerance of small timing offsets: Frames are paired by index, so the recordings need not be perfectly time-aligned. Large tempo differences still cause misalignment.

Use cases:

  • Language learning (pronunciation coaching)
  • Singing training (pitch matching)
  • Speech therapy (prosody training)
  • Audio forensics (recording comparison)
  • Voice analysis (style imitation)

Technical Implementation: The script processes two sounds in parallel:

  1. Intensity Extraction: Converts each sound to an intensity contour with a 75 Hz pitch floor and a user-defined time step.
  2. Pitch Extraction: Converts each sound to a pitch contour with a user-defined floor and ceiling.
  3. Frame Alignment: Uses the smaller of the two frame counts.
  4. Difference Calculation: For each aligned frame, computes the absolute difference (dB for loudness, semitones for pitch).
  5. Statistical Aggregation: Computes average, RMS, min, and max differences across all frames.
  6. Additional Metrics: Dynamic range difference (max-min) and consistency difference (standard deviation).

Key insight: The difference measures are symmetric, but the script treats Sound1 as the reference/teacher and Sound2 as the student/attempt.
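
The frame alignment and difference steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not the script's Praat code: the function name `aligned_pairs` is hypothetical, and representing an undefined frame as `None` is an assumption made for this example.

```python
def aligned_pairs(teacher, student):
    """Pair frames by index, truncating to the shorter contour.

    Frames where either value is undefined (None, an assumption of this
    sketch) are dropped, mirroring the "if both defined" rule.
    """
    n = min(len(teacher), len(student))
    return [(a, b) for a, b in zip(teacher[:n], student[:n])
            if a is not None and b is not None]


teacher_db = [62.0, 65.5, None, 70.1, 68.0]
student_db = [60.0, 66.0, 64.0, None]

pairs = aligned_pairs(teacher_db, student_db)
print(pairs)                                # [(62.0, 60.0), (65.5, 66.0)]
print([abs(a - b) for a, b in pairs])       # [2.0, 0.5]
```

Pairing by index is what makes the longer recording get truncated: the fifth teacher frame above has no partner and is ignored.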

Quick start

  1. In Praat, select EXACTLY TWO Sound objects.
  2. Important: First selected = Teacher/Reference, Second selected = Student/Attempt.
  3. Run script…compare_pitch_and_loudness_FIXED_STDDEV_ARGS.praat.
  4. Choose analysis_type: Loudness_only, Pitch_only, or Both_pitch_and_loudness.
  5. Set time_step (default 0.01s = 100 Hz frame rate).
  6. For pitch analysis: Set pitch_floor_hz and pitch_ceiling_hz.
  7. Click OK — results appear in Info window.
  8. Review comparison metrics for feedback.

Quick tip:

  • For language learning, use Both_pitch_and_loudness to analyze prosody.
  • Set time_step = 0.01 for detailed analysis (100 frames/sec).
  • For singing practice, adjust pitch_floor_hz and pitch_ceiling_hz to match the vocal range (e.g., 80-400 Hz for typical singing).
  • Ensure recordings are of similar duration: the longer one is truncated to the shorter duration for comparison.
  • The script aligns by frame number, not by absolute time, so slight timing differences are okay.
  • For best results, normalize both recordings to similar loudness before comparison.
  • Watch the "Voiced frames analyzed" count in the pitch results: low counts indicate poor pitch detection (settings may need adjustment).
  • Results show both absolute differences and relative patterns.

Important:

  • Exactly two sounds required: The script will fail with 0, 1, or 3+ sounds.
  • Order matters: First selected = Teacher, second selected = Student. Reversing the order reverses the interpretation.
  • Different durations are handled: The comparison uses the minimum frame count, so the longer recording is truncated to match the shorter.
  • Pitch analysis requires voiced sounds: Unvoiced frames (fricatives, silence) are skipped in the pitch comparison.
  • Intensity is in dB: Values are relative to Praat's reference level (2·10⁻⁵ Pa, i.e. dB SPL) and do not represent perceptual loudness.
  • Semitone calculation uses the logarithmic formula 12 × ln(f2/f1)/ln(2).
  • Time step trade-off: A smaller step gives more frames and more detail, but runs slower.
  • Pitch floor/ceiling must bracket the actual pitch: Incorrect settings cause pitch detection failures.

Loudness Comparison

🔊 Loudness Analysis Pipeline

Step 1: Convert each sound to intensity contour (75 Hz pitch floor)

Step 2: Extract frame-by-frame dB values

Step 3: Align frames (use minimum number)

Step 4: Compute differences for each aligned frame

Step 5: Calculate aggregate statistics

Step 6: Compute additional loudness metrics

Intensity Extraction Parameters

Parameter             | Default | Effect                                                  | Recommended Range
time_step             | 0.01 s  | Frame rate (100 Hz)                                     | 0.005-0.02 s
Pitch floor (implied) | 75 Hz   | Lowest expected periodicity; sets analysis window length | 50-100 Hz
Subtract mean         | yes     | Removes DC component                                    | Always yes

Loudness Difference Metrics

FRAME-BY-FRAME DIFFERENCES:

  For each frame i (1 to N):
    db1[i] = intensity of teacher at frame i
    db2[i] = intensity of student at frame i
    if both defined: diff[i] = |db1[i] - db2[i]|

AGGREGATE STATISTICS:

  1. AVERAGE DIFFERENCE: avg_diff = (sum of diff[i]) / N
  2. RMS DIFFERENCE:     rms_diff = sqrt( (sum of diff[i]²) / N )
  3. MAXIMUM DIFFERENCE: max_diff = maximum(diff[i])
  4. MINIMUM DIFFERENCE: min_diff = minimum(diff[i])

ADDITIONAL METRICS:

  5. DYNAMIC RANGE DIFFERENCE:
       dr1 = max(db1) - min(db1)
       dr2 = max(db2) - min(db2)
       dr_diff = |dr1 - dr2|
  6. CONSISTENCY DIFFERENCE:
       std1 = standard_deviation(db1)
       std2 = standard_deviation(db2)
       consistency_diff = |std1 - std2|
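
As a rough cross-check of these formulas outside Praat, the six metrics can be computed from two lists of dB values. The sketch below is an illustration, not the script itself. The function name `loudness_metrics` is hypothetical, every frame is assumed to be defined, and the sample standard deviation (n - 1 denominator) is assumed, which may differ from what the script uses.

```python
import math
import statistics


def loudness_metrics(db1, db2):
    """Compute the six loudness difference metrics for two dB contours.

    Assumes every frame is defined and at least two frames are available.
    """
    n = min(len(db1), len(db2))          # frame alignment by truncation
    a, b = db1[:n], db2[:n]
    diffs = [abs(x - y) for x, y in zip(a, b)]
    return {
        "avg_diff": sum(diffs) / n,
        "rms_diff": math.sqrt(sum(d * d for d in diffs) / n),
        "max_diff": max(diffs),
        "min_diff": min(diffs),
        "dr_diff": abs((max(a) - min(a)) - (max(b) - min(b))),
        # Sample standard deviation is an assumption of this sketch.
        "consistency_diff": abs(statistics.stdev(a) - statistics.stdev(b)),
    }


teacher = [60.0, 62.0, 64.0, 66.0]
student = [58.0, 63.0, 64.0, 70.0]
m = loudness_metrics(teacher, student)
print(m["avg_diff"])   # 1.75
print(m["max_diff"])   # 4.0
print(m["dr_diff"])    # 6.0
```

Because the RMS squares each difference before averaging, it can never be smaller than the average difference; a large gap between the two points to a few frames with big deviations.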

Interpreting Loudness Results

Metric                | Excellent | Good     | Needs Work | Interpretation
Average dB difference | < 2 dB    | 2-5 dB   | > 5 dB     | Overall loudness matching
RMS dB difference     | < 3 dB    | 3-6 dB   | > 6 dB     | Consistent matching
Max dB difference     | < 10 dB   | 10-20 dB | > 20 dB    | Worst-case deviation
Dynamic range diff    | < 3 dB    | 3-8 dB   | > 8 dB     | Expression matching
Consistency diff      | < 2 dB    | 2-4 dB   | > 4 dB     | Steadiness matching
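
For batch use, the table can be turned into a small lookup. The sketch below is one possible encoding, not part of the script: the names `LOUDNESS_THRESHOLDS` and `rate_loudness` are hypothetical, and treating a value that falls exactly on a boundary as "Good" is an assumption, since the table does not say. The example values are the loudness figures from this guide's sample output.

```python
# (excellent_below, good_up_to) per metric, in dB, copied from the table.
LOUDNESS_THRESHOLDS = {
    "avg_diff": (2.0, 5.0),
    "rms_diff": (3.0, 6.0),
    "max_diff": (10.0, 20.0),
    "dr_diff": (3.0, 8.0),
    "consistency_diff": (2.0, 4.0),
}


def rate_loudness(metric, value):
    """Return 'Excellent', 'Good', or 'Needs Work' for one loudness metric."""
    excellent_below, good_up_to = LOUDNESS_THRESHOLDS[metric]
    if value < excellent_below:
        return "Excellent"
    if value <= good_up_to:      # boundary handling is an assumption
        return "Good"
    return "Needs Work"


sample = {"avg_diff": 4.237, "rms_diff": 5.812, "max_diff": 18.943,
          "dr_diff": 7.456, "consistency_diff": 3.128}
for name, value in sample.items():
    print(name, rate_loudness(name, value))   # every metric rates "Good"
```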

Educational Guidelines for Loudness

🎤 Loudness Training Scenarios

Scenario 1: Volume Matching

  • Goal: Match overall loudness level
  • Target: Average difference < 3 dB
  • Focus: Average and RMS differences

Scenario 2: Dynamic Expression

  • Goal: Match loudness contours/expression
  • Target: Dynamic range difference < 5 dB
  • Focus: Dynamic range and consistency differences

Scenario 3: Loudness Control

  • Goal: Maintain steady volume
  • Target: Consistency difference < 2 dB
  • Focus: Standard deviation comparison

Common Loudness Issues and Solutions

ISSUE: Large average difference (> 5 dB)
CAUSE: Overall volume mismatch
SOLUTION: Normalize student recording to match teacher's RMS level

ISSUE: Large dynamic range difference (> 8 dB)
CAUSE: Different expressive patterns
SOLUTION: Practice crescendo/decrescendo matching

ISSUE: Large max difference (> 20 dB)
CAUSE: Sudden loud spikes or drops
SOLUTION: Identify and smooth extreme variations

ISSUE: Inconsistent matching (RMS difference much higher than average difference)
CAUSE: Erratic volume changes
SOLUTION: Work on steady breath control
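
The RMS normalization suggested for the first issue amounts to scaling the student's samples by the ratio of the two RMS levels. A minimal sketch follows, assuming the samples are available as plain lists of floats; the function names are hypothetical, and in practice an audio editor or Praat itself would do this step.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def match_rms(student, teacher):
    """Scale `student` so its RMS equals the teacher's.

    Returns the scaled samples and the applied gain in dB.
    """
    student_rms = rms(student)
    if student_rms == 0:
        raise ValueError("student recording is silent; cannot normalize")
    gain = rms(teacher) / student_rms
    return [s * gain for s in student], 20 * math.log10(gain)


teacher = [0.5, -0.5, 0.5, -0.5]
student = [0.25, -0.25, 0.25, -0.25]
scaled, gain_db = match_rms(student, teacher)
print(scaled)              # [0.5, -0.5, 0.5, -0.5]
print(round(gain_db, 2))   # 6.02
```

Note that a large positive gain can push peaks past full scale, so it is worth checking the scaled signal for clipping before comparing.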

Pitch Comparison

🎵 Pitch Analysis Pipeline

Step 1: Convert each sound to pitch contour (user-defined range)

Step 2: Extract frame-by-frame F0 values (Hz)

Step 3: Align frames (use minimum number)

Step 4: Compute semitone differences for voiced frames only

Step 5: Calculate aggregate pitch statistics

Pitch Extraction Parameters

Parameter        | Default | Effect               | Recommended Range
time_step        | 0.01 s  | Frame rate (100 Hz)  | 0.005-0.02 s
pitch_floor_hz   | 75 Hz   | Minimum F0 to search | Male: 75-100, Female: 150-200
pitch_ceiling_hz | 600 Hz  | Maximum F0 to search | Male: 300-400, Female: 500-600

Pitch Difference Calculation

SEMITONE CALCULATION:

  For two frequencies f1 and f2 (in Hz):
    semitone_difference = |12 × ln(f2 / f1) / ln(2)|

  Where:
    • ln = natural logarithm
    • 12 semitones per octave
    • ln(2) ≈ 0.693147

  Examples:
    f1 = 220 Hz (A3), f2 = 440 Hz (A4) → 12.00 semitones (octave)
    f1 = 261.6 Hz (C4), f2 = 293.7 Hz (D4) → 2.00 semitones (whole step)
    f1 = 100 Hz, f2 = 105 Hz → 0.84 semitones (a little less than a half step)

FRAME PROCESSING:

  For each frame i:
    f1 = teacher pitch at frame i (Hz)
    f2 = student pitch at frame i (Hz)
    If f1 > 0 AND f2 > 0 (both voiced):
      semitone_diff[i] = |12 × ln(f2/f1)/ln(2)|
      Include in statistics
    Else (unvoiced in either):
      Skip frame

AGGREGATE STATISTICS:

  • Average semitone difference = mean(semitone_diff[i])
  • Maximum semitone difference = max(semitone_diff[i])
  • Voiced frames count = number of frames where both voiced
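
The semitone formula and the voiced-frame rule can be reproduced with a short Python sketch. It is meant as a way to check values by hand rather than as the script's implementation; the function names are hypothetical, and unvoiced frames are assumed to be stored as 0 so that the "f1 > 0 AND f2 > 0" test applies directly.

```python
import math


def semitone_diff(f1, f2):
    """Absolute distance between two frequencies (Hz) in semitones."""
    return abs(12 * math.log(f2 / f1) / math.log(2))


def pitch_metrics(f0_teacher, f0_student):
    """Average and maximum semitone difference over frames voiced in both.

    Unvoiced frames are assumed to be encoded as 0 in this sketch.
    Returns None if no frame is voiced in both contours.
    """
    n = min(len(f0_teacher), len(f0_student))
    diffs = [semitone_diff(a, b)
             for a, b in zip(f0_teacher[:n], f0_student[:n])
             if a > 0 and b > 0]
    if not diffs:
        return None
    return {"avg_st": sum(diffs) / len(diffs), "max_st": max(diffs),
            "voiced": len(diffs), "total": n}


print(round(semitone_diff(220, 440), 2))      # 12.0
print(round(semitone_diff(261.6, 293.7), 2))  # 2.0
print(round(semitone_diff(100, 105), 2))      # 0.84
```

Because the measure is a ratio on a log scale, a 5 Hz error matters far more at 100 Hz than at 400 Hz, which is why the comparison is done in semitones rather than in Hz.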

Interpreting Pitch Results

Metric                      | Excellent | Good       | Needs Work | Interpretation
Average semitone difference | < 0.5 st  | 0.5-1.5 st | > 1.5 st   | Overall pitch accuracy
Max semitone difference     | < 3 st    | 3-6 st     | > 6 st     | Worst-case pitch error
Voiced frames analyzed      | > 80%     | 50-80%     | < 50%      | Pitch detection reliability

Pitch Detection and Voicing

VOICED FRAME DETECTION:

  A frame is considered "voiced" if:
    pitch > 0 Hz (Praat's pitch detector found F0)

  Typical voiced percentages:
    • Vowels: 95-100% voiced
    • Nasals: 80-95% voiced
    • Voiced fricatives: 50-80% voiced
    • Unvoiced sounds: 0-20% voiced
    • Silence: 0% voiced

IMPORTANT NOTES:

  • Only voiced frames are included in pitch statistics
  • Unvoiced frames (fricatives, stops, silence) are skipped
  • A low voiced percentage may indicate:
    - Incorrect pitch floor/ceiling
    - Very breathy/whispered voice
    - Background noise
    - Technical issues with recording
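
The "Voiced frames analyzed" line of the output reports a count, while the interpretation table uses a percentage. A small helper can bridge the two; the sketch below is illustrative, the name `voiced_coverage` is hypothetical, and counting a value of exactly 80% or 50% as "Good" is an assumption. The example numbers come from this guide's sample output.

```python
def voiced_coverage(voiced_frames, total_frames):
    """Return (percentage, rating) for the share of frames voiced in both sounds."""
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    pct = 100.0 * voiced_frames / total_frames
    if pct > 80:
        rating = "Excellent"
    elif pct >= 50:              # boundary handling is an assumption
        rating = "Good"
    else:
        rating = "Needs Work"
    return pct, rating


pct, rating = voiced_coverage(387, 452)
print(round(pct), rating)   # 86 Excellent
```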

Educational Guidelines for Pitch

🎶 Pitch Training Scenarios

Scenario 1: Absolute Pitch Matching

  • Goal: Match exact pitch frequencies
  • Target: Average difference < 0.5 semitones
  • Exercises: Sustained note matching, pitch glide following

Scenario 2: Relative Pitch Contours

  • Goal: Match pitch patterns/melodies
  • Target: Average difference < 1.5 semitones with good contour
  • Exercises: Sentence intonation, musical phrase imitation

Scenario 3: Pitch Stability

  • Goal: Maintain steady pitch
  • Target: Max difference < 3 semitones during sustained notes
  • Exercises: Long tone practice, vibrato control

Common Pitch Issues and Solutions

ISSUE: High average semitone difference (> 2 st)
CAUSE: Systematic pitch offset (singing in wrong key)
SOLUTION: Practice with reference tones, use pitch pipe

ISSUE: Large max difference (> 6 st) but low average
CAUSE: Occasional large errors (cracking, slips)
SOLUTION: Identify problematic notes, practice transitions

ISSUE: Low voiced frames percentage (< 50%)
CAUSE: Breathy voice, incorrect pitch range settings
SOLUTION: Adjust pitch floor/ceiling, improve vocal fold closure

ISSUE: Inconsistent matching (high max, medium average)
CAUSE: Erratic pitch control
SOLUTION: Slower practice, focus on steady tones first

Result Interpretation

📊 Comprehensive Assessment Framework

Holistic View: Consider all metrics together, not individually

Context Matters: Different goals require different metric priorities

Progress Tracking: Compare metrics across practice sessions

Threshold Guidelines: Use suggested ranges as starting points

Combined Pitch and Loudness Assessment

Performance Level | Pitch (Avg st diff) | Loudness (Avg dB diff) | Typical Profile
Expert            | < 0.5               | < 2                    | Near-perfect matching
Advanced          | 0.5-1.0             | 2-3                    | Good matching, minor variations
Intermediate      | 1.0-2.0             | 3-5                    | Recognizable pattern, noticeable differences
Beginner          | 2.0-4.0             | 5-8                    | Basic contour followed, significant errors
Novice            | > 4.0               | > 8                    | Little pattern matching
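
The table gives separate bands for pitch and loudness but does not say what to do when they disagree. One possible reading, sketched below, assigns the level of the weaker dimension. That rule, the names used, and the choice to place a value sitting exactly on a boundary in the harder band are all assumptions of this sketch, not behavior of the script.

```python
LEVELS = ["Expert", "Advanced", "Intermediate", "Beginner", "Novice"]
PITCH_BOUNDS = [0.5, 1.0, 2.0, 4.0]      # upper limits in semitones
LOUDNESS_BOUNDS = [2.0, 3.0, 5.0, 8.0]   # upper limits in dB


def level_index(value, bounds):
    """Index of the first band whose upper limit exceeds `value`."""
    for i, upper in enumerate(bounds):
        if value < upper:
            return i
    return len(bounds)


def performance_level(avg_st, avg_db):
    """Combined level, taken as the worse of the pitch and loudness bands."""
    worst = max(level_index(avg_st, PITCH_BOUNDS),
                level_index(avg_db, LOUDNESS_BOUNDS))
    return LEVELS[worst]


print(performance_level(1.843, 4.237))   # Intermediate
print(performance_level(0.4, 6.0))       # Beginner
```

The second call shows the effect of the "worse of the two" rule: expert-level pitch does not compensate for a 6 dB loudness mismatch.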

Example Output Interpretation

SAMPLE OUTPUT (Both analysis):

  === LOUDNESS COMPARISON ===
  Frames compared: 452
  Average dB difference: 4.237 dB
  RMS dB difference: 5.812 dB
  Max dB difference: 18.943 dB
  Min dB difference: 0.012 dB
  Dynamic range difference: 7.456 dB
  Consistency difference (std): 3.128 dB

  === PITCH COMPARISON ===
  Voiced frames analyzed: 387 out of 452
  Average semitone difference: 1.843
  Max semitone difference: 8.237

INTERPRETATION:

  1. Loudness: Moderate matching (4.2 dB average diff)
     - Dynamic expression differs (7.5 dB range diff)
     - Some consistency issues (3.1 dB std diff)
  2. Pitch: Intermediate level (1.8 st average diff)
     - Some large errors (8.2 st max diff)
     - Good voiced detection (387/452 = 86%)
  3. Overall: Student follows general patterns
     - Needs work on pitch accuracy and loudness control
     - Focus on reducing maximum errors

Progress Tracking Over Time

TRACKING TEMPLATE:

  Session 1: Pitch avg: 3.2 st | Loudness avg: 6.5 dB
    Issues: Large pitch errors, volume mismatch
  Session 2: Pitch avg: 2.1 st | Loudness avg: 4.8 dB
    Progress: Both improved, still significant differences
  Session 3: Pitch avg: 1.4 st | Loudness avg: 3.2 dB
    Progress: Good improvement, approaching target ranges
  Session 4: Pitch avg: 0.9 st | Loudness avg: 2.4 dB
    Goal achieved: Good matching, minor variations

KEY INSIGHTS:

  • Track relative improvement, not just absolute values
  • Different metrics may improve at different rates
  • Plateaus are normal in skill development
  • Some sessions may show regression (normal in learning)
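
"Relative improvement" can be made concrete as the percentage reduction of a metric between two sessions. The sketch below applies that definition, which is one reasonable choice rather than something the script computes, to the four sessions in the template; the function name is hypothetical.

```python
def relative_improvement(first, last):
    """Percentage reduction from `first` to `last` (positive = improvement)."""
    return 100.0 * (first - last) / first


# (pitch avg in st, loudness avg in dB) for sessions 1-4 of the template.
sessions = [(3.2, 6.5), (2.1, 4.8), (1.4, 3.2), (0.9, 2.4)]

pitch_gain = relative_improvement(sessions[0][0], sessions[-1][0])
loud_gain = relative_improvement(sessions[0][1], sessions[-1][1])
print(f"{pitch_gain:.1f}% pitch, {loud_gain:.1f}% loudness")
# 71.9% pitch, 63.1% loudness
```

A negative result simply means the metric got worse between the two sessions, which the template notes is a normal part of learning.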

Limitations and Considerations

  • Not a complete assessment: Only measures pitch and loudness. It doesn't assess timbre, articulation, vowel quality, or timing precision.
  • Frame alignment assumption: Assumes similar timing. Large tempo differences will cause misalignment.
  • Pitch detection limitations: Praat's pitch detector can fail on breathy, creaky, or noisy voices.
  • Loudness vs perceived loudness: dB measurements don't account for equal-loudness contours (Fletcher-Munson).
  • No normalization: Doesn't automatically adjust for different recording levels. Pre-normalization is recommended.
  • Binary voiced/unvoiced: Simplified model. Real voice has degrees of voicing.
  • Semitone calculation assumes equal temperament: May not match perception for microtonal music.

Educational Applications

Language Learning and Pronunciation

🗣️ Pronunciation and Intonation Training

Goal: Improve pronunciation accuracy through prosody matching

Workflow:

  1. Teacher records model sentence with target intonation
  2. Student records imitation attempt
  3. Run Both_pitch_and_loudness analysis
  4. Review metrics together
  5. Identify specific problem areas
  6. Practice with focused exercises

Target metrics: Pitch average < 1.5 st, Loudness average < 4 dB

Singing and Vocal Training

🎤 Vocal Pitch Accuracy Training

Goal: Improve pitch accuracy for singers

Workflow:

  1. Select Pitch_only analysis
  2. Teacher sings target phrase
  3. Student imitates
  4. Analyze pitch matching
  5. Focus on reducing average and maximum differences
  6. Track progress over practice sessions

Target metrics: Professional: < 0.3 st, Student: < 1.0 st

Speech Therapy and Voice Rehabilitation

🏥 Voice Therapy Applications

Goal: Monitor voice parameter changes in therapy

Workflow:

  1. Record baseline (healthy model or patient's best effort)
  2. Record current performance
  3. Compare to track changes
  4. Use metrics as objective measures of progress
  5. Adjust therapy based on results

Applications: Parkinson's voice therapy, vocal fold paralysis, pitch therapy for transgender voice

Accent Reduction and Dialect Training

🌍 Prosody Pattern Acquisition

Goal: Acquire target language/dialect prosody patterns

Workflow:

  1. Native speaker records target prosody patterns
  2. Learner records imitation
  3. Compare using Both analysis
  4. Identify which aspects (pitch vs loudness) need most work
  5. Practice specific problematic patterns

Example: English question intonation, Mandarin tones, Japanese pitch accent

Advanced Educational Techniques

Segmented Practice Approach:
  1. Isolate: Practice single words or short phrases
  2. Analyze: Get immediate feedback on each attempt
  3. Correct: Adjust based on specific metrics
  4. Integrate: Combine into longer phrases
  5. Automate: Practice until metrics reach target ranges

Example progression: Single vowel → Word → Phrase → Sentence → Paragraph

Gamification Strategies:
  • Score system: Convert metrics to points (lower differences = higher scores)
  • Level progression: Unlock harder material as metrics improve
  • Challenge modes: Try to beat previous best scores
  • Multiplayer: Compare scores among students (healthy competition)
  • Achievements: Unlock badges for reaching metric milestones
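
As an illustration of the score system idea, the two average differences could be mapped linearly onto a 0-100 scale. Everything in this sketch is hypothetical: the linear mapping, the function names, the equal weighting of pitch and loudness, and the choice of the Novice thresholds (4 st, 8 dB) as the zero-point. The script itself produces no score.

```python
def score(diff, worst):
    """Map a difference to 0-100 points: 0 gives 100, `worst` or more gives 0."""
    return max(0.0, 100.0 * (1.0 - diff / worst))


def session_score(avg_st, avg_db, worst_st=4.0, worst_db=8.0):
    """Equal-weight average of the pitch and loudness scores (hypothetical)."""
    return round((score(avg_st, worst_st) + score(avg_db, worst_db)) / 2)


print(session_score(3.2, 6.5))   # 19
print(session_score(0.9, 2.4))   # 74
```

The two calls use sessions 1 and 4 of the tracking template, so the rising score mirrors the falling differences.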

Troubleshooting Common Issues

Problem: Very low voiced frame count in pitch analysis
Causes: Incorrect pitch range, breathy voice, background noise, recording issues
Solutions: Adjust pitch_floor_hz and pitch_ceiling_hz, improve recording quality, use pop filter

Problem: Extremely high dB differences (> 30 dB)
Causes: Different recording levels, clipping, normalization needed
Solutions: Normalize both recordings to similar RMS levels before comparison

Problem: Inconsistent results across attempts
Causes: Variable performance, different recording conditions, analysis parameter changes
Solutions: Use the same analysis parameters for every attempt and standardize the recording setup

Problem: Analysis takes very long
Causes: Long recordings, very small time_step
Solutions: Use shorter segments for practice, increase time_step (e.g., 0.02s), focus on key phrases

Integration with Other Tools

Tool                       | Integration                        | Enhanced Capability
Praat TextGrid             | Segment recordings by phoneme/word | Compare specific speech segments
Spreadsheet software       | Export metrics for tracking        | Long-term progress visualization
Audio editor               | Pre-process recordings             | Normalize, trim, filter before analysis
Recording app              | Standardize recording setup        | Consistent input quality
Learning management system | Embed in online courses            | Distance learning applications