Pitch & Loudness Comparison Tool — User Guide
Dual-track analysis: compares pitch and loudness contours between two audio files frame-by-frame, designed for teacher-student comparison in vocal training and language learning.
What this does
This script compares two audio recordings frame by frame along two dimensions: pitch and loudness. It is designed for educational contexts in which a "teacher" recording (the model) is compared with a "student" recording (the attempt). It performs four tasks:
- Temporal Alignment: Pairs the two recordings frame by frame, with the frame spacing set by the analysis time step.
- Loudness Analysis: Computes intensity contours, compares decibel levels frame by frame, and calculates statistical differences.
- Pitch Analysis: Extracts fundamental frequency (F0) contours, compares them in semitones, and assesses pitch-matching accuracy.
- Comprehensive Metrics: Reports several difference measures, including average difference, RMS difference, maximum difference, dynamic range comparison, and consistency measures.
Key Features:
- Three Analysis Modes — Loudness only, Pitch only, or Both
- Frame-by-Frame Comparison — Precise temporal alignment and comparison
- Educational Focus — Designed for teacher-student vocal training
- Multiple Difference Metrics — Average, RMS, max, min differences
- Dynamic Range Analysis — Compares overall loudness variation
- Consistency Measurement — Standard deviation comparison
- Intelligent Handling — Skips unvoiced frames in pitch analysis
- Clear Output — Formatted results in Praat Info window
Technical Implementation: The script processes the two sounds in parallel:
- Intensity Extraction: Converts each sound to an intensity contour using a 75 Hz pitch floor and the user-defined time step.
- Pitch Extraction: Converts each sound to a pitch contour using the user-defined floor and ceiling.
- Frame Alignment: Compares only as many frames as the shorter of the two analyses contains.
- Difference Calculation: For each aligned frame, computes the absolute difference (dB for loudness, semitones for pitch).
- Statistical Aggregation: Computes the average, RMS, minimum, and maximum difference across all compared frames.
- Additional Metrics: Dynamic range difference (each range being max minus min) and consistency difference (difference in standard deviation).
Key insight: Because the per-frame differences are absolute values, the metrics are symmetric, but the script treats Sound1 as the reference (teacher) and Sound2 as the attempt (student).
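The alignment and aggregation steps can be illustrated outside Praat. The following Python sketch is not the script's own code: it assumes each contour is already available as a plain list of per-frame values, and the function name `compare_contours` and the sample numbers are made up for illustration.

```python
import math

def compare_contours(reference, attempt):
    """Pair frames by index and summarise the absolute differences."""
    # Frame alignment: keep only as many frames as the shorter contour has.
    n_frames = min(len(reference), len(attempt))
    diffs = [abs(reference[i] - attempt[i]) for i in range(n_frames)]
    if not diffs:
        raise ValueError("both contours need at least one frame")
    return {
        "frames": n_frames,
        "average": sum(diffs) / n_frames,
        "rms": math.sqrt(sum(d * d for d in diffs) / n_frames),
        "min": min(diffs),
        "max": max(diffs),
    }

# Illustrative dB values only; the teacher contour is one frame longer.
teacher_db = [60.0, 62.0, 65.0, 63.0]
student_db = [58.0, 62.0, 61.0]
stats = compare_contours(teacher_db, student_db)
print(stats)
```

Because the RMS squares each difference before averaging, it weights large single-frame deviations more heavily than the plain average does, which is why the two values diverge when a few frames are far off.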
Quick start
- In Praat, select EXACTLY TWO Sound objects.
- Important: First selected = Teacher/Reference, Second selected = Student/Attempt.
- Run script… → compare_pitch_and_loudness_FIXED_STDDEV_ARGS.praat.
- Choose analysis_type: Loudness_only, Pitch_only, or Both_pitch_and_loudness.
- Set time_step (default 0.01 s, i.e. 100 frames per second).
- For pitch analysis: Set pitch_floor_hz and pitch_ceiling_hz.
- Click OK — results appear in Info window.
- Review comparison metrics for feedback.
Loudness Comparison
🔊 Loudness Analysis Pipeline
Step 1: Convert each sound to intensity contour (75 Hz pitch floor)
Step 2: Extract frame-by-frame dB values
Step 3: Align frames (use minimum number)
Step 4: Compute differences for each aligned frame
Step 5: Calculate aggregate statistics
Step 6: Compute additional loudness metrics
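As a rough illustration of Step 6, the Python sketch below computes a dynamic range difference and a consistency difference from two lists of dB values. It is a sketch under assumptions, not the script's code: the function names are hypothetical, the sample standard deviation is used (the script may use a different estimator), and the metrics are taken over each full contour.

```python
import statistics

def dynamic_range(contour_db):
    """Loudness variation of one recording: loudest minus quietest frame."""
    return max(contour_db) - min(contour_db)

def loudness_extras(reference_db, attempt_db):
    """Differences in dynamic range and in steadiness (standard deviation)."""
    return {
        "dynamic_range_diff": abs(
            dynamic_range(reference_db) - dynamic_range(attempt_db)
        ),
        # Assumption: sample standard deviation (n - 1 in the denominator).
        "consistency_diff": abs(
            statistics.stdev(reference_db) - statistics.stdev(attempt_db)
        ),
    }

# Illustrative dB values only.
teacher_db = [60.0, 62.0, 65.0, 63.0]
student_db = [58.0, 62.0, 61.0, 59.0]
extras = loudness_extras(teacher_db, student_db)
print(extras)
```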
Intensity Extraction Parameters
| Parameter | Default | Effect | Recommended Range |
|---|---|---|---|
| time_step | 0.01 s | Frame rate (100 Hz) | 0.005-0.02 s |
| Pitch floor (implied) | 75 Hz | Sets the analysis window length; Praat's effective window is 3.2 / pitch floor (about 43 ms at 75 Hz) | 50-100 Hz |
| Subtract mean | yes | Removes DC component | Always yes |
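The pitch floor matters for intensity because Praat derives the analysis window from it: the effective window duration is 3.2 divided by the minimum pitch. The short Python sketch below only evaluates that relationship for the recommended range; the function name is hypothetical.

```python
def intensity_window_seconds(pitch_floor_hz):
    """Effective analysis window Praat uses for intensity: 3.2 / pitch floor."""
    if pitch_floor_hz <= 0:
        raise ValueError("pitch floor must be positive")
    return 3.2 / pitch_floor_hz

for floor_hz in (50.0, 75.0, 100.0):
    window_ms = 1000.0 * intensity_window_seconds(floor_hz)
    print(f"{floor_hz:.0f} Hz floor -> {window_ms:.1f} ms window")
```

A lower floor lengthens the window and smooths the contour more; a higher floor shortens it and follows rapid loudness changes more closely.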
Loudness Difference Metrics
Interpreting Loudness Results
| Metric | Excellent | Good | Needs Work | Interpretation |
|---|---|---|---|---|
| Average dB difference | < 2 dB | 2-5 dB | > 5 dB | Overall loudness matching |
| RMS dB difference | < 3 dB | 3-6 dB | > 6 dB | Consistent matching |
| Max dB difference | < 10 dB | 10-20 dB | > 20 dB | Worst-case deviation |
| Dynamic range diff | < 3 dB | 3-8 dB | > 8 dB | Expression matching |
| Consistency diff | < 2 dB | 2-4 dB | > 4 dB | Steadiness matching |
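The thresholds in this table can be applied mechanically. The Python sketch below encodes the bands and rates a single value. The dictionary keys and the `rate` function are hypothetical names, and assigning a value that sits exactly on a boundary to "Good" is an assumption, since the table does not say which band includes the boundary.

```python
# (upper limit for Excellent, upper limit for Good), both in dB, from the table.
LOUDNESS_BANDS = {
    "average_db": (2.0, 5.0),
    "rms_db": (3.0, 6.0),
    "max_db": (10.0, 20.0),
    "dynamic_range_db": (3.0, 8.0),
    "consistency_db": (2.0, 4.0),
}

def rate(metric, value):
    """Return the table's rating for one loudness difference metric."""
    excellent_below, good_up_to = LOUDNESS_BANDS[metric]
    if value < excellent_below:
        return "Excellent"
    if value <= good_up_to:
        return "Good"
    return "Needs Work"

print(rate("average_db", 1.5))
print(rate("max_db", 14.0))
print(rate("consistency_db", 4.5))
```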
Educational Guidelines for Loudness
🎤 Loudness Training Scenarios
Scenario 1: Volume Matching
- Goal: Match overall loudness level
- Target: Average difference < 3 dB
- Focus: Average and RMS differences
Scenario 2: Dynamic Expression
- Goal: Match loudness contours/expression
- Target: Dynamic range difference < 5 dB
- Focus: Dynamic range and consistency differences
Scenario 3: Loudness Control
- Goal: Maintain steady volume
- Target: Consistency difference < 2 dB
- Focus: Standard deviation comparison
Common Loudness Issues and Solutions
Pitch Comparison
🎵 Pitch Analysis Pipeline
Step 1: Convert each sound to pitch contour (user-defined range)
Step 2: Extract frame-by-frame F0 values (Hz)
Step 3: Align frames (use minimum number)
Step 4: Compute semitone differences for voiced frames only
Step 5: Calculate aggregate pitch statistics
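The semitone comparison in Step 4 rests on the standard interval formula: two frequencies differ by 12 × log2(f2 / f1) semitones. The Python sketch below applies it to voiced frames only. It is an illustration under assumptions rather than the script's code: unvoiced frames are represented as `None`, a frame is compared only when both recordings are voiced there, and the voiced percentage is taken relative to the aligned frame count.

```python
import math

def semitone_difference(reference_hz, attempt_hz):
    """Absolute interval between two frequencies, in semitones."""
    return abs(12.0 * math.log2(attempt_hz / reference_hz))

def compare_pitch(reference_hz, attempt_hz):
    """Compare two F0 contours frame by frame, skipping unvoiced frames."""
    n_frames = min(len(reference_hz), len(attempt_hz))
    diffs = [
        semitone_difference(ref, att)
        for ref, att in zip(reference_hz[:n_frames], attempt_hz[:n_frames])
        if ref is not None and att is not None  # both frames must be voiced
    ]
    if not diffs:
        raise ValueError("no frame is voiced in both recordings")
    return {
        "average_st": sum(diffs) / len(diffs),
        "max_st": max(diffs),
        "voiced_percent": 100.0 * len(diffs) / n_frames,
    }

# Illustrative F0 values in Hz; None marks an unvoiced frame.
teacher_f0 = [220.0, None, 440.0, 330.0]
student_f0 = [220.0, 230.0, 880.0]
result = compare_pitch(teacher_f0, student_f0)
print(result)
```

Working in semitones rather than Hz makes the measure independent of register: an octave error counts as 12 semitones whether the speaker's voice is low or high.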
Pitch Extraction Parameters
| Parameter | Default | Effect | Recommended Range |
|---|---|---|---|
| time_step | 0.01 s | Frame rate (100 Hz) | 0.005-0.02 s |
| pitch_floor_hz | 75 Hz | Minimum F0 to search | Male: 75-100 Hz, Female: 150-200 Hz |
| pitch_ceiling_hz | 600 Hz | Maximum F0 to search | Male: 300-400 Hz, Female: 500-600 Hz |
Pitch Difference Calculation
Interpreting Pitch Results
| Metric | Excellent | Good | Needs Work | Interpretation |
|---|---|---|---|---|
| Average semitone difference | < 0.5 st | 0.5-1.5 st | > 1.5 st | Overall pitch accuracy |
| Max semitone difference | < 3 st | 3-6 st | > 6 st | Worst-case pitch error |
| Voiced frames analyzed | > 80% | 50-80% | < 50% | Pitch detection reliability |
Pitch Detection and Voicing
Educational Guidelines for Pitch
🎶 Pitch Training Scenarios
Scenario 1: Absolute Pitch Matching
- Goal: Match exact pitch frequencies
- Target: Average difference < 0.5 semitones
- Exercises: Sustained note matching, pitch glide following
Scenario 2: Relative Pitch Contours
- Goal: Match pitch patterns/melodies
- Target: Average difference < 1.5 semitones with good contour
- Exercises: Sentence intonation, musical phrase imitation
Scenario 3: Pitch Stability
- Goal: Maintain steady pitch
- Target: Max difference < 3 semitones during sustained notes
- Exercises: Long tone practice, vibrato control
Common Pitch Issues and Solutions
Result Interpretation
📊 Comprehensive Assessment Framework
Holistic View: Consider all metrics together, not individually
Context Matters: Different goals require different metric priorities
Progress Tracking: Compare metrics across practice sessions
Threshold Guidelines: Use suggested ranges as starting points
Combined Pitch and Loudness Assessment
| Performance Level | Pitch (Avg st diff) | Loudness (Avg dB diff) | Typical Profile |
|---|---|---|---|
| Expert | < 0.5 | < 2 | Near-perfect matching |
| Advanced | 0.5-1.0 | 2-3 | Good matching, minor variations |
| Intermediate | 1.0-2.0 | 3-5 | Recognizable pattern, noticeable differences |
| Beginner | 2.0-4.0 | 5-8 | Basic contour followed, significant errors |
| Novice | > 4.0 | > 8 | Little pattern matching |
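The table does not say what to do when pitch and loudness fall into different levels. One conservative option, shown in the Python sketch below, is to report the weaker of the two. That rule, the function names, and the treatment of boundary values (assigned to the better level, except at the Expert limit) are all assumptions made for illustration.

```python
LEVELS = ["Expert", "Advanced", "Intermediate", "Beginner", "Novice"]
PITCH_UPPER_ST = [0.5, 1.0, 2.0, 4.0]      # upper limits from the table
LOUDNESS_UPPER_DB = [2.0, 3.0, 5.0, 8.0]   # upper limits from the table

def level_index(value, upper_limits):
    """Index into LEVELS for one metric (0 = Expert, 4 = Novice)."""
    if value < upper_limits[0]:
        return 0
    for index, limit in enumerate(upper_limits[1:], start=1):
        if value <= limit:
            return index
    return len(upper_limits)

def overall_level(avg_semitone_diff, avg_db_diff):
    """Assumed rule: the overall level is the weaker of the two metrics."""
    pitch = level_index(avg_semitone_diff, PITCH_UPPER_ST)
    loudness = level_index(avg_db_diff, LOUDNESS_UPPER_DB)
    return LEVELS[max(pitch, loudness)]

print(overall_level(0.4, 2.5))
print(overall_level(3.0, 1.0))
```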
Example Output Interpretation
Progress Tracking Over Time
Limitations and Considerations
Educational Applications
Language Learning and Pronunciation
🗣️ Pronunciation and Intonation Training
Goal: Improve pronunciation accuracy through prosody matching
Workflow:
- Teacher records model sentence with target intonation
- Student records imitation attempt
- Run Both_pitch_and_loudness analysis
- Review metrics together
- Identify specific problem areas
- Practice with focused exercises
Target metrics: Pitch average < 1.5 st, Loudness average < 4 dB
Singing and Vocal Training
🎤 Vocal Pitch Accuracy Training
Goal: Improve pitch accuracy for singers
Workflow:
- Select Pitch_only analysis
- Teacher sings target phrase
- Student imitates
- Analyze pitch matching
- Focus on reducing average and maximum differences
- Track progress over practice sessions
Target metrics (average pitch difference): Professional: < 0.3 st, Student: < 1.0 st
Speech Therapy and Voice Rehabilitation
🏥 Voice Therapy Applications
Goal: Monitor voice parameter changes in therapy
Workflow:
- Record baseline (healthy model or patient's best effort)
- Record current performance
- Compare to track changes
- Use metrics as objective measures of progress
- Adjust therapy based on results
Applications: Parkinson's voice therapy, vocal fold paralysis, pitch therapy for transgender voice
Accent Reduction and Dialect Training
🌍 Prosody Pattern Acquisition
Goal: Acquire target language/dialect prosody patterns
Workflow:
- Native speaker records target prosody patterns
- Learner records imitation
- Compare using the Both_pitch_and_loudness analysis
- Identify which aspects (pitch vs loudness) need most work
- Practice specific problematic patterns
Examples: English question intonation, Mandarin tones, Japanese pitch accent
Advanced Educational Techniques
- Isolate: Practice single words or short phrases
- Analyze: Get immediate feedback on each attempt
- Correct: Adjust based on specific metrics
- Integrate: Combine into longer phrases
- Automate: Practice until metrics reach target ranges
Example progression: Single vowel → Word → Phrase → Sentence → Paragraph
- Score system: Convert metrics to points (lower differences = higher scores)
- Level progression: Unlock harder material as metrics improve
- Challenge modes: Try to beat previous best scores
- Multiplayer: Compare scores among students (healthy competition)
- Achievements: Unlock badges for reaching metric milestones
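A score system of this kind needs some mapping from a difference metric to points. The Python sketch below shows one possible linear mapping. Everything in it is hypothetical (the function name, the 100-point scale, and the choice of the difference at which the score reaches zero), and a teacher would tune these values to the exercise.

```python
def metric_to_points(difference, zero_point_difference, max_points=100):
    """Linear score: 0 difference earns max_points, large differences earn 0."""
    if zero_point_difference <= 0:
        raise ValueError("zero_point_difference must be positive")
    # Clamp the difference into the range [0, zero_point_difference].
    clamped = min(max(difference, 0.0), zero_point_difference)
    return round(max_points * (1.0 - clamped / zero_point_difference))

# Example: treat an average pitch difference of 4 semitones as zero points.
print(metric_to_points(1.0, 4.0))
print(metric_to_points(5.0, 4.0))
```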
Troubleshooting Common Issues
Causes: Incorrect pitch range, breathy voice, background noise, recording issues
Solutions: Adjust pitch_floor_hz and pitch_ceiling_hz, improve recording quality, use a pop filter
Causes: Different recording levels, clipping, normalization needed
Solutions: Normalize both recordings to similar RMS levels before comparison
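RMS matching amounts to scaling one recording by the ratio of the two RMS values. The Python sketch below shows the arithmetic on plain lists of samples. It is an illustration only: the function names are hypothetical, and in practice the scaling would be done in Praat or an audio editor on the actual sound files.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def match_rms(samples, target_rms):
    """Scale samples to the target RMS; return the result and the gain in dB."""
    current = rms(samples)
    if current == 0:
        raise ValueError("cannot scale a silent signal")
    gain = target_rms / current
    return [s * gain for s in samples], 20.0 * math.log10(gain)

# Illustrative samples: the student recording is at half the teacher's level.
teacher = [0.2, -0.2, 0.2, -0.2]
student = [0.1, -0.1, 0.1, -0.1]
scaled_student, gain_db = match_rms(student, rms(teacher))
print(round(gain_db, 2))
```

Note that a positive gain can push peaks beyond full scale, so it is worth checking the scaled recording for clipping before analysis.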
Causes: Variable performance, different recording conditions, analysis parameter changes
Solutions:
Causes: Long recordings, very small time_step
Solutions: Use shorter segments for practice, increase time_step (e.g., 0.02 s), focus on key phrases
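The effect of time_step on workload is easy to estimate, since the number of frames is roughly the duration divided by the time step. The Python sketch below gives that estimate; the function name is hypothetical, and Praat's actual frame count will be slightly lower because the analysis window cannot extend past the ends of the recording.

```python
def approximate_frame_count(duration_s, time_step_s):
    """Rough number of analysis frames: duration divided by time step."""
    if time_step_s <= 0:
        raise ValueError("time_step_s must be positive")
    return round(duration_s / time_step_s)

# A 60-second recording at the default and at a coarser time step.
print(approximate_frame_count(60.0, 0.01))
print(approximate_frame_count(60.0, 0.02))
```

Doubling the time step halves the number of frames to compare, at the cost of coarser temporal resolution.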
Integration with Other Tools
| Tool | Integration | Enhanced Capability |
|---|---|---|
| Praat TextGrid | Segment recordings by phoneme/word | Compare specific speech segments |
| Spreadsheet software | Export metrics for tracking | Long-term progress visualization |
| Audio editor | Pre-process recordings | Normalize, trim, filter before analysis |
| Recording app | Standardize recording setup | Consistent input quality |
| Learning management system | Embed in online courses | Distance learning applications |
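For the spreadsheet row in the table above, metrics can be collected into a CSV file that any spreadsheet program opens. The Python sketch below assumes the values have been copied by hand from the Praat Info window. The column names, the function name, and the session values are made up for illustration and are not produced by the script.

```python
import csv
import io

FIELDNAMES = ["session", "avg_semitone_diff", "avg_db_diff"]

def metrics_to_csv(sessions):
    """Serialise a list of per-session metric dictionaries as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(sessions)
    return buffer.getvalue()

# Placeholder values standing in for two practice sessions.
sessions = [
    {"session": 1, "avg_semitone_diff": 1.8, "avg_db_diff": 4.2},
    {"session": 2, "avg_semitone_diff": 1.2, "avg_db_diff": 3.5},
]
csv_text = metrics_to_csv(sessions)
print(csv_text)
```

Writing `csv_text` to a file with a `.csv` extension gives one row per session, which is enough for a simple progress chart.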