OT Corpus Concatenator — User Guide
Optimality Theory-based audio corpus concatenation: Uses constraint-based ranking to select and concatenate optimal audio files based on spectral and energetic properties.
What this does
This script implements Optimality Theory-based audio concatenation — a computational linguistics-inspired approach to automatically select and concatenate audio files from a corpus based on weighted constraint satisfaction. The system analyzes each audio file for four key acoustic properties, computes constraint violations, and ranks files by their Harmony Score (lower = better). The top N files are then concatenated into a single optimized audio sequence.
Key Features:
- OT Constraint System — Four weighted constraints: DARKNESS, BRIGHTNESS, LOW-ENERGY, UNSTABLE
- MFCC-Based Analysis — Uses MFCC C0 (energy) and C1 (spectral tilt) for acoustic characterization
- Dynamic Ranking — Real-time constraint violation calculation and sorting
- Flexible Selection — Configurable number of output files (limit_files parameter)
- Complete Workflow — From directory selection to final concatenation in one script
- Detailed Reporting — Comprehensive info window output with violation breakdowns
Technical Implementation:
(1) Directory scanning: Loads all .wav files from the user-selected folder.
(2) MFCC analysis: Extracts C0 (energy) and C1 (spectral tilt) coefficients.
(3) Constraint violation calculation: Computes four violation scores per file.
(4) Harmony scoring: Weighted sum, Σ(violation × weight).
(5) Ranking: Sort ascending (lower harmony = better).
(6) Concatenation: Join top N files sequentially.
(7) Reporting: Detailed output in the Praat Info window.
Key insight: Weight adjustments prioritize different acoustic properties. For example, a high weight_energy prioritizes loud files, and a high weight_stability prioritizes spectrally consistent files.
Quick start
- Prepare a folder containing .wav audio files for concatenation.
- In Praat, open the script OT_Corpus_Concatenator.praat.
- Adjust limit_files (number of files to concatenate; default 10).
- Set constraint weights according to your acoustic preferences:
- weight_darkness — Penalizes negative spectral tilt (dark sounds)
- weight_brightness — Penalizes positive spectral tilt (bright sounds)
- weight_energy — Penalizes low energy (quiet sounds)
- weight_stability — Penalizes spectral instability
- Enable play_result to auto-play concatenated output.
- Click Run — select folder when prompted.
- Script analyzes all .wav files, ranks by Harmony Score, concatenates top N.
- Output named "OT_Optimal_Concatenation" appears in Objects window.
Tips:
- For loud, energetic results: set weight_energy = 2.0 and the other weights to 1.0 or 0.0.
- For smooth, consistent concatenation: set weight_stability = 2.0, weight_energy = 1.0.
- For bright, articulate results: set weight_brightness = 0.0, weight_darkness = 2.0.
- Check the Praat Info window for detailed ranking and violation breakdowns.
- The output is a raw concatenation; consider adding crossfades manually if needed.
OT Theory & Algorithm
Optimality Theory Basics
📐 The OT Framework
Traditional OT (Linguistics):
Input → Generator → Candidates → Evaluator → Optimal Output
Adapted OT (Audio):
Corpus → Audio Files → Acoustic Analysis → Constraint Evaluation → Optimal Concatenation
Key Components:
- Candidates: Individual audio files
- Constraints: Universal acoustic preferences
- Violations: Quantitative deviation from ideal
- Ranking: Harmony Score determines optimality
The Algorithm Pipeline
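The pipeline can be sketched outside Praat as follows. This is a minimal Python illustration, not the script's own code: it assumes the per-file MFCC statistics are already available, and the names FileStats, harmony and select_top are hypothetical. The violation formulas and default weights are the ones given in this guide.

```python
from dataclasses import dataclass

@dataclass
class FileStats:
    """Per-file summary statistics taken from the MFCC analysis (hypothetical container)."""
    name: str
    mean_c0: float   # mean log energy
    mean_c1: float   # mean spectral tilt
    stdev_c1: float  # spread of spectral tilt across frames

def harmony(s, w_dark, w_bright, w_energy, w_stab):
    """Weighted sum of the four constraint violations (lower is better)."""
    viol_dark = abs(s.mean_c1) if s.mean_c1 < 0 else 0.0
    viol_bright = s.mean_c1 if s.mean_c1 > 0 else 0.0
    viol_energy = max(0.0, 100.0 - s.mean_c0)
    viol_stab = s.stdev_c1 * 10.0
    return (w_dark * viol_dark + w_bright * viol_bright
            + w_energy * viol_energy + w_stab * viol_stab)

def select_top(files, limit_files, **weights):
    """Rank files by ascending harmony and keep the first limit_files names."""
    ranked = sorted(files, key=lambda s: harmony(s, **weights))
    return [s.name for s in ranked[:limit_files]]

# Made-up statistics for three files, ranked with the default weights.
corpus = [
    FileStats("a.wav", mean_c0=90.0, mean_c1=1.0, stdev_c1=0.2),
    FileStats("b.wav", mean_c0=60.0, mean_c1=-3.0, stdev_c1=1.5),
    FileStats("c.wav", mean_c0=85.0, mean_c1=0.5, stdev_c1=0.4),
]
order = select_top(corpus, 2, w_dark=0.0, w_bright=1.0, w_energy=2.0, w_stab=1.0)
print(order)
```

The selected names are returned in rank order, which is also the order in which the files would be concatenated.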
Harmony Score Calculation
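Written out, the weighted sum described in this guide takes the following form. The symbols H, w and v are shorthand introduced here for the Harmony Score, the weight_* parameters and the viol_* scores.

```latex
H = w_{\text{darkness}}\, v_{\text{darkness}}
  + w_{\text{brightness}}\, v_{\text{brightness}}
  + w_{\text{energy}}\, v_{\text{energy}}
  + w_{\text{stability}}\, v_{\text{stability}}
```

As a worked example with the default weights (0.0, 1.0, 2.0, 1.0), take a hypothetical file with mean C0 = 80, mean C1 = 2.0 and stdev C1 = 0.5. Its violations are 0, 2.0, 20 and 5, so H = 0.0 × 0 + 1.0 × 2.0 + 2.0 × 20 + 1.0 × 5 = 47.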
MFCC Analysis Parameters
Number of coefficients: 12
Window length: 0.015 seconds (15ms)
Time step: 0.005 seconds (5ms)
Frequency range: 100 Hz to 10000 Hz
Pre-emphasis: From 100 Hz
Key coefficients used:
• C0: Log energy (Frame energy)
• C1: Spectral tilt (Balance of low vs high frequencies)
Why these parameters?
• 15ms window: Good time-frequency tradeoff for audio events
• 5ms step: Sufficient temporal resolution
• 12 coefficients: Standard for speech/music analysis
• 100-10000Hz: Covers most relevant audio content
Constraint System
The Four Constraints
Constraint 1: *DARKNESS
Definition: Penalizes negative spectral tilt (more energy in low frequencies)
Violation measure: Absolute value of mean C1 when mean C1 < 0
Calculation: viol_darkness = |mean_C1| if mean_C1 < 0, else 0
Acoustic interpretation: "Avoid dark, bass-heavy sounds"
Weight tuning: Increase weight_darkness to penalize dark sounds more severely
Constraint 2: *BRIGHTNESS
Definition: Penalizes positive spectral tilt (more energy in high frequencies)
Violation measure: Value of mean C1 when mean C1 > 0
Calculation: viol_brightness = mean_C1 if mean_C1 > 0, else 0
Acoustic interpretation: "Avoid bright, treble-heavy sounds"
Weight tuning: Increase weight_brightness to penalize bright sounds more severely
Constraint 3: *LOW-ENERGY
Definition: Penalizes low energy (quiet sounds)
Violation measure: Deviation from maximum energy (100 - mean_C0)
Calculation: viol_energy = max(0, 100 - mean_C0)
Acoustic interpretation: "Avoid quiet sounds"
Weight tuning: Increase weight_energy to prioritize loud sounds
Constraint 4: *UNSTABLE
Definition: Penalizes spectral instability (variance in spectral tilt)
Violation measure: Standard deviation of C1 across frames × 10
Calculation: viol_stability = stdev_C1 × 10
Acoustic interpretation: "Avoid sounds with changing spectral character"
Weight tuning: Increase weight_stability to prioritize spectrally stable sounds
Constraint Interactions
- DARKNESS vs BRIGHTNESS: Mutually exclusive — a sound cannot violate both simultaneously
- LOW-ENERGY vs UNSTABLE: Independent — can violate both, neither, or one
- Weight balancing: Set one weight to 0.0 to effectively disable that constraint
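- Dominance hierarchy: Higher weights make a constraint count for more in the ranking. Because the Harmony Score is a weighted sum, this only approximates the strict domination of classical OT: several violations of low-weighted constraints can still outweigh one violation of a high-weighted constraint.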
Example scenarios:
- For podcast concatenation: weight_energy=2.0, weight_stability=1.5, others=0.0
- For musical phrase assembly: weight_brightness=1.0, weight_stability=2.0, weight_energy=1.0
- For sound effect sequencing: weight_darkness=0.5, weight_brightness=0.5, weight_energy=1.0
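To see how weight balancing changes the outcome, the sketch below ranks two files under three weight settings. It is an illustration in Python with made-up violation values; the file names and the helper best are hypothetical.

```python
# Violations per file as (darkness, brightness, energy, stability); values are made up.
candidates = {
    "loud_but_wobbly.wav": (0.0, 1.0, 5.0, 30.0),
    "quiet_but_steady.wav": (0.0, 1.0, 25.0, 2.0),
}

def best(weights):
    """Return the file with the lowest weighted violation sum."""
    def harmony(violations):
        return sum(w * v for w, v in zip(weights, violations))
    return min(candidates, key=lambda name: harmony(candidates[name]))

energy_first = best((0.0, 0.0, 2.0, 1.0))     # harmony 40 vs 52
stability_first = best((0.0, 0.0, 1.0, 2.0))  # harmony 65 vs 29
stability_off = best((0.0, 0.0, 1.0, 0.0))    # stability ignored: 5 vs 25
print(energy_first, stability_first, stability_off)
```

Swapping the two weights flips the winner, and setting weight_stability to 0.0 removes that constraint from the comparison entirely.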
Parameter Reference
| Parameter | Type | Default | Range | Description |
|---|---|---|---|---|
| limit_files | integer | 10 | 1-∞ | Number of top-ranked files to concatenate |
| weight_darkness | real | 0.0 | 0.0-10.0 | Penalty weight for negative spectral tilt |
| weight_brightness | real | 1.0 | 0.0-10.0 | Penalty weight for positive spectral tilt |
| weight_energy | real | 2.0 | 0.0-10.0 | Penalty weight for low energy |
| weight_stability | real | 1.0 | 0.0-10.0 | Penalty weight for spectral instability |
| play_result | boolean | 1 (true) | 0/1 | Auto-play concatenated result |
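The table can be mirrored as a small configuration object, which is handy when planning several runs. This is a hypothetical Python sketch: the class OTParams and its validate method are not part of the script, and the script itself may not enforce the listed ranges.

```python
from dataclasses import dataclass

@dataclass
class OTParams:
    """Defaults copied from the parameter table above."""
    limit_files: int = 10
    weight_darkness: float = 0.0
    weight_brightness: float = 1.0
    weight_energy: float = 2.0
    weight_stability: float = 1.0
    play_result: bool = True

    def validate(self):
        """Check values against the ranges in the table; return self if they pass."""
        if self.limit_files < 1:
            raise ValueError("limit_files must be at least 1")
        for name in ("weight_darkness", "weight_brightness",
                     "weight_energy", "weight_stability"):
            value = getattr(self, name)
            if not 0.0 <= value <= 10.0:
                raise ValueError(f"{name} must be between 0.0 and 10.0")
        return self

defaults = OTParams().validate()
podcast = OTParams(weight_brightness=0.0, weight_energy=2.0,
                   weight_stability=1.5).validate()
```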
Acoustic Analysis Details
MFCC Coefficient Interpretation
C0 — Log Energy:
• Logarithm of frame energy
• Higher values = louder frames
• Typical range: 50-100 in Praat's MFCC implementation
• Script use: Mean C0 = overall loudness measure
C1 — Spectral Tilt:
• Balance between low and high frequencies
• Negative values: More energy in low frequencies (dark sound)
• Positive values: More energy in high frequencies (bright sound)
• Near zero: Balanced spectrum
• Script use: Mean C1 = overall brightness/darkness; Stdev C1 = spectral stability
Why MFCC?
• Perceptually motivated (Mel scale)
• Standard in speech/music analysis
• Captures timbral characteristics
• Computationally efficient
Violation Calculation Details
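The four violation formulas can be sketched from frame-level coefficients as below. This is a Python illustration rather than the script's Praat code; the function name violations is hypothetical, and the use of the sample standard deviation is an assumption (the script may compute it differently).

```python
from statistics import mean, stdev

def violations(c0_frames, c1_frames):
    """Return the four violation scores for one file's MFCC frames."""
    mean_c0 = mean(c0_frames)
    mean_c1 = mean(c1_frames)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    stdev_c1 = stdev(c1_frames)
    return {
        "darkness": abs(mean_c1) if mean_c1 < 0 else 0.0,
        "brightness": mean_c1 if mean_c1 > 0 else 0.0,
        "energy": max(0.0, 100.0 - mean_c0),
        "stability": stdev_c1 * 10.0,
    }

# Three made-up frames: mean C0 = 80, mean C1 = -2, stdev C1 = 1.
v = violations(c0_frames=[70.0, 80.0, 90.0], c1_frames=[-1.0, -2.0, -3.0])
print(v)
```

Since darkness and brightness are both derived from the sign of mean C1, at most one of them is non-zero for any file.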
Analysis Performance
- Small corpus (10-20 files): 2-5 seconds
- Medium corpus (50-100 files): 10-30 seconds
- Large corpus (200+ files): 1-3 minutes
Factors affecting speed:
- File duration (longer files = more frames to analyze)
- Number of files in directory
- Computer processing power
- Praat version and optimization
Memory usage: All files loaded during concatenation phase — keep limit_files reasonable for large files.
Applications
Corpus-Based Composition
Use case: Automatic selection of audio fragments for musical composition
Technique: Weight constraints to match desired sonic character
Example: For energetic electronic music: weight_energy=3.0, weight_stability=1.0, others=0.0
Speech Database Organization
Use case: Ranking speech samples by acoustic properties
Technique: Adjust weights to prioritize clear, stable, loud speech
Workflow:
- Set weight_energy=2.0 (prioritize audible speech)
- Set weight_stability=1.5 (prioritize consistent vocal quality)
- Set weight_brightness=0.5 (slightly penalize sibilance)
- Result: Top-ranked files are clearest speech samples
Sound Effect Sequencing
Use case: Creating optimized sound effect sequences for games/film
Technique: Constraint-based selection for perceptual flow
Example workflow:
- Folder contains 50 impact sounds
- Goal: Create sequence of 10 most "solid" impacts
- Settings: weight_brightness=1.0 (penalizes bright sounds, favoring bassy impacts), weight_energy=2.0 (loud), weight_stability=1.5 (consistent)
- Result: Concatenated sequence of optimal impact sounds
Research Applications
Use case: Systematic audio stimulus selection for experiments
Advantages:
- Objective, reproducible selection criteria
- Quantifiable acoustic properties
- Configurable for different experimental conditions
- Detailed violation reporting for transparency
Example: Psychoacoustic study needing stimuli with specific spectral balance
Educational Use
Use case: Teaching OT concepts with audio examples
Technique: Students adjust weights, hear resulting concatenations
Learning outcomes:
- Understand constraint-based optimization
- Connect acoustic properties to perceptual qualities
- Experience weight adjustment effects
- Learn MFCC analysis basics
Practical Workflow Examples
🎵 Musical Phrase Assembly
Goal: Create smooth melodic sequence from note recordings
Settings:
- limit_files: 8
- weight_stability: 2.5 (prioritize consistent tone)
- weight_energy: 1.5 (moderately loud)
- weight_brightness: 0.8 (mildly penalize bright tones)
- weight_darkness: 0.0
Result: 8 most stable, least bright notes concatenated
🗣️ Speech Sample Selection
Goal: Find clearest speech utterances from noisy recordings
Settings:
- limit_files: 5
- weight_energy: 3.0 (maximize loudness)
- weight_stability: 2.0 (minimize vocal variation)
- weight_brightness: 0.0 (ignore brightness)
- weight_darkness: 0.0
Result: 5 loudest, most stable speech samples
🎬 Sound Design Sequence
Goal: Create evolving texture from organic recordings
Settings:
- limit_files: 12
- weight_darkness: 0.0 → 1.0 (progressive brightening, since a higher weight_darkness penalizes dark sounds more)
- weight_energy: 0.5 → 2.0 (progressive intensification)
- Multiple runs with different weights for sections
Result: Thematically evolving sound sequence
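When planning multiple runs for sections, the weight for each run can be stepped linearly between its start and end value. The helper weight_schedule below is a hypothetical Python sketch for working out those values; each value would then be entered by hand in the script's form for the corresponding run.

```python
def weight_schedule(start, end, sections):
    """Linearly interpolate a weight from start to end, one value per section."""
    if sections < 2:
        return [start]
    step = (end - start) / (sections - 1)
    return [start + i * step for i in range(sections)]

# weight_energy rising from 0.5 to 2.0 over four sections.
energy_weights = weight_schedule(0.5, 2.0, 4)
print(energy_weights)
```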
Complete Workflow
Step-by-Step Execution
🔧 Script Execution Flow
Phase 1: Setup & Input
- User runs script, fills parameter form
- Script prompts for directory selection
- Validates directory contains .wav files
- Creates file list object
Phase 2: Analysis
- Creates analysis table with violation columns
- Loops through each file:
- Loads sound
- Computes MFCC
- Calculates mean C0, mean C1, C1 stdev
- Computes four violation scores
- Calculates Harmony Score
- Stores in table
Phase 3: Ranking
- Sorts table by Harmony Score (ascending)
- Displays constraint weights in Info window
- Prints top N files with detailed violation breakdown
Phase 4: Concatenation
- Loads top N files into memory
- Concatenates sequentially
- Renames result to "OT_Optimal_Concatenation"
- Cleans up intermediate objects
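The sequential join in Phase 4 amounts to appending each file's samples to the previous one. The sketch below shows the same idea with Python's standard wave module; it is not the script's code, the function name concatenate_wavs is hypothetical, and it assumes all inputs are uncompressed PCM files that share channel count, sample width and sample rate.

```python
import wave

def concatenate_wavs(paths, out_path):
    """Append the frames of each input file, in order, into out_path."""
    with wave.open(paths[0], "rb") as first:
        channels, sample_width, frame_rate = first.getparams()[:3]
    with wave.open(out_path, "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(sample_width)
        out.setframerate(frame_rate)
        for path in paths:
            with wave.open(path, "rb") as src:
                # Refuse to mix formats; raw appending would corrupt the audio.
                if src.getparams()[:3] != (channels, sample_width, frame_rate):
                    raise ValueError(f"{path} has a different audio format")
                out.writeframes(src.readframes(src.getnframes()))

# Usage (paths are placeholders):
# concatenate_wavs(["best1.wav", "best2.wav"], "OT_Optimal_Concatenation.wav")
```

No smoothing is applied at the joins, which matches the raw concatenation described in this guide.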
Phase 5: Output
- Selects final concatenated sound
- Displays success message
- Optionally plays result
Output Interpretation
Troubleshooting Common Issues
Problem: No .wav files found
Cause: Directory doesn't contain .wav files, or the path is incorrect
Solution: Ensure the directory contains .wav files (check extension case: .wav not .WAV)
Problem: Selected files sound poor even though they rank highest
Cause: All files have high Harmony Scores (severe violations)
Solution: Adjust weights to be less strict, or check whether the files are extremely quiet or unstable
Problem: Clicks or abrupt transitions at file boundaries
Cause: Raw concatenation without crossfade
Solution: Apply a crossfade manually after concatenation, or edit files to start and end at zero-crossings
Problem: Analysis is slow
Cause: Many long files, or computer performance issues
Solution: Reduce corpus size, use shorter files, or increase Praat memory allocation
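For the raw-concatenation issue, a manual crossfade blends the end of one sound into the start of the next. The sketch below shows a linear crossfade on plain lists of samples in Python; the function crossfade is hypothetical and is not something the script provides.

```python
def crossfade(a, b, overlap):
    """Join two sample lists, blending the last `overlap` samples of a
    with the first `overlap` samples of b using a linear ramp."""
    if overlap <= 0:
        return a + b
    if overlap > min(len(a), len(b)):
        raise ValueError("overlap is longer than one of the inputs")
    head, tail = a[:-overlap], a[-overlap:]
    blended = []
    for i in range(overlap):
        fade_in = (i + 1) / (overlap + 1)  # rises toward 1 across the overlap
        blended.append(tail[i] * (1.0 - fade_in) + b[i] * fade_in)
    return head + blended + b[overlap:]

# A constant 1.0 signal fading into silence over three samples.
joined = crossfade([1.0] * 6, [0.0] * 6, overlap=3)
print(joined)
```

The result is shorter than the two inputs combined by the overlap length, so a sequence of crossfaded files ends up slightly shorter than a raw concatenation.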
Advanced Techniques
Iterative weight tuning:
- Run the script multiple times with different weights
- Compare the resulting concatenations
- Note which files appear in or disappear from the top rankings
- Fine-tune weights to achieve the desired sonic character
Multi-pass selection:
- First pass: Select files with weight_energy=3.0 (loudest files)
- Second pass: From those, select with weight_stability=2.0 (most stable)
- Third pass: Final selection with mixed constraints
- Result: Progressively refined selection
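The first two passes of such a staged selection can be sketched as successive filters. This Python illustration uses made-up violation values and a hypothetical helper top_n; with the script itself, each pass would presumably mean running it again on a folder that holds only the files kept by the previous pass.

```python
def top_n(files, n, key):
    """Keep the n files with the smallest key value (lower = better)."""
    return sorted(files, key=key)[:n]

# Hypothetical per-file violations: (name, viol_energy, viol_stability).
corpus = [
    ("a.wav", 5.0, 20.0),
    ("b.wav", 8.0, 3.0),
    ("c.wav", 30.0, 1.0),
    ("d.wav", 10.0, 6.0),
]

# Pass 1: energy only (weight_energy=3.0, others 0.0) keeps the 3 loudest.
pass1 = top_n(corpus, 3, key=lambda f: 3.0 * f[1])
# Pass 2: stability only (weight_stability=2.0) keeps the 2 steadiest of those.
pass2 = top_n(pass1, 2, key=lambda f: 2.0 * f[2])
names = [f[0] for f in pass2]
print(names)
```

Note that c.wav is the most stable file overall but never reaches the second pass, because the first pass removed it for being too quiet. Staged selection therefore behaves differently from a single run with mixed weights.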