OT Corpus Concatenator — User Guide

Optimality Theory-based audio corpus concatenation: Uses constraint-based ranking to select and concatenate optimal audio files based on spectral and energetic properties.

Author: Shai Cohen
Affiliation: Department of Music, Bar-Ilan University, Israel
Version: 1.0 (2025)
License: MIT License
Repo: https://github.com/ShaiCohen-ops/Praat-plugin_AudioTools

Contents:

What this does
Quick start
OT Theory & Algorithm
Constraint System
Acoustic Analysis
Applications
Complete Workflow

What this does

This script implements Optimality Theory-based audio concatenation — a computational linguistics-inspired approach to automatically select and concatenate audio files from a corpus based on weighted constraint satisfaction. The system analyzes each audio file for four key acoustic properties, computes constraint violations, and ranks files by their Harmony Score (lower = better). The top N files are then concatenated into a single optimized audio sequence.

Key Features:

OT Constraint System — Four weighted constraints: DARKNESS, BRIGHTNESS, LOW-ENERGY, UNSTABLE
MFCC-Based Analysis — Uses MFCC C0 (energy) and C1 (spectral tilt) for acoustic characterization
Dynamic Ranking — Real-time constraint violation calculation and sorting
Flexible Selection — Configurable number of output files (limit_files parameter)
Complete Workflow — From directory selection to final concatenation in one script
Detailed Reporting — Comprehensive info window output with violation breakdowns

What is Optimality Theory (OT)? OT originates from theoretical linguistics as a model of grammatical constraints. In this audio adaptation:

(1) Candidates: Audio files in corpus.
(2) Constraints: Acoustic preferences (e.g., avoid low energy, prefer stable spectra).
(3) Violations: Quantitative measures of how much each candidate violates constraints.
(4) Harmony Score: Weighted sum of violations (lower = more optimal).
(5) Ranking: Candidates sorted by Harmony Score, top N selected.

This transforms subjective "sounds good together" into computable constraint satisfaction.

Technical Implementation:

(1) Directory scanning: Loads all .wav files from user-selected folder.
(2) MFCC analysis: Extracts C0 (energy) and C1 (spectral tilt) coefficients.
(3) Constraint violation calculation: Computes four violation scores per file.
(4) Harmony scoring: Weighted sum: Σ(violation × weight).
(5) Ranking: Sort ascending (lower harmony = better).
(6) Concatenation: Join top N files sequentially.
(7) Reporting: Detailed output in Praat Info window.

Key insight: Weight adjustments prioritize different acoustic properties. For example, a high weight_energy prioritizes loud files, and a high weight_stability prioritizes spectrally consistent files.

Quick start

Prepare a folder containing .wav audio files for concatenation.
In Praat, open the script: OT_Corpus_Concatenator.praat.
Adjust limit_files (number of files to concatenate; default 10).
Set constraint weights according to your acoustic preferences:
- weight_darkness — Penalizes negative spectral tilt (dark sounds)
- weight_brightness — Penalizes positive spectral tilt (bright sounds)
- weight_energy — Penalizes low energy (quiet sounds)
- weight_stability — Penalizes spectral instability
Enable play_result to auto-play concatenated output.
Click Run — select folder when prompted.
Script analyzes all .wav files, ranks by Harmony Score, concatenates top N.
Output named "OT_Optimal_Concatenation" appears in Objects window.

Quick tip:

Start with the default weights: weight_energy = 2.0, weight_brightness = 1.0, weight_stability = 1.0, weight_darkness = 0.0. This prioritizes loud, energetic files.
For smooth, consistent concatenation: set weight_stability = 2.0, weight_energy = 1.0.
For bright, articulate results: set weight_brightness = 0.0, weight_darkness = 2.0.
Check the Praat Info window for detailed ranking and violation breakdowns.
Output is raw concatenation; consider adding crossfades manually if needed.

Important:

The script analyzes ALL .wav files in the selected folder, so ensure the folder contains only intended audio files.
Large corpora may take time to analyze.
MFCC parameters are fixed: 12 coefficients, 15ms window, 5ms shift (optimized for speech/music).
Constraint weights are multiplicative: each violation is multiplied by its weight, so higher weights strongly affect ranking.
Harmony Score interpretation: Lower = better (minimal violations).
Concatenation is sequential: Files are joined in ranking order (best first).
No crossfade is applied: Raw concatenation may have clicks at boundaries.
Memory usage: All selected files are loaded simultaneously during concatenation; limit_files controls the memory footprint.

OT Theory & Algorithm

Optimality Theory Basics

📐 The OT Framework

Traditional OT (Linguistics):

Input → Generator → Candidates → Evaluator → Optimal Output

Adapted OT (Audio):

Corpus → Audio Files → Acoustic Analysis → Constraint Evaluation → Optimal Concatenation

Key Components:

Candidates: Individual audio files
Constraints: Universal acoustic preferences
Violations: Quantitative deviation from ideal
Ranking: Harmony Score determines optimality

The Algorithm Pipeline

COMPLETE PROCESSING PIPELINE:

STEP 1: INPUT
User selects directory containing .wav files
Script scans for all *.wav files
Creates file list (candidate set)

STEP 2: ANALYSIS LOOP (per file)
FOR each audio file in corpus:
• Load Sound object
• Extract MFCC (12 coeffs, 15ms window, 5ms shift)
• Calculate mean C0 (energy) across frames
• Calculate mean C1 (spectral tilt) across frames
• Calculate C1 standard deviation (stability)
• Compute constraint violations
• Calculate Harmony Score
• Store in analysis table
END FOR

STEP 3: RANKING
Sort table by Harmony Score (ascending)
Select top N files (limit_files parameter)

STEP 4: CONCATENATION
Load selected files sequentially
Concatenate into single Sound object
Rename: "OT_Optimal_Concatenation"

STEP 5: OUTPUT
Display detailed ranking in Info window
Save concatenated result to Objects window
Optional: Auto-play result
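The ranking step (STEP 3) can be sketched outside Praat as follows. This is a minimal Python illustration, not the script's own code: the file names and scores are made-up values, and select_top is a hypothetical helper name.

```python
# Hypothetical candidate records: (file name, Harmony Score).
# The scores are illustration values, not real measurements.
candidates = [
    ("a.wav", 33.2),
    ("b.wav", 18.5),
    ("c.wav", 25.0),
    ("d.wav", 41.7),
]


def select_top(candidates, limit_files):
    """Sort ascending by Harmony Score and keep the first limit_files entries."""
    ranked = sorted(candidates, key=lambda item: item[1])
    return ranked[:limit_files]


selected = select_top(candidates, limit_files=2)
order = [name for name, _ in selected]
print(order)  # ['b.wav', 'c.wav']
```

The list order is also the concatenation order, so the best-scoring file comes first in the output.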

Harmony Score Calculation

HARMONY SCORE FORMULA:

Given:
v_dark = darkness violation
v_bright = brightness violation
v_energy = low-energy violation
v_stable = instability violation
w_dark = weight_darkness
w_bright = weight_brightness
w_energy = weight_energy
w_stable = weight_stability

Harmony Score = (v_dark × w_dark) + (v_bright × w_bright) + (v_energy × w_energy) + (v_stable × w_stable)

Interpretation:
• Lower Harmony Score = more optimal
• Zero = perfect satisfaction of all constraints
• Higher = more severe violations
• Weight scaling: violations × weight

Example:
Weights: w_dark=0, w_bright=1, w_energy=2, w_stable=1

File A: v_dark=2.1, v_bright=0, v_energy=15, v_stable=3.2
Harmony = (2.1×0) + (0×1) + (15×2) + (3.2×1) = 0 + 0 + 30 + 3.2 = 33.2

File B: v_dark=0, v_bright=1.5, v_energy=8, v_stable=1.0
Harmony = (0×0) + (1.5×1) + (8×2) + (1.0×1) = 0 + 1.5 + 16 + 1 = 18.5

File B wins (lower harmony = more optimal)
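The File A / File B example can be reproduced with a few lines of Python. This is a sketch of the formula only; the function name harmony_score and the short dictionary keys are assumptions made for the illustration, not identifiers from the Praat script.

```python
def harmony_score(violations, weights):
    """Weighted sum of violations; both arguments are dicts keyed by constraint."""
    return sum(violations[name] * weights[name] for name in weights)


weights = {"dark": 0.0, "bright": 1.0, "energy": 2.0, "stable": 1.0}
file_a = {"dark": 2.1, "bright": 0.0, "energy": 15.0, "stable": 3.2}
file_b = {"dark": 0.0, "bright": 1.5, "energy": 8.0, "stable": 1.0}

score_a = harmony_score(file_a, weights)
score_b = harmony_score(file_b, weights)

# Lower harmony is better, so the smaller score wins.
winner = "File A" if score_a < score_b else "File B"
print(round(score_a, 2), round(score_b, 2), winner)  # 33.2 18.5 File B
```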

MFCC Analysis Parameters

MFCC Configuration (fixed in script):

Number of coefficients: 12
Window length: 0.015 seconds (15ms)
Time step: 0.005 seconds (5ms)
Frequency range: 100 Hz to 10000 Hz
Pre-emphasis: From 100 Hz

Key coefficients used:
• C0: Log energy (Frame energy)
• C1: Spectral tilt (Balance of low vs high frequencies)

Why these parameters?
• 15ms window: Good time-frequency tradeoff for audio events
• 5ms step: Sufficient temporal resolution
• 12 coefficients: Standard for speech/music analysis
• 100-10000Hz: Covers most relevant audio content
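To get a feel for how the fixed window and time step translate into analysis workload, the frame count per file can be estimated. The sketch below assumes the first window starts at time 0 and every frame must fit entirely inside the sound; Praat's own frame placement may differ slightly, so treat the numbers as approximate. The function name is hypothetical.

```python
def estimate_frame_count(duration_s, window_s=0.015, step_s=0.005):
    """Estimate the number of analysis frames for one file.

    Assumption: frames start at time 0, advance by step_s, and must fit
    entirely inside the sound. This is an approximation, not Praat's rule.
    """
    # Work in integer microseconds to avoid floating-point rounding surprises.
    duration_us = round(duration_s * 1_000_000)
    window_us = round(window_s * 1_000_000)
    step_us = round(step_s * 1_000_000)
    if duration_us < window_us:
        return 0
    return (duration_us - window_us) // step_us + 1


print(estimate_frame_count(1.0))   # 198
print(estimate_frame_count(10.0))  # 1998
```

Under these assumptions a file ten times longer yields roughly ten times as many frames, which is why file duration dominates analysis time.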

Constraint System

The Four Constraints

Constraint 1: *DARKNESS

Definition: Penalizes negative spectral tilt (more energy in low frequencies)

Violation measure: Absolute value of mean C1 when C1 < 0

Calculation: viol_darkness = |mean_C1| if mean_C1 < 0, else 0

Acoustic interpretation: "Avoid dark, bass-heavy sounds"

Weight tuning: Increase weight_darkness to penalize dark sounds more severely

Constraint 2: *BRIGHTNESS

Definition: Penalizes positive spectral tilt (more energy in high frequencies)

Violation measure: Value of mean C1 when C1 > 0

Calculation: viol_brightness = mean_C1 if mean_C1 > 0, else 0

Acoustic interpretation: "Avoid bright, treble-heavy sounds"

Weight tuning: Increase weight_brightness to penalize bright sounds more severely

Constraint 3: *LOW-ENERGY

Definition: Penalizes low energy (quiet sounds)

Violation measure: Deviation from maximum energy (100 - mean_C0)

Calculation: viol_energy = max(0, 100 - mean_C0)

Acoustic interpretation: "Avoid quiet sounds"

Weight tuning: Increase weight_energy to prioritize loud sounds

Constraint 4: *UNSTABLE

Definition: Penalizes spectral instability (variance in spectral tilt)

Violation measure: Standard deviation of C1 across frames × 10

Calculation: viol_stability = stdev_C1 × 10

Acoustic interpretation: "Avoid sounds with changing spectral character"

Weight tuning: Increase weight_stability to prioritize spectrally stable sounds

Constraint Interactions

Constraint Conflicts:

DARKNESS vs BRIGHTNESS: Mutually exclusive — a sound cannot violate both simultaneously
LOW-ENERGY vs UNSTABLE: Independent — can violate both, neither, or one
Weight balancing: Set one weight to 0.0 to effectively disable that constraint
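Dominance hierarchy: A much larger weight lets one constraint dominate the ranking. Because the Harmony Score is a weighted sum, this only approximates strict domination in OT terms: enough lower-weighted violations can still outweigh a higher-weighted one.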

Example scenarios:

For podcast concatenation: weight_energy=2.0, weight_stability=1.5, others=0.0
For musical phrase assembly: weight_brightness=1.0, weight_stability=2.0, weight_energy=1.0
For sound effect sequencing: weight_darkness=0.5, weight_brightness=0.5, weight_energy=1.0
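A small Python sketch shows how two of these presets can pick different winners from the same corpus. The weights come from the scenarios above (unlisted weights are treated as 0.0); the two files and their violation values are hypothetical, chosen only to make the effect visible.

```python
def harmony_score(violations, weights):
    """Weighted sum of violations; constraints missing from weights count as 0.0."""
    return sum(violations[name] * weights.get(name, 0.0) for name in violations)


# Presets from the example scenarios; omitted weights default to 0.0.
presets = {
    "podcast": {"energy": 2.0, "stable": 1.5},
    "sound_effects": {"dark": 0.5, "bright": 0.5, "energy": 1.0},
}

# Hypothetical violation values for two files (illustration only).
files = {
    "loud_but_unstable.wav": {"dark": 0.0, "bright": 0.4, "energy": 5.0, "stable": 12.0},
    "quiet_but_stable.wav": {"dark": 3.0, "bright": 0.0, "energy": 12.0, "stable": 2.0},
}


def best_file(weights):
    """Return the file name with the lowest Harmony Score under these weights."""
    return min(files, key=lambda name: harmony_score(files[name], weights))


for label, weights in presets.items():
    print(label, "->", best_file(weights))
# podcast -> quiet_but_stable.wav
# sound_effects -> loud_but_unstable.wav
```

With the podcast preset the stability penalty outweighs the loudness advantage (27.0 vs 28.0); with the sound effect preset stability is ignored, so the louder file wins (5.2 vs 13.5).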

Parameter Reference

Parameter	Type	Default	Range	Description
limit_files	integer	10	1-∞	Number of top-ranked files to concatenate
weight_darkness	real	0.0	0.0-10.0	Penalty weight for negative spectral tilt
weight_brightness	real	1.0	0.0-10.0	Penalty weight for positive spectral tilt
weight_energy	real	2.0	0.0-10.0	Penalty weight for low energy
weight_stability	real	1.0	0.0-10.0	Penalty weight for spectral instability
play_result	boolean	1 (true)	0/1	Auto-play concatenated result
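For readers who drive the workflow from their own tooling, the table can be mirrored as a small settings object with range checks. This is a hypothetical Python sketch: the Settings class and validate method are not part of the Praat script, and the checks simply restate the ranges listed in the table.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    """Mirror of the script's form fields, using the documented defaults."""
    limit_files: int = 10
    weight_darkness: float = 0.0
    weight_brightness: float = 1.0
    weight_energy: float = 2.0
    weight_stability: float = 1.0
    play_result: bool = True

    def validate(self):
        """Raise ValueError if a value falls outside the documented range."""
        if self.limit_files < 1:
            raise ValueError("limit_files must be at least 1")
        for name in ("weight_darkness", "weight_brightness",
                     "weight_energy", "weight_stability"):
            value = getattr(self, name)
            if not 0.0 <= value <= 10.0:
                raise ValueError(f"{name} must be between 0.0 and 10.0")
        return self


defaults = Settings().validate()
print(defaults.weight_energy)  # 2.0
```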

Acoustic Analysis Details

MFCC Coefficient Interpretation

MFCC (Mel-Frequency Cepstral Coefficients):

C0 — Log Energy:
• Logarithm of frame energy
• Higher values = louder frames
• Typical range: 50-100 in Praat's MFCC implementation
• Script use: Mean C0 = overall loudness measure

C1 — Spectral Tilt:
• Balance between low and high frequencies
• Negative values: More energy in low frequencies (dark sound)
• Positive values: More energy in high frequencies (bright sound)
• Near zero: Balanced spectrum
• Script use: Mean C1 = overall brightness/darkness; Stdev C1 = spectral stability

Why MFCC?
• Perceptually motivated (Mel scale)
• Standard in speech/music analysis
• Captures timbral characteristics
• Computationally efficient

Violation Calculation Details

DETAILED VIOLATION CALCULATIONS:

Given MFCC analysis of file:
mean_C0 = average of C0 across all frames
mean_C1 = average of C1 across all frames
stdev_C1 = standard deviation of C1 across all frames

1. DARKNESS violation:
IF mean_C1 < 0: viol_dark = abs(mean_C1)
ELSE: viol_dark = 0

2. BRIGHTNESS violation:
IF mean_C1 > 0: viol_bright = mean_C1
ELSE: viol_bright = 0

3. LOW-ENERGY violation:
viol_energy = 100 - mean_C0
IF viol_energy < 0: viol_energy = 0
Note: 100 is the approximate maximum in Praat's MFCC. The actual maximum depends on the audio, but 100 is a safe ceiling.

4. UNSTABLE violation:
viol_stable = stdev_C1 × 10
Multiplication by 10 scales this to a similar magnitude as the other violations for balanced weighting.

Example actual values:
File: mean_C0 = 85.3, mean_C1 = -1.2, stdev_C1 = 0.8
Violations: dark=1.2, bright=0, energy=14.7, stable=8.0
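The four calculations above translate directly into a short function. The Python sketch below is an illustration of the arithmetic, not the Praat source; compute_violations and its dictionary keys are names chosen for this example.

```python
def compute_violations(mean_c0, mean_c1, stdev_c1):
    """Return the four violation scores described above as a dict."""
    return {
        # Only negative tilt counts as a darkness violation.
        "dark": abs(mean_c1) if mean_c1 < 0 else 0.0,
        # Only positive tilt counts as a brightness violation.
        "bright": mean_c1 if mean_c1 > 0 else 0.0,
        # Distance below the assumed ceiling of 100, floored at zero.
        "energy": max(0.0, 100.0 - mean_c0),
        # Scaled by 10 to be comparable with the other violations.
        "stable": stdev_c1 * 10.0,
    }


v = compute_violations(mean_c0=85.3, mean_c1=-1.2, stdev_c1=0.8)
print({name: round(value, 2) for name, value in v.items()})
# {'dark': 1.2, 'bright': 0.0, 'energy': 14.7, 'stable': 8.0}
```

Because the darkness and brightness branches test opposite signs of the same mean, at most one of them is non-zero for any file.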

Analysis Performance

Processing Time Estimates:

Small corpus (10-20 files): 2-5 seconds
Medium corpus (50-100 files): 10-30 seconds
Large corpus (200+ files): 1-3 minutes

Factors affecting speed:

File duration (longer files = more frames to analyze)
Number of files in directory
Computer processing power
Praat version and optimization

Memory usage: All files loaded during concatenation phase — keep limit_files reasonable for large files.

Applications

Corpus-Based Composition

Use case: Automatic selection of audio fragments for musical composition

Technique: Weight constraints to match desired sonic character

Example: For energetic electronic music: weight_energy=3.0, weight_stability=1.0, others=0.0

Speech Database Organization

Use case: Ranking speech samples by acoustic properties

Technique: Adjust weights to prioritize clear, stable, loud speech

Workflow:

Set weight_energy=2.0 (prioritize audible speech)
Set weight_stability=1.5 (prioritize consistent vocal quality)
Set weight_brightness=0.5 (slightly penalize sibilance)
Result: Top-ranked files are clearest speech samples

Sound Effect Sequencing

Use case: Creating optimized sound effect sequences for games/film

Technique: Constraint-based selection for perceptual flow

Example workflow:

Folder contains 50 impact sounds
Goal: Create sequence of 10 most "solid" impacts
Settings: weight_brightness=1.0 (penalizes bright sounds, favoring bassy impacts), weight_energy=2.0 (loud), weight_stability=1.5 (consistent)
Result: Concatenated sequence of optimal impact sounds

Research Applications

Use case: Systematic audio stimulus selection for experiments

Advantages:

Objective, reproducible selection criteria
Quantifiable acoustic properties
Configurable for different experimental conditions
Detailed violation reporting for transparency

Example: Psychoacoustic study needing stimuli with specific spectral balance

Educational Use

Use case: Teaching OT concepts with audio examples

Technique: Students adjust weights, hear resulting concatenations

Learning outcomes:

Understand constraint-based optimization
Connect acoustic properties to perceptual qualities
Experience weight adjustment effects
Learn MFCC analysis basics

Practical Workflow Examples

🎵 Musical Phrase Assembly

Goal: Create smooth melodic sequence from note recordings

Settings:

limit_files: 8
weight_stability: 2.5 (prioritize consistent tone)
weight_energy: 1.5 (moderately loud)
weight_brightness: 0.8 (mildly penalize bright notes)
weight_darkness: 0.0

Result: 8 most stable, reasonably loud notes concatenated, with a mild preference against bright timbres

🗣️ Speech Sample Selection

Goal: Find clearest speech utterances from noisy recordings

Settings:

limit_files: 5
weight_energy: 3.0 (maximize loudness)
weight_stability: 2.0 (minimize vocal variation)
weight_brightness: 0.0 (ignore brightness)
weight_darkness: 0.0

Result: 5 loudest, most stable speech samples

🎬 Sound Design Sequence

Goal: Create evolving texture from organic recordings

Settings:

limit_files: 12
weight_darkness: 0.0 → 1.0 (progressive brightening, since a higher weight penalizes dark sounds more)
weight_energy: 0.5 → 2.0 (progressive intensification)
Multiple runs with different weights for sections

Result: Thematically evolving sound sequence
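Since each section needs its own run, it can help to plan the per-section weights in advance. The Python sketch below linearly interpolates a weight across sections, using the weight_energy progression (0.5 to 2.0) as the example; the function name and the choice of four sections are assumptions for illustration.

```python
def interpolate_weights(start, end, sections):
    """Linearly interpolate each weight from start to end over the sections."""
    if sections < 2:
        return [dict(start)]
    steps = []
    for i in range(sections):
        t = i / (sections - 1)  # 0.0 for the first section, 1.0 for the last
        steps.append({
            name: round(start[name] + t * (end[name] - start[name]), 3)
            for name in start
        })
    return steps


plan = interpolate_weights({"weight_energy": 0.5}, {"weight_energy": 2.0}, 4)
for section, weights in enumerate(plan, start=1):
    print(section, weights)
# 1 {'weight_energy': 0.5}
# 2 {'weight_energy': 1.0}
# 3 {'weight_energy': 1.5}
# 4 {'weight_energy': 2.0}
```

Each dictionary gives the weights to enter in the form for one run; the resulting sounds can then be joined in section order.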

Complete Workflow

Step-by-Step Execution

🔧 Script Execution Flow

Phase 1: Setup & Input

User runs script, fills parameter form
Script prompts for directory selection
Validates directory contains .wav files
Creates file list object

Phase 2: Analysis

Creates analysis table with violation columns
Loops through each file:
- Loads sound
- Computes MFCC
- Calculates mean C0, mean C1, C1 stdev
- Computes four violation scores
- Calculates Harmony Score
- Stores in table

Phase 3: Ranking

Sorts table by Harmony Score (ascending)
Displays constraint weights in Info window
Prints top N files with detailed violation breakdown

Phase 4: Concatenation

Loads top N files into memory
Concatenates sequentially
Renames result to "OT_Optimal_Concatenation"
Cleans up intermediate objects

Phase 5: Output

Selects final concatenated sound
Displays success message
Optionally plays result

Output Interpretation

TYPICAL INFO WINDOW OUTPUT:

============================================
CONSTRAINT WEIGHTS:
*DARKNESS = 0.00 (penalizes negative spectral tilt)
*BRIGHTNESS = 1.00 (penalizes positive spectral tilt)
*LOW-ENERGY = 2.00 (penalizes low loudness)
*UNSTABLE = 1.00 (penalizes timbral variance)
============================================
RANKING: Top 5 files by Harmony Score
--------------------------------------------
1. file03.wav → Harmony: 17.03
   Features: Energy=92.1 | Tilt=-0.15
   Violations:
   *LOW-ENERGY = 7.90 × 2.00 = 15.80
   *UNSTABLE = 1.23 × 1.00 = 1.23
2. file15.wav → Harmony: 24.08
   Features: Energy=88.5 | Tilt=0.32
   Violations:
   *BRIGHTNESS = 0.32 × 1.00 = 0.32
   *LOW-ENERGY = 11.50 × 2.00 = 23.00
   *UNSTABLE = 0.76 × 1.00 = 0.76
... etc ...
============================================
SUCCESS! Created: OT_Optimal_Concatenation
Contains 5 concatenated files
============================================

Each Harmony value is the sum of the weighted violations listed beneath it (for example, 15.80 + 1.23 = 17.03).

Troubleshooting Common Issues

Problem: "No .wav files found in that directory!"
Cause: Directory doesn't contain .wav files, or path incorrect
Solution: Ensure directory contains .wav files (check extension case: .wav not .WAV)
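A quick way to spot extension-case problems before running the script is to list the folder and separate lowercase .wav names from other-case variants. This Python sketch assumes, as the note above suggests, that only lowercase .wav is matched; the file names are hypothetical, and behaviour on case-insensitive file systems may differ.

```python
from pathlib import PurePath


def split_by_extension(file_names):
    """Separate names ending in lowercase .wav from other-case variants."""
    matched, wrong_case = [], []
    for name in file_names:
        suffix = PurePath(name).suffix
        if suffix == ".wav":
            matched.append(name)
        elif suffix.lower() == ".wav":
            wrong_case.append(name)  # e.g. .WAV or .Wav: candidates for renaming
    return matched, wrong_case


# Hypothetical folder listing used for illustration.
names = ["kick.wav", "snare.WAV", "notes.txt", "hat.Wav"]
matched, wrong_case = split_by_extension(names)
print(matched)     # ['kick.wav']
print(wrong_case)  # ['snare.WAV', 'hat.Wav']
```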

Problem: Script runs but output is empty/silent
Cause: All files have high Harmony Scores (severe violations)
Solution: Adjust weights to be less strict, or check if files are extremely quiet/unstable

Problem: Concatenation has clicks/pops between files
Cause: Raw concatenation without crossfade
Solution: Apply crossfade manually after concatenation, or edit files to start/end at zero-crossings
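For reference, a linear crossfade between two sounds can be sketched as follows. This is a generic Python illustration on plain sample lists, not something the script does; the function name, the overlap length, and the sample values are all assumptions.

```python
def crossfade(a, b, overlap):
    """Join two sample lists with a linear crossfade of `overlap` samples."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return a + b  # nothing to blend: plain concatenation
    faded = []
    for i in range(overlap):
        gain_in = (i + 1) / (overlap + 1)  # ramps up for the incoming sound
        gain_out = 1.0 - gain_in           # ramps down for the outgoing sound
        faded.append(a[len(a) - overlap + i] * gain_out + b[i] * gain_in)
    return a[:len(a) - overlap] + faded + b[overlap:]


first = [1.0] * 6
second = [0.0] * 6
joined = crossfade(first, second, overlap=3)
print(len(joined))                    # 9
print([round(x, 2) for x in joined])  # [1.0, 1.0, 1.0, 0.75, 0.5, 0.25, 0.0, 0.0, 0.0]
```

The joined result is shorter than a raw concatenation by the overlap length, because the tail of the first sound and the head of the second share the same samples.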

Problem: Analysis takes extremely long
Cause: Many long files, or computer performance issues
Solution: Reduce corpus size, use shorter files, or increase Praat memory allocation

Advanced Techniques

Iterative Weight Adjustment:

Run script multiple times with different weights
Compare resulting concatenations
Note which files appear/disappear in top rankings
Fine-tune weights to achieve desired sonic character

Multi-Stage Processing:

First pass: Select files with weight_energy=3.0 (loudest files)
Second pass: From those, select with weight_stability=2.0 (most stable)
Third pass: Final selection with mixed constraints
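The multi-stage idea can be sketched in Python as repeated ranking over a shrinking pool. The violation values below are hypothetical, and select_stage is an illustrative helper; with the Praat script itself, each pass would presumably mean rerunning it on a folder that holds only the files kept so far.

```python
def harmony_score(violations, weights):
    """Weighted sum of violations; constraints missing from weights count as 0.0."""
    return sum(violations[name] * weights.get(name, 0.0) for name in violations)


def select_stage(pool, weights, keep):
    """Rank the pool under one weight set and keep the best `keep` files."""
    ranked = sorted(pool, key=lambda name: harmony_score(pool[name], weights))
    return {name: pool[name] for name in ranked[:keep]}


# Hypothetical violation values (illustration only).
pool = {
    "a.wav": {"energy": 4.0, "stable": 9.0},
    "b.wav": {"energy": 6.0, "stable": 2.0},
    "c.wav": {"energy": 20.0, "stable": 1.0},
    "d.wav": {"energy": 8.0, "stable": 5.0},
}

stage1 = select_stage(pool, {"energy": 3.0}, keep=3)    # drops the quietest file
stage2 = select_stage(stage1, {"stable": 2.0}, keep=2)  # keeps the most stable survivors
print(list(stage1))  # ['a.wav', 'b.wav', 'd.wav']
print(list(stage2))  # ['b.wav', 'd.wav']
```

Note that c.wav is the most stable file overall but never reaches the second pass, which is the main difference between staged filtering and a single run with mixed weights.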
Result: Progressively refined selection