OT Corpus Concatenator — User Guide

Optimality Theory-based audio corpus concatenation: Uses constraint-based ranking to select and concatenate optimal audio files based on spectral and energetic properties.

Author: Shai Cohen
Affiliation: Department of Music, Bar-Ilan University, Israel
Version: 1.0 (2025)
License: MIT License
Repo: https://github.com/ShaiCohen-ops/Praat-plugin_AudioTools

What this does

This script implements Optimality Theory-based audio concatenation, an approach inspired by computational linguistics that automatically selects and concatenates audio files from a corpus by weighted constraint satisfaction. The system analyzes each audio file with MFCCs, computes its violations of four acoustic constraints, and ranks files by their Harmony Score (lower = better). The top N files are then concatenated into a single audio sequence.

Key Features:

What is Optimality Theory (OT)? OT originates in theoretical linguistics as a model of grammatical constraints. In this audio adaptation:

  1. Candidates: Audio files in the corpus.
  2. Constraints: Acoustic preferences (e.g., avoid low energy, prefer stable spectra).
  3. Violations: Quantitative measures of how much each candidate violates each constraint.
  4. Harmony Score: Weighted sum of violations (lower = more optimal).
  5. Ranking: Candidates sorted by Harmony Score, top N selected.

This turns a subjective judgment ("sounds good together") into computable constraint satisfaction.

Technical Implementation:

  1. Directory scanning: Loads all .wav files from the user-selected folder.
  2. MFCC analysis: Extracts the C0 (energy) and C1 (spectral tilt) coefficients.
  3. Constraint violation calculation: Computes four violation scores per file.
  4. Harmony scoring: Weighted sum, Σ(violation × weight).
  5. Ranking: Sort ascending (lower harmony = better).
  6. Concatenation: Join the top N files sequentially.
  7. Reporting: Detailed output in the Praat Info window.

Key insight: Adjusting the weights prioritizes different acoustic properties. For example, a high weight_energy favors loud files, and a high weight_stability favors spectrally consistent files.

Quick start

  1. Prepare a folder containing .wav audio files for concatenation.
  2. In Praat, open the script: OT_Corpus_Concatenator.praat.
  3. Adjust limit_files (number of files to concatenate, default 10).
  4. Set constraint weights according to your acoustic preferences:
    • weight_darkness — Penalizes negative spectral tilt (dark sounds)
    • weight_brightness — Penalizes positive spectral tilt (bright sounds)
    • weight_energy — Penalizes low energy (quiet sounds)
    • weight_stability — Penalizes spectral instability
  5. Enable play_result to auto-play concatenated output.
  6. Click Run — select folder when prompted.
  7. Script analyzes all .wav files, ranks by Harmony Score, concatenates top N.
  8. Output named "OT_Optimal_Concatenation" appears in Objects window.
Quick tip: Start with the default weights (weight_energy = 2.0, weight_brightness = 1.0, weight_stability = 1.0, weight_darkness = 0.0), which prioritize loud, energetic files.
  • For smooth, consistent concatenation: set weight_stability = 2.0, weight_energy = 1.0.
  • For bright, articulate results: set weight_brightness = 0.0, weight_darkness = 2.0.
  • Check the Praat Info window for detailed ranking and violation breakdowns.
  • Output is raw concatenation; consider adding crossfades manually if needed.
Important:
  • The script ANALYZES ALL .WAV FILES in the folder, so make sure it contains only the intended audio files.
  • Large corpora may take time to analyze.
  • MFCC parameters are fixed: 12 coefficients, 15 ms window, 5 ms shift (optimized for speech/music).
  • Constraint weights are multiplicative: each violation is multiplied by its weight, so higher weights strongly affect the ranking.
  • Harmony Score interpretation: lower = better (minimal violations).
  • Concatenation is sequential: files are joined in ranking order (best first).
  • No crossfade is applied: raw concatenation may produce clicks at boundaries.
  • Memory usage: the selected files are all loaded at once during concatenation, so limit_files controls the memory footprint.

OT Theory & Algorithm

Optimality Theory Basics

📐 The OT Framework

Traditional OT (Linguistics):

Input → Generator → Candidates → Evaluator → Optimal Output

Adapted OT (Audio):

Corpus → Audio Files → Acoustic Analysis → Constraint Evaluation → Optimal Concatenation


Key Components:

  • Candidates: Individual audio files
  • Constraints: Universal acoustic preferences
  • Violations: Quantitative deviation from ideal
  • Ranking: Harmony Score determines optimality

The Algorithm Pipeline

COMPLETE PROCESSING PIPELINE:

STEP 1: INPUT
  • User selects directory containing .wav files
  • Script scans for all *.wav files
  • Creates file list (candidate set)

STEP 2: ANALYSIS LOOP (per file)
FOR each audio file in corpus:
  • Load Sound object
  • Extract MFCC (12 coeffs, 15ms window, 5ms shift)
  • Calculate mean C0 (energy) across frames
  • Calculate mean C1 (spectral tilt) across frames
  • Calculate C1 standard deviation (stability)
  • Compute constraint violations
  • Calculate Harmony Score
  • Store in analysis table
END FOR

STEP 3: RANKING
  • Sort table by Harmony Score (ascending)
  • Select top N files (limit_files parameter)

STEP 4: CONCATENATION
  • Load selected files sequentially
  • Concatenate into single Sound object
  • Rename: "OT_Optimal_Concatenation"

STEP 5: OUTPUT
  • Display detailed ranking in Info window
  • Save concatenated result to Objects window
  • Optional: Auto-play result
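The ranking and selection steps (STEP 3 and the ordering used in STEP 4) can be illustrated outside Praat. The following is a minimal Python sketch, not the script's own code: the Candidate type, the select_top function, and the file names are hypothetical, and the Harmony Scores are assumed to have been computed already.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """One corpus file with its precomputed Harmony Score (hypothetical type)."""
    name: str
    harmony: float


def select_top(candidates, limit_files):
    """Sort ascending by Harmony Score (lower = better) and keep the first limit_files."""
    ranked = sorted(candidates, key=lambda c: c.harmony)
    return ranked[:limit_files]


# Illustrative scores only; real values come from the analysis loop.
corpus = [
    Candidate("a.wav", 33.2),
    Candidate("b.wav", 18.5),
    Candidate("c.wav", 25.0),
]
selected = select_top(corpus, 2)

# Files are joined in ranking order, best first.
concatenation_order = [c.name for c in selected]
print(concatenation_order)
```

Because selection is purely by rank, the sketch always returns limit_files candidates (or fewer if the corpus is smaller), however high the scores are.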

Harmony Score Calculation

HARMONY SCORE FORMULA:

Given:
  v_dark = darkness violation
  v_bright = brightness violation
  v_energy = low-energy violation
  v_stable = instability violation
  w_dark = weight_darkness
  w_bright = weight_brightness
  w_energy = weight_energy
  w_stable = weight_stability

Harmony Score = (v_dark × w_dark) + (v_bright × w_bright) + (v_energy × w_energy) + (v_stable × w_stable)

Interpretation:
  • Lower Harmony Score = more optimal
  • Zero = perfect satisfaction of all constraints
  • Higher = more severe violations
  • Weight scaling: violations × weight

Example:
  Weights: w_dark=0, w_bright=1, w_energy=2, w_stable=1
  File A: v_dark=2.1, v_bright=0, v_energy=15, v_stable=3.2
    Harmony = (2.1×0) + (0×1) + (15×2) + (3.2×1) = 0 + 0 + 30 + 3.2 = 33.2
  File B: v_dark=0, v_bright=1.5, v_energy=8, v_stable=1.0
    Harmony = (0×0) + (1.5×1) + (8×2) + (1.0×1) = 0 + 1.5 + 16 + 1 = 18.5
  File B wins (lower harmony = more optimal)
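The weighted sum can be checked with a few lines of Python. This is a sketch of the formula only, using the File A and File B values from the example; the harmony_score function and the dictionary keys are illustrative names, not identifiers from the Praat script.

```python
def harmony_score(violations, weights):
    """Weighted sum of constraint violations; lower means more optimal."""
    return sum(violations[name] * weights[name] for name in weights)


# Weights and violations taken from the worked example.
weights = {"dark": 0.0, "bright": 1.0, "energy": 2.0, "stable": 1.0}
file_a = {"dark": 2.1, "bright": 0.0, "energy": 15.0, "stable": 3.2}
file_b = {"dark": 0.0, "bright": 1.5, "energy": 8.0, "stable": 1.0}

score_a = harmony_score(file_a, weights)
score_b = harmony_score(file_b, weights)
winner = "File B" if score_b < score_a else "File A"
print(round(score_a, 2), round(score_b, 2), winner)
```

Setting a weight to 0.0 removes that constraint's contribution entirely, which is why File A's darkness violation of 2.1 does not affect its score here.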

MFCC Analysis Parameters

MFCC Configuration (fixed in script):

Number of coefficients: 12
Window length: 0.015 seconds (15ms)
Time step: 0.005 seconds (5ms)
Frequency range: 100 Hz to 10000 Hz
Pre-emphasis: From 100 Hz

Key coefficients used:
C0: Log energy (Frame energy)
C1: Spectral tilt (Balance of low vs high frequencies)

Why these parameters?
• 15ms window: Good time-frequency tradeoff for audio events
• 5ms step: Sufficient temporal resolution
• 12 coefficients: Standard for speech/music analysis
• 100-10000Hz: Covers most relevant audio content

Constraint System

The Four Constraints

Constraint 1: *DARKNESS

Definition: Penalizes negative spectral tilt (more energy in low frequencies)

Violation measure: Absolute value of mean C1 when C1 < 0

Calculation: viol_darkness = |mean_C1| if mean_C1 < 0, else 0

Acoustic interpretation: "Avoid dark, bass-heavy sounds"

Weight tuning: Increase weight_darkness to penalize dark sounds more severely

Constraint 2: *BRIGHTNESS

Definition: Penalizes positive spectral tilt (more energy in high frequencies)

Violation measure: Value of mean C1 when C1 > 0

Calculation: viol_brightness = mean_C1 if mean_C1 > 0, else 0

Acoustic interpretation: "Avoid bright, treble-heavy sounds"

Weight tuning: Increase weight_brightness to penalize bright sounds more severely

Constraint 3: *LOW-ENERGY

Definition: Penalizes low energy (quiet sounds)

Violation measure: Deviation from maximum energy (100 - mean_C0)

Calculation: viol_energy = max(0, 100 - mean_C0)

Acoustic interpretation: "Avoid quiet sounds"

Weight tuning: Increase weight_energy to prioritize loud sounds

Constraint 4: *UNSTABLE

Definition: Penalizes spectral instability (variance in spectral tilt)

Violation measure: Standard deviation of C1 across frames × 10

Calculation: viol_stability = stdev_C1 × 10

Acoustic interpretation: "Avoid sounds with changing spectral character"

Weight tuning: Increase weight_stability to prioritize spectrally stable sounds

Constraint Interactions

Constraint Conflicts:
  • DARKNESS vs BRIGHTNESS: Mutually exclusive — a sound cannot violate both simultaneously
  • LOW-ENERGY vs UNSTABLE: Independent — can violate both, neither, or one
  • Weight balancing: Set one weight to 0.0 to effectively disable that constraint
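  • Dominance hierarchy: A much higher weight lets one constraint dominate the ranking. Because scores are weighted sums, this only approximates OT's strict domination: large violations of a low-weighted constraint can still outweigh a small violation of a high-weighted one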
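Harmony Scores here are weighted sums, which behave differently from strict domination in classic OT, where a higher-ranked constraint always decides before any lower-ranked one is consulted. The sketch below contrasts the two on a pair of made-up candidates; the function names, candidate values, and the two-constraint setup are illustrative assumptions.

```python
def weighted_harmony(violations, weights):
    """Weighted-sum evaluation, as used for the Harmony Score."""
    return sum(violations[c] * w for c, w in weights.items())


def strict_key(violations, ranking):
    """Strict-domination evaluation: compare constraint by constraint, in rank order."""
    return tuple(violations[c] for c in ranking)


# x slightly violates the top constraint; y violates only the lower one, but heavily.
candidates = {
    "x": {"energy": 1.0, "stable": 0.0},
    "y": {"energy": 0.0, "stable": 5.0},
}
weights = {"energy": 2.0, "stable": 1.0}
ranking = ["energy", "stable"]

weighted_winner = min(candidates, key=lambda n: weighted_harmony(candidates[n], weights))
strict_winner = min(candidates, key=lambda n: strict_key(candidates[n], ranking))
print(weighted_winner, strict_winner)
```

Under the weighted sum, x wins (2.0 against 5.0). Under strict domination, y wins because it has no violation of the top-ranked constraint. Widening the gap between weights moves the weighted result toward the strict one.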

Example scenarios:

  • For podcast concatenation: weight_energy=2.0, weight_stability=1.5, others=0.0
  • For musical phrase assembly: weight_brightness=1.0, weight_stability=2.0, weight_energy=1.0
  • For sound effect sequencing: weight_darkness=0.5, weight_brightness=0.5, weight_energy=1.0

Parameter Reference

Parameter          Type     Default   Range     Description
limit_files        integer  10        1-∞       Number of top-ranked files to concatenate
weight_darkness    real     0.0       0.0-10.0  Penalty weight for negative spectral tilt
weight_brightness  real     1.0       0.0-10.0  Penalty weight for positive spectral tilt
weight_energy      real     2.0       0.0-10.0  Penalty weight for low energy
weight_stability   real     1.0       0.0-10.0  Penalty weight for spectral instability
play_result        boolean  1 (true)  0/1       Auto-play concatenated result
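When experimenting with weight sets outside Praat (for example, when planning several runs), the defaults and ranges from the table can be captured in a small settings object. This is a hypothetical Python sketch: the Settings class and its validate method are not part of the script, and the range check simply mirrors the table.

```python
from dataclasses import dataclass

WEIGHT_NAMES = (
    "weight_darkness",
    "weight_brightness",
    "weight_energy",
    "weight_stability",
)


@dataclass
class Settings:
    """Defaults copied from the parameter table (hypothetical container)."""
    limit_files: int = 10
    weight_darkness: float = 0.0
    weight_brightness: float = 1.0
    weight_energy: float = 2.0
    weight_stability: float = 1.0
    play_result: bool = True

    def validate(self) -> "Settings":
        """Raise ValueError if a value falls outside the documented range."""
        if self.limit_files < 1:
            raise ValueError("limit_files must be at least 1")
        for name in WEIGHT_NAMES:
            value = getattr(self, name)
            if not 0.0 <= value <= 10.0:
                raise ValueError(f"{name} must be between 0.0 and 10.0, got {value}")
        return self


defaults = Settings().validate()
print(defaults)
```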

Acoustic Analysis Details

MFCC Coefficient Interpretation

MFCC (Mel-Frequency Cepstral Coefficients):

C0 — Log Energy:
• Logarithm of frame energy
• Higher values = louder frames
• Typical range: 50-100 in Praat's MFCC implementation
• Script use: Mean C0 = overall loudness measure

C1 — Spectral Tilt:
• Balance between low and high frequencies
• Negative values: More energy in low frequencies (dark sound)
• Positive values: More energy in high frequencies (bright sound)
• Near zero: Balanced spectrum
• Script use: Mean C1 = overall brightness/darkness; Stdev C1 = spectral stability

Why MFCC?
• Perceptually motivated (Mel scale)
• Standard in speech/music analysis
• Captures timbral characteristics
• Computationally efficient
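The three per-file features used by the script (mean C0, mean C1, and the standard deviation of C1) are simple statistics over the MFCC frames. The sketch below shows the reduction in Python on made-up frame values. The summarize_frames function is a hypothetical name, and the use of the sample standard deviation is an assumption, since the guide does not say which variant the script uses.

```python
from statistics import mean, stdev


def summarize_frames(frames):
    """Reduce per-frame (C0, C1) pairs to the three per-file features."""
    c0_values = [c0 for c0, _ in frames]
    c1_values = [c1 for _, c1 in frames]
    return {
        "mean_C0": mean(c0_values),
        "mean_C1": mean(c1_values),
        # Assumption: sample standard deviation (n - 1 in the denominator).
        "stdev_C1": stdev(c1_values),
    }


# Illustrative frames only; real values come from Praat's MFCC analysis.
frames = [(84.0, -1.0), (86.0, -2.0), (85.0, 0.0)]
features = summarize_frames(frames)
print(features)
```

A negative mean_C1, as in this example, would count as a dark sound under the interpretation above.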

Violation Calculation Details

DETAILED VIOLATION CALCULATIONS:

Given MFCC analysis of file:
  mean_C0 = average of C0 across all frames
  mean_C1 = average of C1 across all frames
  stdev_C1 = standard deviation of C1 across all frames

1. DARKNESS violation:
  IF mean_C1 < 0: viol_dark = abs(mean_C1)
  ELSE: viol_dark = 0

2. BRIGHTNESS violation:
  IF mean_C1 > 0: viol_bright = mean_C1
  ELSE: viol_bright = 0

3. LOW-ENERGY violation:
  viol_energy = 100 - mean_C0
  IF viol_energy < 0: viol_energy = 0
  Note: 100 is the approximate maximum in Praat's MFCC. The actual maximum depends on the audio, but 100 is a safe ceiling.

4. UNSTABLE violation:
  viol_stable = stdev_C1 × 10
  Multiplication by 10 scales the value to a similar magnitude as the other violations, for balanced weighting.

Example actual values:
  File: mean_C0 = 85.3, mean_C1 = -1.2, stdev_C1 = 0.8
  Violations: dark=1.2, bright=0, energy=14.7, stable=8.0
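The four rules translate directly into code. The following Python sketch mirrors the calculations above and reproduces the example values; compute_violations is an illustrative name, and the constants 100 and 10 are the ones stated in the text.

```python
ENERGY_CEILING = 100.0   # approximate maximum C0, per the note above
STABILITY_SCALE = 10.0   # scales stdev_C1 to the magnitude of the other violations


def compute_violations(mean_c0, mean_c1, stdev_c1):
    """Return the four violation scores for one file."""
    return {
        "dark": abs(mean_c1) if mean_c1 < 0 else 0.0,
        "bright": mean_c1 if mean_c1 > 0 else 0.0,
        "energy": max(0.0, ENERGY_CEILING - mean_c0),
        "stable": stdev_c1 * STABILITY_SCALE,
    }


# Example values from the text.
violations = compute_violations(mean_c0=85.3, mean_c1=-1.2, stdev_c1=0.8)
print({name: round(value, 2) for name, value in violations.items()})
```

Since mean_C1 has a single sign, at most one of the dark and bright entries is nonzero for any file.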

Analysis Performance

Processing Time Estimates:
  • Small corpus (10-20 files): 2-5 seconds
  • Medium corpus (50-100 files): 10-30 seconds
  • Large corpus (200+ files): 1-3 minutes

Factors affecting speed:

  • File duration (longer files = more frames to analyze)
  • Number of files in directory
  • Computer processing power
  • Praat version and optimization

Memory usage: All files loaded during concatenation phase — keep limit_files reasonable for large files.

Applications

Corpus-Based Composition

Use case: Automatic selection of audio fragments for musical composition

Technique: Weight constraints to match desired sonic character

Example: For energetic electronic music: weight_energy=3.0, weight_stability=1.0, others=0.0

Speech Database Organization

Use case: Ranking speech samples by acoustic properties

Technique: Adjust weights to prioritize clear, stable, loud speech

Workflow:

Sound Effect Sequencing

Use case: Creating optimized sound effect sequences for games/film

Technique: Constraint-based selection for perceptual flow

Example workflow:

Research Applications

Use case: Systematic audio stimulus selection for experiments

Advantages:

Example: Psychoacoustic study needing stimuli with specific spectral balance

Educational Use

Use case: Teaching OT concepts with audio examples

Technique: Students adjust weights, hear resulting concatenations

Learning outcomes:

Practical Workflow Examples

🎵 Musical Phrase Assembly

Goal: Create smooth melodic sequence from note recordings

Settings:

  • limit_files: 8
  • weight_stability: 2.5 (prioritize consistent tone)
  • weight_energy: 1.5 (moderately loud)
  • weight_brightness: 0.8 (mildly penalize bright notes)
  • weight_darkness: 0.0

Result: 8 most stable, reasonably loud notes, with overly bright notes ranked lower

🗣️ Speech Sample Selection

Goal: Find clearest speech utterances from noisy recordings

Settings:

  • limit_files: 5
  • weight_energy: 3.0 (maximize loudness)
  • weight_stability: 2.0 (minimize spectral variation)
  • weight_brightness: 0.0 (ignore brightness)
  • weight_darkness: 0.0

Result: 5 loudest, most stable speech samples

🎬 Sound Design Sequence

Goal: Create evolving texture from organic recordings

Settings:

  • limit_files: 12
  • weight_darkness: 1.0 → 0.0 (progressively relaxes the penalty on dark sounds)
  • weight_energy: 0.5 → 2.0 (progressive intensification)
  • Multiple runs with different weights for sections

Result: Thematically evolving sound sequence
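The arrows in the settings above describe weights that change from one run to the next. One way to plan those runs is to interpolate linearly between a starting and an ending weight set, one set per section. The Python sketch below is an assumption about how such a plan could be generated; interpolate_weights and the choice of four sections are hypothetical, and each resulting weight set would still be entered by hand for a separate run of the script.

```python
def interpolate_weights(start, end, sections):
    """Return one weight dict per section, moving linearly from start to end."""
    if sections < 2:
        return [dict(start)]
    plan = []
    for i in range(sections):
        t = i / (sections - 1)  # 0.0 for the first section, 1.0 for the last
        plan.append({name: start[name] + t * (end[name] - start[name]) for name in start})
    return plan


start = {"weight_darkness": 1.0, "weight_energy": 0.5}
end = {"weight_darkness": 0.0, "weight_energy": 2.0}
plan = interpolate_weights(start, end, sections=4)

for number, weights in enumerate(plan, start=1):
    rounded = {name: round(value, 2) for name, value in weights.items()}
    print(f"Section {number}: {rounded}")
```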

Complete Workflow

Step-by-Step Execution

🔧 Script Execution Flow

Phase 1: Setup & Input

  1. User runs script, fills parameter form
  2. Script prompts for directory selection
  3. Validates directory contains .wav files
  4. Creates file list object
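Before running the script, it can help to preview which files a folder scan would pick up. The Python sketch below lists .wav files in a directory and is only an approximation of Phase 1: find_wav_files is a hypothetical helper, and unlike the script's own scan it matches the extension case-insensitively so that .WAV files are visible in the preview.

```python
from pathlib import Path


def find_wav_files(directory):
    """Return .wav files in the directory (not subfolders), sorted by name.

    Assumption: matching is case-insensitive, which may differ from the script.
    """
    folder = Path(directory)
    matches = [
        path for path in folder.iterdir()
        if path.is_file() and path.suffix.lower() == ".wav"
    ]
    return sorted(matches, key=lambda path: path.name.lower())


def require_wav_files(directory):
    """Mirror the validation step: fail early when the folder has no .wav files."""
    files = find_wav_files(directory)
    if not files:
        raise FileNotFoundError(f"No .wav files found in {directory}")
    return files
```

Calling require_wav_files on the corpus folder returns the candidate list or raises an error, which corresponds to the validation in step 3.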

Phase 2: Analysis

  1. Creates analysis table with violation columns
  2. Loops through each file:
    • Loads sound
    • Computes MFCC
    • Calculates mean C0, mean C1, C1 stdev
    • Computes four violation scores
    • Calculates Harmony Score
    • Stores in table

Phase 3: Ranking

  1. Sorts table by Harmony Score (ascending)
  2. Displays constraint weights in Info window
  3. Prints top N files with detailed violation breakdown

Phase 4: Concatenation

  1. Loads top N files into memory
  2. Concatenates sequentially
  3. Renames result to "OT_Optimal_Concatenation"
  4. Cleans up intermediate objects

Phase 5: Output

  1. Selects final concatenated sound
  2. Displays success message
  3. Optionally plays result

Output Interpretation

TYPICAL INFO WINDOW OUTPUT:

============================================
CONSTRAINT WEIGHTS:
  *DARKNESS = 0.00 (penalizes negative spectral tilt)
  *BRIGHTNESS = 1.00 (penalizes positive spectral tilt)
  *LOW-ENERGY = 2.00 (penalizes low loudness)
  *UNSTABLE = 1.00 (penalizes timbral variance)
============================================
RANKING: Top 5 files by Harmony Score
--------------------------------------------
1. file03.wav → Harmony: 17.03
   Features: Energy=92.1 | Tilt=-0.15
   Violations:
     *LOW-ENERGY = 7.90 × 2.00 = 15.80
     *UNSTABLE = 1.23 × 1.00 = 1.23
2. file15.wav → Harmony: 24.08
   Features: Energy=88.5 | Tilt=0.32
   Violations:
     *BRIGHTNESS = 0.32 × 1.00 = 0.32
     *LOW-ENERGY = 11.50 × 2.00 = 23.00
     *UNSTABLE = 0.76 × 1.00 = 0.76
... etc ...
============================================
SUCCESS! Created: OT_Optimal_Concatenation
Contains 5 concatenated files
============================================

Each Harmony value is the sum of the weighted violations listed beneath it (for example, 15.80 + 1.23 = 17.03).

Troubleshooting Common Issues

Problem: "No .wav files found in that directory!"
Cause: Directory doesn't contain .wav files, or path incorrect
Solution: Ensure directory contains .wav files (check extension case: .wav not .WAV)
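Problem: Script runs but output is empty/silent
Cause: The top-ranked files are themselves silent or nearly silent. Ranking is relative, so the script still selects the N lowest-scoring files even when every Harmony Score is high
Solution: Increase weight_energy so that quiet files rank lower, and check whether the source files are extremely quiet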
Problem: Concatenation has clicks/pops between files
Cause: Raw concatenation without crossfade
Solution: Apply crossfade manually after concatenation, or edit files to start/end at zero-crossings
Problem: Analysis takes extremely long
Cause: Many long files, or computer performance issues
Solution: Reduce corpus size, use shorter files, or increase Praat memory allocation
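For the clicks/pops problem, a crossfade overlaps the end of one file with the start of the next so the boundary has no abrupt jump. The script does not do this, so the Python sketch below is only an illustration of a linear crossfade on plain lists of samples. The crossfade function and the sample values are hypothetical, and real audio would first need to be read into sample arrays at a common sampling rate.

```python
def crossfade(first, second, overlap):
    """Join two sample lists, blending the last/first `overlap` samples linearly."""
    overlap = min(overlap, len(first), len(second))
    if overlap <= 0:
        return first + second  # plain concatenation, as the script does

    head = first[:-overlap]
    tail = first[-overlap:]
    blended = []
    for i in range(overlap):
        t = (i + 1) / (overlap + 1)  # fade-in gain for `second`, rising from 0 toward 1
        blended.append(tail[i] * (1.0 - t) + second[i] * t)
    return head + blended + second[overlap:]


# Illustrative samples: a constant 1.0 signal fading into silence.
loud = [1.0, 1.0, 1.0, 1.0]
quiet = [0.0, 0.0, 0.0, 0.0]
joined = crossfade(loud, quiet, overlap=2)
print([round(sample, 3) for sample in joined])
```

The joined result is shorter than the two inputs combined by the overlap length, because the overlapping region is shared rather than duplicated.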

Advanced Techniques

Iterative Weight Adjustment:
  • Run script multiple times with different weights
  • Compare resulting concatenations
  • Note which files appear/disappear in top rankings
  • Fine-tune weights to achieve desired sonic character
Multi-Stage Processing:
  • First pass: Select files with weight_energy=3.0 (loudest files)
  • Second pass: From those, select with weight_stability=2.0 (most stable)
  • Third pass: Final selection with mixed constraints
  • Result: Progressively refined selection
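The multi-stage idea can be sketched as repeated ranking with a shrinking candidate set. The Python example below works on precomputed violation scores rather than audio; the refine function, the pass definitions, and all values are illustrative assumptions. With the Praat script itself, each pass would presumably mean copying the surviving files to a new folder and running the script again.

```python
def harmony(violations, weights):
    """Weighted sum over only the constraints named in `weights`."""
    return sum(violations[name] * weight for name, weight in weights.items())


def refine(candidates, passes):
    """Apply successive passes; each pass ranks the survivors and keeps the best `keep`.

    candidates: dict mapping file name to its violation dict
    passes: list of (weights, keep) pairs, applied in order
    """
    remaining = list(candidates)
    for weights, keep in passes:
        remaining.sort(key=lambda name: harmony(candidates[name], weights))
        remaining = remaining[:keep]
    return remaining


# Illustrative violation scores only.
corpus = {
    "a.wav": {"energy": 5.0, "stable": 9.0},
    "b.wav": {"energy": 6.0, "stable": 2.0},
    "c.wav": {"energy": 20.0, "stable": 1.0},
}
passes = [
    ({"energy": 3.0}, 2),   # first pass: keep the two loudest files
    ({"stable": 2.0}, 1),   # second pass: keep the most stable of those
]
final_selection = refine(corpus, passes)
print(final_selection)
```

Here c.wav is the most stable file overall but is dropped in the first pass for being too quiet, which shows how staged selection differs from a single mixed-weight run.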