OT Grammar Learning from Audio — v1.0 User Guide
Learns a melodic well‑formedness grammar from audio using error‑driven constraint ranking. Implements two classic OT learning algorithms: the Gradual Learning Algorithm (GLA) and Recursive Constraint Demotion (RCD). GEN (candidate generator) operates in two modes: Neighbor‑GEN (single file) or Pair‑corpus (good/bad folders).
What this does
The script extracts a melody from audio, generates winner/loser candidate pairs, and learns a ranking of 15 melodic well‑formedness constraints using either the Gradual Learning Algorithm (GLA) or Recursive Constraint Demotion (RCD), two classic error‑driven learning algorithms from Optimality Theory (OT).
The constraint set is fixed (15 constraints, see table below). The script does not discover new constraints – it learns which constraints are ranked higher or lower based on the observed melody (Neighbor‑GEN) or a corpus of good/bad examples (Pair‑corpus).
Quick start
- In Praat, select exactly one Sound object (mono, voiced melody).
- Run script… → OT_Grammar_Learning_from_Audio.praat.
- Choose Instrument (affects pitch‑tracker settings) and Scale (for quantisation).
- Select GEN_mode:
  - Neighbor-GEN – learns from a single file (the winner). Losers are generated by perturbing one note at a time by ±1, ±2 semitones.
  - Pair-corpus – learns from a folder containing good/ and bad/ subfolders. Every good vs every bad melody becomes a winner/loser pair.
- Choose Algorithm:
  - GLA – stochastic, handles variation, outputs continuous ranking values.
  - RCD – deterministic, outputs a stratified ranking (strict domination).
- Adjust learning parameters (iterations, plasticity, noise for GLA).
- Click OK. The script extracts the melody, generates pairs, runs the learner, and outputs a ranked constraint list in the Info window, plus a Table object and visualisation.
Note: pitch extraction uses Praat's To Pitch (ac). The constraint set is fixed (15 constraints). The output is a ranking (not a generative model).
The 15 melodic constraints
| Index | Name | Type | Description |
|---|---|---|---|
| 1 | *LEAP | Markedness (span) | Penalise interval > 5 semitones (excluding tritone). |
| 2 | *TRITONE | Markedness (span) | Penalise interval == 6 semitones. |
| 3 | *NONSTEP | Markedness (span) | Penalise interval > 2 semitones. |
| 4 | *REPEAT | Markedness (span) | Penalise interval == 0 (repeated note). |
| 5 | *SEMITONE | Markedness (span) | Penalise interval == 1. |
| 6 | *WIDE-RANGE | Markedness (global) | Penalise total range > 12 semitones. |
| 7 | *NARROW-RANGE | Markedness (global) | Penalise total range < 7 semitones. |
| 8 | *NON-SCALE | Markedness (global) | Penalise notes outside the selected scale. |
| 9 | CADENCE | Positive (well‑formedness) | Penalise when the final two notes do NOT form a cadential approach to tonic. |
| 10 | END-ON-TONIC | Positive (well‑formedness) | Penalise when the last note is not the tonic. |
| 11 | *PEAK-EARLY | Markedness (contour) | Penalise when the highest note occurs in the first half. |
| 12 | *PEAK-LATE | Markedness (contour) | Penalise when the highest note occurs in the second half. |
| 13 | ARC-SHAPE | Positive (contour) | Penalise when the contour is NOT approximately single‑peaked. |
| 14 | *DIR-CHANGE | Markedness (contour) | Penalise many direction changes (> n/3). |
| 15 | *MONOTONIC | Markedness (contour) | Penalise when >85% of motion is in one direction. |
Abbreviated constraint names, in table order: LEAP, TRIT, NSTEP, REPEAT, SEMI, WIDE, NARROW, NSCL, CAD, END, PKEAR, PKLAT, ARC, DIRCH, MONO
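As a rough illustration of how the span and range constraints in the table could be evaluated, here is a minimal Python sketch. It is not the script's own code (the script is written in Praat). It assumes the melody is a list of MIDI note numbers, that span constraints count one violation per offending interval, and that the range constraints are binary; the function name `violation_profile` is hypothetical.

```python
from typing import Dict, List


def violation_profile(melody: List[int]) -> Dict[str, int]:
    """Count violations of the span and range constraints for one melody.

    `melody` is a list of MIDI note numbers (an assumed representation).
    """
    # Absolute size of each step between consecutive notes, in semitones.
    steps = [abs(b - a) for a, b in zip(melody, melody[1:])]
    total_range = max(melody) - min(melody) if melody else 0
    return {
        "LEAP": sum(1 for s in steps if s > 5 and s != 6),  # tritone excluded
        "TRIT": sum(1 for s in steps if s == 6),
        "NSTEP": sum(1 for s in steps if s > 2),
        "REPEAT": sum(1 for s in steps if s == 0),
        "SEMI": sum(1 for s in steps if s == 1),
        "WIDE": int(total_range > 12),    # assumed binary (0 or 1)
        "NARROW": int(total_range < 7),   # assumed binary (0 or 1)
    }


profile = violation_profile([60, 62, 62, 67, 73, 72])
```

The example melody has the intervals 2, 0, 5, 6, 1 and a total range of 13 semitones, so it violates *TRITONE, *REPEAT and *SEMITONE once each, *NONSTEP twice, and *WIDE-RANGE once.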
GEN – candidate generation
📐 Neighbor-GEN (single file)
The extracted melody is the winner. Losers are generated by perturbing one note at a time by -perturbation_max_semitones … +perturbation_max_semitones (excluding 0). Each perturbed melody is evaluated against the constraints. This creates up to nWinner × 2 × perturbation_max_semitones pairs, where nWinner is the number of notes in the winner melody. The inductive bias: the source melody is assumed to be locally optimal, and the learner discovers which constraints explain why its neighbours are worse.
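The perturbation scheme can be sketched in a few lines of Python. This is an illustrative sketch only, assuming the melody is a list of MIDI note numbers; `neighbor_gen` is a hypothetical name, not a function from the script.

```python
from typing import List


def neighbor_gen(winner: List[int], max_semitones: int = 2) -> List[List[int]]:
    """Generate loser candidates by shifting one note at a time."""
    losers = []
    for idx in range(len(winner)):
        for delta in range(-max_semitones, max_semitones + 1):
            if delta == 0:
                continue  # a shift of 0 would reproduce the winner
            candidate = list(winner)
            candidate[idx] += delta
            losers.append(candidate)
    return losers


losers = neighbor_gen([60, 62, 64], max_semitones=2)
```

For a three-note winner and a maximum shift of 2 semitones this yields 3 × 2 × 2 = 12 candidates, matching the upper bound given above.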
📁 Pair-corpus (good/bad folders)
The user selects a folder containing good/ and bad/ subfolders. Every file in good/ is a winner; every file in bad/ is a loser. The script extracts the melody from each file, computes violation vectors, and builds all good × bad pairs. This is the linguistically orthodox setup – it produces a real stylistic grammar that distinguishes well‑formed from ill‑formed melodies.
Informative pairs (where winner and loser have different violation profiles) are the only ones that drive learning. The script reports the number of informative pairs.
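A minimal Python sketch of the pairing and filtering logic follows. It assumes each melody has already been reduced to a violation vector (one count per constraint); the names `build_pairs` and `informative_pairs` are hypothetical.

```python
from itertools import product
from typing import List, Tuple

Vector = List[int]
Pair = Tuple[Vector, Vector]


def build_pairs(good: List[Vector], bad: List[Vector]) -> List[Pair]:
    """Pair every winner (good) vector with every loser (bad) vector."""
    return list(product(good, bad))


def informative_pairs(pairs: List[Pair]) -> List[Pair]:
    """Keep only pairs whose winner and loser violation profiles differ."""
    return [(w, l) for w, l in pairs if w != l]


good = [[0, 1, 0], [1, 0, 0]]
bad = [[0, 1, 0], [2, 0, 1]]
pairs = build_pairs(good, bad)
useful = informative_pairs(pairs)
```

Here two good and two bad melodies give four pairs, of which one is dropped because its winner and loser have identical profiles.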
GLA (Gradual Learning Algorithm)
Each constraint has a continuous ranking value. Evaluation‑time noise (Gaussian) is added to produce a stochastic ranking. On each iteration:
- Pick a random pair (winner, loser).
- Add noise to each ranking value.
- Find the highest‑ranked constraint on which winner and loser differ.
- If that constraint prefers the loser (winner has more violations), it’s an error.
- Promote constraints that prefer the winner, demote constraints that prefer the loser, each by plasticity.
Parameters:
- GLA_iterations – number of learning steps (2000 typical).
- GLA_plasticity – step size (0.5 typical).
- GLA_eval_noise – standard deviation of Gaussian noise added at evaluation time (2.0 typical).
- GLA_initial_ranking_value – all constraints start at this value (100).
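The update rule and its parameters can be sketched in Python as follows. This is a hedged illustration of the steps described above, not the script's Praat code: pairs are assumed to be (winner, loser) violation vectors, and the names `gla_step` and `gla` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[Sequence[int], Sequence[int]]


def gla_step(ranking: List[float], winner: Sequence[int], loser: Sequence[int],
             plasticity: float = 0.5, noise_sd: float = 2.0, rng=random) -> bool:
    """Run one GLA evaluation and update `ranking` in place on an error."""
    # Evaluation-time noise gives a stochastic ranking for this step only.
    noisy = [r + rng.gauss(0.0, noise_sd) for r in ranking]
    order = sorted(range(len(ranking)), key=lambda c: noisy[c], reverse=True)
    for c in order:
        if winner[c] != loser[c]:
            # The highest-ranked deciding constraint prefers the loser.
            is_error = winner[c] > loser[c]
            break
    else:
        return False  # identical profiles: nothing to learn
    if is_error:
        for c in range(len(ranking)):
            if winner[c] < loser[c]:
                ranking[c] += plasticity  # prefers the winner: promote
            elif winner[c] > loser[c]:
                ranking[c] -= plasticity  # prefers the loser: demote
    return is_error


def gla(pairs: List[Pair], n_constraints: int, iterations: int = 2000,
        plasticity: float = 0.5, noise_sd: float = 2.0,
        initial: float = 100.0, seed: int = 0) -> List[float]:
    """Learn continuous ranking values from winner/loser pairs."""
    rng = random.Random(seed)
    ranking = [initial] * n_constraints
    for _ in range(iterations):
        winner, loser = rng.choice(pairs)
        gla_step(ranking, winner, loser, plasticity, noise_sd, rng)
    return ranking


values = gla([([1, 0], [0, 1])], n_constraints=2)
```

In this toy case the second constraint prefers the winner and the first prefers the loser, so the second constraint's ranking value ends up above the first. The fixed seed is an assumption made for reproducibility.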
RCD (Recursive Constraint Demotion)
Classical OT: constraints are strictly ordered into strata. RCD builds strata greedily:
- Find all constraints that prefer the winner in at least one unresolved pair and never prefer the loser in any unresolved pair.
- Place them in the current stratum.
- Mark as resolved all pairs that are decided by any constraint in this stratum.
- Repeat.
RCD is deterministic and fast. If a ranking compatible with all winner/loser pairs exists, RCD is guaranteed to find one. The output is a stratum number per constraint (higher stratum = higher rank in the hierarchy). The visualisation converts strata to ranking values for display.
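The greedy stratum construction can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the script's code: pairs are (winner, loser) violation vectors, strata are returned top-down as lists of constraint indices, and leftover constraints go into a final bottom stratum. The function name `rcd` is hypothetical.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[Sequence[int], Sequence[int]]


def rcd(pairs: List[Pair], n_constraints: int) -> Tuple[List[List[int]], List[Pair]]:
    """Return (strata from top to bottom, pairs left unresolved)."""
    unresolved = list(pairs)
    remaining = set(range(n_constraints))
    strata: List[List[int]] = []
    while remaining and unresolved:
        # Constraints that prefer some winner and never prefer a loser.
        stratum = [
            c for c in sorted(remaining)
            if any(w[c] < l[c] for w, l in unresolved)
            and not any(w[c] > l[c] for w, l in unresolved)
        ]
        if not stratum:
            break  # no strict ranking accounts for the remaining pairs
        strata.append(stratum)
        remaining -= set(stratum)
        # A pair is resolved once a constraint in this stratum decides it.
        unresolved = [(w, l) for w, l in unresolved
                      if not any(w[c] < l[c] for c in stratum)]
    if remaining:
        strata.append(sorted(remaining))  # leftovers form the bottom stratum
    return strata, unresolved


pairs = [([0, 1, 0], [1, 0, 0]), ([0, 0, 1], [0, 1, 0])]
strata, left = rcd(pairs, n_constraints=3)
```

For these two pairs the sketch ranks constraint 0 above constraint 1 above constraint 2 and leaves no pair unresolved. Contradictory pairs, such as a pair and its mirror image, leave all constraints in a single stratum with both pairs unresolved.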
Parameters & defaults
Instrument & scale
| Parameter | Options | Default | Description |
|---|---|---|---|
| Instrument | Violin / Vocal / Guitar / Flute / Piano / Other | Violin | Affects pitch‑tracker floor/ceiling, time step, voicing threshold. |
| Scale | C major / G major / … / A minor (harmonic) / Chromatic | C major | Scale for quantisation (if enabled). |
| Quantize_to_scale | yes/no | yes | Snap extracted pitches to the selected scale. |
| Min_note_duration_ms | 20–500 | 80 | Minimum duration for a detected note (ms). |
GEN
Learning algorithm
Output
Visualization (Praat picture)
When Show_visualization = 1, the script draws a 5‑panel figure:
- Piano roll of extracted winner melody – green bands = in‑scale notes, pink bands = chromatic (if quantisation off). The melody is shown as a stepwise line with note transitions.
- Constraint ranking bar chart – bars for each constraint, colour‑coded by rank (deep blue = high rank). The x‑axis shows normalised ranking value; the y‑axis lists constraints from highest to lowest rank.
- Learning curve (GLA only) – error rate over epochs. RCD shows a static panel with the final error rate.
- Sample tableau – shows violation counts for the winner and loser of the first informative pair, with “W” (winner preferred) / “L” (loser preferred) markers.
- Summary panel – algorithm, GEN mode, scale, instrument, note count, pair count, informative pairs, final error rate, and parameters.
FAQ / troubleshooting
If the extracted melody has missing or spurious notes, adjust Min_note_duration_ms (lower for faster melodies) or change Instrument (e.g., Vocal vs. Violin). The pitch tracker may also need different minPitch/maxPitch values. These are hardcoded per instrument but can be edited in the script.
If all generated pairs have identical violation profiles, the learner has nothing to learn. For Neighbor‑GEN, increase Perturbation_max_semitones (e.g., to 4). For Pair‑corpus, ensure that good and bad melodies actually differ in their constraint violations.
If the GLA ranking does not settle, try reducing GLA_plasticity (e.g., to 0.2) or increasing GLA_iterations (e.g., to 5000). The learning curve in the visualisation shows the error rate over time. If it is still noisy, the data may be inconsistent or the noise level may be too high.
If the data is inconsistent with a strict domination hierarchy, RCD will place remaining constraints in the bottom stratum. The visualisation reports how many pairs were resolved. This is not an error – it simply means no single strict ranking can account for all pairs.
When Quantize_to_scale is on, extracted pitches are snapped to the nearest note in the selected scale (octave‑aware). This ensures that the melody is analysed in the intended tonal framework. For atonal or microtonal material, turn quantisation off.
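Octave-aware snapping can be sketched in Python as follows. This is a minimal sketch, not the script's code: it assumes pitches are expressed as (possibly fractional) MIDI note numbers, represents a scale as a tuple of pitch classes, and breaks ties towards the lower note. The names `C_MAJOR` and `quantize_to_scale` are hypothetical.

```python
from typing import Sequence

C_MAJOR = (0, 2, 4, 5, 7, 9, 11)  # pitch classes, with C = 0


def quantize_to_scale(pitch: float, scale_pcs: Sequence[int] = C_MAJOR) -> int:
    """Snap a MIDI pitch to the nearest scale note in any octave."""
    base = 12 * int(pitch // 12)
    # Include the octaves below and above so notes near an octave
    # boundary can snap across it.
    candidates = [base + shift + pc for shift in (-12, 0, 12) for pc in scale_pcs]
    # Ties are broken towards the lower note (an assumption).
    return min(candidates, key=lambda c: (abs(c - pitch), c))


snapped = [quantize_to_scale(p) for p in (61, 70.4, 59.6)]
```

In C major, 61 (C♯) is equidistant from 60 and 62 and snaps down to 60 under the tie-break assumed here; 70.4 snaps to 71 (B), and 59.6 crosses the octave boundary to 60 (C).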