MCMC Musical Variation — User Guide

Metropolis-Hastings MCMC chain for musical variation. State = pitch contour + time map + dynamic map per phrase. Proposals operate on individual phrases or on the variation as a whole. Energy enforces scale conformity, voice leading, rhythmic stability, range, dynamics, phrase integrity, and contour variety.

Author: Shai Cohen
Version: 1.0 (2025)
Technique: Metropolis-Hastings MCMC
Category: Composition / MCMC
Citation: Cohen, S. (2025). Praat AudioTools

What this does

This script implements an MCMC (Markov Chain Monte Carlo) Musical Variation engine — a Metropolis-Hastings sampler that explores the space of musical variations on a source sound. The state consists of per-phrase pitch offsets, time stretch ratios, and dynamic scaling factors. Proposals modify these parameters, and an energy function evaluates musicality based on scale conformity, voice leading, rhythmic stability, range, dynamics, phrase integrity, and contour variety. Accepted states are rendered as audio variations.

🧮 What is MCMC for Music?

Markov Chain Monte Carlo is a statistical method for sampling from complex probability distributions. In this musical context:

  • State: A set of parameters defining a variation (pitch offsets, time ratios, dynamic scales per phrase)
  • Energy: A function that quantifies "musicality" — lower energy = more musical
  • Temperature: Controls exploration vs. exploitation (high T = more exploration)
  • Proposals: Random modifications to the current state
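  • Acceptance: The Metropolis-Hastings criterion always accepts better (lower-energy) states and accepts worse states only occasionally, with a probability that shrinks as the energy increase grows and as the temperature falls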

The chain explores the space of variations, and accepted states are rendered as audio at regular intervals (thinning), producing a set of related but distinct musical variations.

Key Features:

Technical Implementation:
  1. Phrase detection: silences → TextGrid → phrase intervals.
  2. State initialization: zero pitch offsets, unity time and dynamic maps.
  3. Energy computation: seven weighted terms.
  4. MCMC loop: select a proposal, compute the energy, apply the Metropolis-Hastings acceptance test, update the temperature.
  5. Rendering: for accepted thinned steps, transform each phrase (tape-speed pitch shift, Lengthen time stretch, dynamic scaling) and concatenate the phrases with crossfades.
  6. Visualization: 6-panel display with all data.

Quick start

  1. In Praat, select exactly one Sound object (minimum 1 second, any content).
  2. Run script… → select MCMC_Musical_Variation.praat.
  3. Choose Aesthetic_mode: 2 = Conservative, 3 = Expressive, 4 = Exploratory, or 1 = Custom.
  4. Set chain parameters (steps, thinning, temperature start/end, annealing).
  5. Configure phrase detection (silence threshold, minimum phrase duration).
  6. Set tonal center in semitones (0=C, 2=D, 4=E, 5=F, 7=G, 9=A, 11=B).
  7. Set maximum variations and output options.
  8. Enable Draw_visualization for analysis display.
  9. Click OK — engine detects phrases, runs MCMC chain, renders variations, creates stereo mix "source_mcmc_stereo".
Quick tip: Start with Expressive mode on a 10-20 second melodic phrase. Enable visualization — you'll see the original waveform with phrase boundaries (dotted), the first variation, energy trace (red) with temperature overlay (dotted blue), pitch contour heatmap, and energy per variation. Listen to how the variations explore the space around the original, becoming more adventurous as the chain progresses. The output appears as "source_mcmc_stereo" (multi-channel mix) and individual variations if keep_individual_variations=1.
Important: PHRASE DETECTION is critical; adjust silence_thresh_dB and min_phrase_s for your material. If detection fails, the script falls back to 4 equal segments. ENERGY WEIGHTS in custom mode require careful tuning (see the theory section). PROPOSAL WEIGHTS determine how often each proposal type is used and must sum to 1.0. RENDERING TIME scales with the number of rendered variations × the number of phrases; the number of renders is at most steps ÷ thinning_interval, capped by max_variations. For 60 steps with thin=6 and 8 phrases (at most 8 variations of 8 phrase segments each), expect roughly 10-20 seconds of rendering. STEREO MIX combines all variations into a multi-channel mix (each variation in its own channel), which is then mixed down to stereo.

MCMC & Energy Theory

The Metropolis-Hastings Algorithm

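For each step t, with current state θ and energy E(θ):

  1. Propose a new state θ′ from the proposal distribution q(θ′ | θ).
  2. Compute its energy E(θ′).
  3. Compute the acceptance probability $\alpha = \min\bigl(1, \exp(-(E(\theta') - E(\theta))/T)\bigr)$, where T is the current temperature.
  4. Accept θ′ with probability α; otherwise keep θ.

This is the symmetric-proposal (Metropolis) form of the rule: it omits the Hastings correction q(θ | θ′) / q(θ′ | θ), which equals 1 only when q is symmetric.

Temperature schedule (if annealing), for t = 1, …, steps:

$$T(t) = T_{\text{start}} + (T_{\text{end}} - T_{\text{start}}) \cdot \frac{t - 1}{\text{steps} - 1}$$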
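A minimal Python sketch of the loop above, assuming a generic state type and caller-supplied `energy` and `propose` functions. The names `temperature`, `accept_prob`, and `run_chain` are hypothetical and not taken from the Praat script; the sketch only shows the linear annealing schedule and the symmetric-proposal acceptance rule as written.

```python
import math
import random
from typing import Callable, List, Tuple, TypeVar

S = TypeVar("S")


def temperature(t: int, steps: int, t_start: float, t_end: float, anneal: bool = True) -> float:
    """Linear schedule T(t) = T_start + (T_end - T_start) * (t - 1) / (steps - 1), t = 1..steps."""
    if not anneal or steps <= 1:
        return t_start  # fixed temperature (e.g. Conservative mode)
    return t_start + (t_end - t_start) * (t - 1) / (steps - 1)


def accept_prob(e_old: float, e_new: float, temp: float) -> float:
    """alpha = min(1, exp(-(E' - E) / T)); improvements are always accepted."""
    delta = e_new - e_old
    return 1.0 if delta <= 0 else math.exp(-delta / temp)


def run_chain(init: S,
              energy: Callable[[S], float],
              propose: Callable[[S, random.Random], S],
              steps: int, t_start: float, t_end: float,
              anneal: bool = True, seed: int = 0) -> Tuple[S, List[float], List[bool]]:
    """Run `steps` iterations; return the final state, the energy trace, and accept flags."""
    rng = random.Random(seed)
    state, e = init, energy(init)
    trace: List[float] = []
    accepted: List[bool] = []
    for t in range(1, steps + 1):
        temp = temperature(t, steps, t_start, t_end, anneal)
        cand = propose(state, rng)
        e_cand = energy(cand)
        ok = rng.random() < accept_prob(e, e_cand, temp)
        if ok:
            state, e = cand, e_cand
        trace.append(e)
        accepted.append(ok)
    return state, trace, accepted


# Toy usage: an integer "state" pulled toward 3 by the energy, with +/-1 random-walk proposals
final, trace, acc = run_chain(0, lambda x: abs(x - 3), lambda x, r: x + r.choice([-1, 1]),
                              steps=60, t_start=2.0, t_end=0.5)
print(final, sum(acc) / len(acc))  # final state and acceptance rate
```

The fraction of `True` flags is the acceptance rate; on this toy problem its value depends on the seed.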

State Representation

📊 State Variables per Phrase

For each phrase i (1..nPhrases):

  • pc[i] — pitch contour offset (semitones, integer)
  • tm[i] — time map ratio (>0, 1.0 = no change)
  • dm[i] — dynamic map (amplitude scale, 1.0 = no change)

Global: transpo — global semitone shift (integer)

Total state dimension: 3 × nPhrases + 1
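One way to hold this state in code, sketched in Python. `VariationState` and its methods are hypothetical names; the fields follow the list above. `initial()` builds the starting state (zero pitch offsets, unity time and dynamic maps), and `dimension()` reproduces the 3 × nPhrases + 1 count.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VariationState:
    pc: List[int]      # pitch contour offset per phrase (integer semitones)
    tm: List[float]    # time map ratio per phrase (> 0, 1.0 = no change)
    dm: List[float]    # dynamic map per phrase (amplitude scale, 1.0 = no change)
    transpo: int = 0   # global semitone shift

    @classmethod
    def initial(cls, n_phrases: int) -> "VariationState":
        """Starting state: zero pitch offsets, unity time and dynamic maps."""
        return cls([0] * n_phrases, [1.0] * n_phrases, [1.0] * n_phrases, 0)

    def dimension(self) -> int:
        """Number of free parameters: 3 per phrase plus the global transposition."""
        return len(self.pc) + len(self.tm) + len(self.dm) + 1

    def copy(self) -> "VariationState":
        """Independent copy, so a proposal never mutates the current state in place."""
        return VariationState(list(self.pc), list(self.tm), list(self.dm), self.transpo)


s = VariationState.initial(8)
print(s.dimension())  # 3 * 8 + 1 = 25
```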

Energy Function Components

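The total energy is a weighted sum of seven terms:

$$E = w_1E_1 + w_2E_2 + w_3E_3 + w_4E_4 + w_5E_5 + w_6E_6 + w_7E_7$$

E₁ (Scale conformity): distance from the nearest scale degree, averaged over phrases.

$$\text{pitchClass}_i = (pc_i + \text{transpo} - \text{tonalCenter}) \bmod 12$$

$$E_1 = \frac{1}{n_{\text{Phrases}}} \sum_{i=1}^{n_{\text{Phrases}}} \min_{d \in \text{scale}} \operatorname{circ}(\text{pitchClass}_i, d)$$

where circ(a, b) = min(|a − b|, 12 − |a − b|) is the circular distance between pitch classes.

E₂ (Voice leading): smoothness between adjacent phrases. E₂ is the average |leap| between adjacent phrases, with leaps larger than 12 semitones penalized extra.

E₃ (Range constraint): penalizes total shifts beyond 12 semitones, where shift_i is phrase i's total semitone shift.

$$E_3 = \operatorname{average}\bigl[(\lvert \text{shift}_i \rvert - 12)^2\bigr] \quad \text{over phrases with } \lvert \text{shift}_i \rvert > 12$$

E₄ (Rhythmic stability): variance of the time ratios plus a duration-preservation penalty.

$$\text{durRatio} = \frac{\sum_i \text{phraseDur}_i \, tm_i}{\text{srcDur}}, \qquad E_4 = \operatorname{Var}(tm) + 8\,(\text{durRatio} - 1)^2$$

E₅ (Dynamic coherence): variance of the dynamic scales plus clipping penalties.

$$E_5 = \operatorname{Var}(dm) + \sum_{dm_i > 1.6} 3\,(dm_i - 1.6)^2 + \sum_{dm_i < 0.08} 5\,(0.08 - dm_i)^2$$

E₆ (Phrase integrity): penalizes phrases that become shorter than 0.25 s.

$$\text{newDur}_i = \text{phraseDur}_i \, tm_i, \qquad E_6 = \frac{8}{n_{\text{Phrases}}} \sum_{\text{newDur}_i < 0.25} (0.25 - \text{newDur}_i)^2$$

E₇ (Contour naturalness): penalizes a monotone contour. With monFrac the fraction of phrase-to-phrase moves in the same direction:

$$E_7 = \begin{cases} 4\,(\text{monFrac} - 0.75)^2 & \text{if monFrac} > 0.75 \\ 0 & \text{otherwise} \end{cases}$$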
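The seven terms can be combined into a single energy function. This is a Python sketch under stated assumptions, not the script's code: variances are population variances, E₃ is averaged over all phrases (phrases within range contribute 0), the "extra" E₂ penalty for leaps over 12 semitones is a hypothetical linear surcharge controlled by `big_leap_extra`, and monFrac counts flat moves in its denominator. The default weights are the Custom/Expressive values; the function and parameter names are hypothetical.

```python
from statistics import pvariance
from typing import Sequence, Tuple

MAJOR = (0, 2, 4, 5, 7, 9, 11)  # scale degrees in semitones above the tonic


def circ_dist(a: int, b: int) -> int:
    """Distance between two pitch classes on the 12-step circle."""
    d = abs(a - b) % 12
    return min(d, 12 - d)


def energy(pc: Sequence[int], tm: Sequence[float], dm: Sequence[float], transpo: int,
           phrase_dur: Sequence[float], src_dur: float, tonal_center: int = 0,
           weights: Sequence[float] = (2.0, 1.5, 1.0, 1.5, 0.8, 1.2, 0.7),
           scale: Sequence[int] = MAJOR, big_leap_extra: float = 1.0) -> Tuple[float, tuple]:
    """Return (total energy, (E1..E7)) for one state."""
    n = len(pc)
    # E1: scale conformity, mean distance to the nearest scale degree
    e1 = sum(min(circ_dist((p + transpo - tonal_center) % 12, d) for d in scale) for p in pc) / n
    # E2: voice leading; leaps over 12 semitones get a hypothetical linear surcharge
    leaps = [abs(b - a) for a, b in zip(pc, pc[1:])]
    e2 = sum(leap + big_leap_extra * max(0, leap - 12) for leap in leaps) / len(leaps) if leaps else 0.0
    # E3: range, squared excess of the total shift beyond 12 semitones
    e3 = sum((abs(p + transpo) - 12) ** 2 for p in pc if abs(p + transpo) > 12) / n
    # E4: rhythmic stability = variance of tm + duration-preservation penalty
    dur_ratio = sum(d * t for d, t in zip(phrase_dur, tm)) / src_dur
    e4 = pvariance(tm) + (dur_ratio - 1) ** 2 * 8
    # E5: dynamic coherence = variance of dm + clipping penalties
    clip = (sum((g - 1.6) ** 2 * 3 for g in dm if g > 1.6)
            + sum((0.08 - g) ** 2 * 5 for g in dm if g < 0.08))
    e5 = pvariance(dm) + clip
    # E6: phrase integrity, penalize phrases stretched below 0.25 s
    e6 = sum((0.25 - d * t) ** 2 * 8 / n for d, t in zip(phrase_dur, tm) if d * t < 0.25)
    # E7: contour naturalness, penalize more than 75 % of moves in one direction
    moves = [b - a for a, b in zip(pc, pc[1:])]
    e7 = 0.0
    if moves:
        mon_frac = max(sum(m > 0 for m in moves), sum(m < 0 for m in moves)) / len(moves)
        if mon_frac > 0.75:
            e7 = (mon_frac - 0.75) ** 2 * 4
    terms = (e1, e2, e3, e4, e5, e6, e7)
    return sum(w * e for w, e in zip(weights, terms)), terms


# The starting state (no changes) of a 4-phrase source has zero energy
print(energy([0, 0, 0, 0], [1.0] * 4, [1.0] * 4, 0, [1.0] * 4, 4.0)[0])  # 0.0
```

Shifting every phrase up one semitone in C major puts each phrase one semitone from a scale degree, so E₁ becomes 1 and the total becomes w₁ × 1 = 2.0.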

Proposal Types

🎲 8 Proposal Distributions

| Proposal | Description | Effect |
|---|---|---|
| Pitch Nudge | Random phrase ± random(1-2) semitones | Local pitch adjustment |
| Dynamic Swell | Random phrase ± Gaussian(0, 0.12) amplitude | Local dynamic change |
| Micro Rubato | Changes one random phrase's time ratio and compensates on an adjacent phrase | Local timing variation |
| Phrase Transpose | Random phrase ± random(1-5) semitones | Larger local pitch shift |
| Temporal Swap | Swaps the time ratios of adjacent phrases | Rhythmic reordering |
| Global Transpose | Global transpo ± random(1-3) semitones | Overall pitch shift |
| Tempo Warp | Multiplies all time ratios by Gaussian(1.0, 0.06) | Global tempo change |
| Dynamic Arch | Applies an arch-shaped dynamic envelope (linear, inverted, or sinusoidal) | Global dynamic shape |
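A Python sketch of how the eight proposals might be drawn and applied, assuming the state is a dict with `pc`, `tm`, `dm`, and `transpo` keys and that `pw` lists the proposal weights in table order (pw1-pw8). The step sizes come from the table. The Micro Rubato compensation (inverse factor on a neighbour) and its 0.05 spread, the clamp at 0 on Dynamic Swell, and the Dynamic Arch shape and depth are hypothetical choices, because the table does not specify them.

```python
import copy
import math
import random

# Proposal names in table order (matching pw1-pw8)
PROPOSALS = ["pitch_nudge", "dynamic_swell", "micro_rubato", "phrase_transpose",
             "temporal_swap", "global_transpose", "tempo_warp", "dynamic_arch"]


def signed_int(rng: random.Random, lo: int, hi: int) -> int:
    """Random integer in [lo, hi] with a random sign, e.g. ±random(1-2)."""
    return rng.choice((-1, 1)) * rng.randint(lo, hi)


def propose(state: dict, pw, rng: random.Random):
    """Draw one proposal type by weight and return (name, new_state); `state` is not modified."""
    s = copy.deepcopy(state)
    n = len(s["pc"])
    kind = rng.choices(PROPOSALS, weights=pw, k=1)[0]
    i = rng.randrange(n)                        # the randomly chosen phrase
    j = i + 1 if i + 1 < n else i - 1           # an adjacent phrase (j == -1 only if n == 1)
    if kind == "pitch_nudge":
        s["pc"][i] += signed_int(rng, 1, 2)
    elif kind == "dynamic_swell":
        s["dm"][i] = max(0.0, s["dm"][i] + rng.gauss(0.0, 0.12))  # clamp at 0 is hypothetical
    elif kind == "micro_rubato" and n > 1:
        f = math.exp(rng.gauss(0.0, 0.05))      # hypothetical step size
        s["tm"][i] *= f
        s["tm"][j] /= f                         # hypothetical compensation on the neighbour
    elif kind == "phrase_transpose":
        s["pc"][i] += signed_int(rng, 1, 5)
    elif kind == "temporal_swap" and n > 1:
        s["tm"][i], s["tm"][j] = s["tm"][j], s["tm"][i]
    elif kind == "global_transpose":
        s["transpo"] += signed_int(rng, 1, 3)
    elif kind == "tempo_warp":
        g = rng.gauss(1.0, 0.06)
        s["tm"] = [t * g for t in s["tm"]]
    elif kind == "dynamic_arch":
        # hypothetical sinusoidal arch of depth 0.2 (the table also lists linear and inverted shapes)
        s["dm"] = [d * (1.0 + 0.2 * math.sin(math.pi * (k + 0.5) / n)) for k, d in enumerate(s["dm"])]
    return kind, s


rng = random.Random(42)
start = {"pc": [0] * 4, "tm": [1.0] * 4, "dm": [1.0] * 4, "transpo": 0}
kind, cand = propose(start, [0.25, 0.12, 0.20, 0.15, 0.10, 0.08, 0.07, 0.03], rng)
print(kind, cand)
```

Returning a deep copy keeps the current state intact, so a rejected proposal can simply be discarded.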

Major Scale Definition

Scale degrees in semitones above the tonic:
  • degree 1 = 0, degree 2 = 2, degree 3 = 4, degree 4 = 5, degree 5 = 7, degree 6 = 9, degree 7 = 11

Tonal center parameter (st): 0 = C, 2 = D, 4 = E, 5 = F, 7 = G, 9 = A, 11 = B
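Two worked examples of the scale-conformity calculation, using the pitchClass formula from the energy section and the major scale above; the offsets are arbitrary illustrative values. The second example shows why the mod operation must wrap negative values into the range 0-11.

```latex
% Major scale degrees: {0, 2, 4, 5, 7, 9, 11}
\text{pitchClass} = (pc[i] + \text{transpo} - \text{tonalCenter}) \bmod 12

% C (tonalCenter = 0), pc[i] = 1, transpo = 2:
(1 + 2 - 0) \bmod 12 = 3 \;\Rightarrow\; \text{nearest degrees } 2 \text{ and } 4 \;\Rightarrow\; \text{minDist} = 1

% D (tonalCenter = 2), pc[i] = 1, transpo = 0:
(1 + 0 - 2) \bmod 12 = -1 \bmod 12 = 11 \;\Rightarrow\; \text{degree 7} \;\Rightarrow\; \text{minDist} = 0
```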

Rendering Pipeline per Phrase

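For each phrase i:

  1. Extract the segment of the source from phraseStart[i] to phraseEnd[i].
  2. Apply a tape-speed pitch shift: pitchRatio = 2^((pc[i] + transpo) / 12). Override the sampling rate to srcSr × pitchRatio, then resample back to srcSr. This shifts the pitch by pitchRatio but also divides the segment's duration by pitchRatio.
  3. Apply the time stretch: lenFactor = timeRatio × pitchRatio, where timeRatio is tm[i]. Lengthen (overlap-add) by lenFactor; this undoes the duration change from step 2 and applies tm[i], so the phrase ends up phraseDur[i] × tm[i] long.
  4. Apply dynamic scaling with the formula self × dm[i].
  5. Apply 5 ms fade-in and fade-out (click prevention).
  6. Concatenate with the previous segments using a 20 ms crossfade.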
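The duration bookkeeping in steps 2 and 3 can be checked numerically. This Python sketch (function names hypothetical) computes pitchRatio, lenFactor, and the resulting phrase length without processing any audio, and estimates the total output length under the assumption that each 20 ms crossfade overlaps the two adjacent segments.

```python
def phrase_render_plan(dur: float, pc: int, transpo: int, time_ratio: float):
    """Return (pitch_ratio, len_factor, final_duration) for one phrase."""
    pitch_ratio = 2 ** ((pc + transpo) / 12)
    after_tape = dur / pitch_ratio          # tape-speed shift changes length by 1 / pitchRatio
    len_factor = time_ratio * pitch_ratio   # Lengthen undoes that change and applies tm[i]
    return pitch_ratio, len_factor, after_tape * len_factor


def total_duration(phrase_durs, time_ratios, crossfade_s: float = 0.020) -> float:
    """Estimated output length, assuming each crossfade overlaps the two adjacent segments."""
    body = sum(d * t for d, t in zip(phrase_durs, time_ratios))
    return body - crossfade_s * (len(phrase_durs) - 1)


# One octave up (pc + transpo = 12) with no time change: the phrase keeps its 2.0 s length
print(phrase_render_plan(2.0, 12, 0, 1.0))  # (2.0, 2.0, 2.0)
```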

Aesthetic Modes

Mode 2: Conservative (faithful)

🌱 Faithful Variations

Temperature: 0.8 (fixed) | Weights: E1=2.5, E2=2.0, E3=1.5, E4=2.0, E5=1.0, E6=1.2, E7=0.5

Proposal weights: PitchNudge=0.35, DynSwell=0.15, Rubato=0.25, PhrTrans=0.05, TmSwap=0.08, GlobTrans=0.03, TempoWarp=0.06, DynArch=0.03

Character: Stays close to original — small, local changes only, high scale/voice-leading weights

Use on: Material needing subtle variations, classical music, education

Mode 3: Expressive (reinterpretive)

🎵 Reinterpretive

Temperature: 3.0 → 1.0 (annealed) | Weights: E1=2.0, E2=1.5, E3=1.0, E4=1.5, E5=0.8, E6=1.2, E7=0.7

Proposal weights: PitchNudge=0.25, DynSwell=0.12, Rubato=0.20, PhrTrans=0.15, TmSwap=0.10, GlobTrans=0.08, TempoWarp=0.07, DynArch=0.03

Character: Balanced exploration — moderate changes, some phrase transposition, tempo warp

Use on: Expressive music, jazz, creative reinterpretations

Mode 4: Exploratory (creative)

🚀 Creative Exploration

Temperature: 8.0 → 2.0 (annealed) | Weights: E1=1.5, E2=1.0, E3=0.7, E4=1.0, E5=0.6, E6=0.8, E7=1.2

Proposal weights: PitchNudge=0.18, DynSwell=0.10, Rubato=0.15, PhrTrans=0.18, TmSwap=0.12, GlobTrans=0.12, TempoWarp=0.10, DynArch=0.05

Character: Wide exploration — large pitch shifts, frequent phrase transposition, temporal swaps, high contour variety weight encourages non-monotone lines

Use on: Experimental music, sound design, generative composition
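The three presets can be written down as plain configuration data. This Python sketch (the `PRESETS` dict and the `expected_counts` helper are hypothetical) copies the values listed above, checks that each set of proposal weights sums to 1.0, and shows how often each proposal type is expected to be drawn over a run.

```python
import math

PROPOSALS = ("PitchNudge", "DynSwell", "Rubato", "PhrTrans",
             "TmSwap", "GlobTrans", "TempoWarp", "DynArch")

PRESETS = {
    # temp = (start, end); Conservative uses a fixed temperature
    "Conservative": {"temp": (0.8, 0.8), "anneal": False,
                     "w": (2.5, 2.0, 1.5, 2.0, 1.0, 1.2, 0.5),
                     "pw": (0.35, 0.15, 0.25, 0.05, 0.08, 0.03, 0.06, 0.03)},
    "Expressive": {"temp": (3.0, 1.0), "anneal": True,
                   "w": (2.0, 1.5, 1.0, 1.5, 0.8, 1.2, 0.7),
                   "pw": (0.25, 0.12, 0.20, 0.15, 0.10, 0.08, 0.07, 0.03)},
    "Exploratory": {"temp": (8.0, 2.0), "anneal": True,
                    "w": (1.5, 1.0, 0.7, 1.0, 0.6, 0.8, 1.2),
                    "pw": (0.18, 0.10, 0.15, 0.18, 0.12, 0.12, 0.10, 0.05)},
}


def expected_counts(pw, steps: int) -> dict:
    """Expected number of draws of each proposal type over `steps` iterations."""
    return {name: p * steps for name, p in zip(PROPOSALS, pw)}


for name, cfg in PRESETS.items():
    assert math.isclose(sum(cfg["pw"]), 1.0), name  # proposal weights must sum to 1.0

print(expected_counts(PRESETS["Expressive"]["pw"], 60)["PitchNudge"])  # 15.0
```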

Parameters & Controls

Aesthetic Mode

| Parameter | Default | Description |
|---|---|---|
| Aesthetic_mode | Expressive | Conservative, Expressive, Exploratory, or Custom |

Chain Parameters

| Parameter | Default | Description |
|---|---|---|
| Mcmc_steps | 60 | Number of MCMC iterations |
| Thinning_interval | 6 | Render a variation every N accepted steps |
| Start_temp | 2.0 | Initial temperature (higher = more exploration) |
| End_temp | 0.5 | Final temperature (lower = more exploitation) |
| Anneal | 1 | Gradually reduce the temperature |

Phrase Detection

| Parameter | Default | Description |
|---|---|---|
| Silence_thresh_dB | -35.0 | Threshold for silence detection (dB) |
| Min_phrase_s | 0.5 | Minimum phrase duration (seconds) |
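The script builds its phrases from Praat's silence detection (Silences → TextGrid). As an illustration only, this Python sketch applies the same idea to a list of per-frame intensity values, taking the threshold relative to the loudest frame, as Praat's silence detection does. The function name and the frame-based input are hypothetical, and falling back to 4 equal segments whenever fewer than two phrases survive is an assumption based on the troubleshooting note that sources without silences trigger the fallback.

```python
from typing import List, Optional, Sequence, Tuple


def detect_phrases(intensity_db: Sequence[float], frame_step: float,
                   thresh_db: float = -35.0, min_phrase_s: float = 0.5,
                   src_dur: Optional[float] = None, fallback_n: int = 4) -> List[Tuple[float, float]]:
    """Return (start, end) times of sounding stretches at least min_phrase_s long."""
    if src_dur is None:
        src_dur = len(intensity_db) * frame_step
    cutoff = max(intensity_db) + thresh_db  # silence threshold relative to the loudest frame
    phrases: List[Tuple[float, float]] = []
    start: Optional[float] = None
    for k, level in enumerate(intensity_db):
        t = k * frame_step
        if level >= cutoff and start is None:
            start = t                       # a sounding stretch begins
        elif level < cutoff and start is not None:
            if t - start >= min_phrase_s:   # keep only long-enough stretches
                phrases.append((start, t))
            start = None
    if start is not None and src_dur - start >= min_phrase_s:
        phrases.append((start, src_dur))    # stretch running to the end of the sound
    if len(phrases) < 2:                    # assumed fallback rule: equal segments
        seg = src_dur / fallback_n
        phrases = [(i * seg, (i + 1) * seg) for i in range(fallback_n)]
    return phrases


frames = [60.0] * 10 + [10.0] * 5 + [60.0] * 10   # 0.1 s frames: sound, gap, sound
print(detect_phrases(frames, 0.1))
```

Raising `min_phrase_s` merges or drops short stretches, which is why it controls the phrase count alongside the threshold.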

Tonal Center

| Parameter | Default | Description |
|---|---|---|
| Tonal_center_st | 0 | 0=C, 2=D, 4=E, 5=F, 7=G, 9=A, 11=B |

Output

| Parameter | Default | Description |
|---|---|---|
| Max_variations | 8 | Maximum number of variations to render |
| Keep_individual_variations | 0 | Keep separate Sound objects for each variation |
| Draw_visualization | 1 | Generate 6-panel analysis display |
| Play_first | 0 | Play first variation after generation |

Custom Mode Parameters (hidden, editable in script)

| Parameter | Default (Custom) | Description |
|---|---|---|
| w1-w7 | 2.0, 1.5, 1.0, 1.5, 0.8, 1.2, 0.7 | Energy term weights |
| pw1-pw8 | 0.25, 0.12, 0.20, 0.15, 0.10, 0.08, 0.07, 0.03 | Proposal probabilities (must sum to 1.0) |

Visualization & Analysis

6-Panel Display

MCMC Musical Variation Visualization:

Panel 1: TITLE
  • Script name, preset, source name, phrase count, step count, variation count, acceptance rate

Panel 2: ORIGINAL WAVEFORM
  • X-axis: time; Y-axis: amplitude
  • Light gray background with a zero line
  • Dotted vertical lines = phrase boundaries
  • Gray waveform = original
  • Title: "Input (dotted = phrase boundaries)"

Panel 3: VARIATION 1 WAVEFORM
  • Same axes as the original
  • Blue waveform = first rendered variation
  • Title: "Variation 1 (E=XXX)"

Panel 4: ENERGY TRACE
  • X-axis: MCMC step (1 to nSteps); Y-axis: energy value
  • Light gray background
  • Green shading = accepted steps
  • Blue shading = render points (thinned steps)
  • Red line = energy trace
  • Dotted blue line = temperature (scaled to the energy axis)
  • Title: "Energy trace (green=accepted blue=render red=energy dotted=T)"

Panel 5: PITCH CONTOUR HEATMAP
  • X-axis: phrase number; Y-axis: variation number
  • Color-coded cells: blue = low pitch, red = high pitch
  • Shows how the pitch contour evolves across variations
  • Title: "Pitch contour (blue=low red=high)"

Panel 6: ENERGY PER VARIATION BAR CHART
  • X-axis: variation number; Y-axis: energy value
  • Color-coded bars (green-to-orange gradient)
  • Variation numbers labeled inside the bars
  • Title: "Energy per rendered variation (lower = more musical)"

Panel 7: STATS PANEL
  • Source info, mode, phrase count, temperature range
  • Step count, acceptance rate, variation count, thinning
  • Energy weights
  • Initial/final energy, delta, tonal center

Reading the Energy Trace

What the trace shows:
  • Green shaded steps: Accepted proposals — energy may go up or down
  • Blue shaded steps: Render points (thinned accepted steps) — where variations were generated
  • Red line: Energy value; it should trend downward as the chain finds better states
  • Dotted blue line: Temperature; high early (more exploration), lower later (more exploitation)
  • Sudden jumps up: The chain occasionally accepts worse states, which helps it escape local minima
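To put numbers on those upward jumps: for a proposal that raises the energy by ΔE = 1, the Metropolis acceptance rule gives the following probabilities at the Exploratory starting temperature and at the default Start_temp and End_temp.

```latex
P(\text{accept}) = \exp(-\Delta E / T), \qquad \Delta E = 1

T = 8.0: \quad e^{-0.125} \approx 0.882
T = 2.0: \quad e^{-0.5} \approx 0.607
T = 0.5: \quad e^{-2} \approx 0.135
```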

Reading the Pitch Heatmap

What the heatmap tells you:
  • Rows: Each variation (1 at bottom to N at top)
  • Columns: Each phrase (1 to nPhrases)
  • Color: Blue = low pitch offset (e.g., -12 semitones), red = high pitch offset (e.g., +12)
  • Patterns: Look for consistent colors across variations (stable pitch choices) or varied colors (exploration)
  • Vertical stripes: Some phrases consistently high or low across variations
  • Horizontal gradients: Pitch contour changing across phrases within a variation

Applications

Generative Composition

Use case: Generating families of related variations from a source phrase

Technique: Exploratory mode with 8-12 phrases, 100+ steps, thin=10


Arranging & Orchestration

Use case: Creating multiple instrumental parts from a single melody

Technique: Expressive mode with moderate variation, keep individual variations


Music Analysis & Pedagogy

Use case: Studying how variations relate to original material

Technique: Conservative mode, enable visualization, examine heatmap


Sound Design

Use case: Creating evolving textures from short sounds

Technique: Exploratory mode with short phrases (min_phrase_s = 0.2-0.3)


Practical Workflow Examples

🎵 String Quartet from Folk Tune

Goal: Create 4 variations of a folk melody for string quartet

Settings:

  • Source: 15-second folk melody (clearly phrased)
  • Mode: Expressive
  • Steps: 80, thin=10, max_var=4
  • Keep_individual_variations=1

Result: 4 variations — assign to Violin I, Violin II, Viola, Cello. Each has different pitch contours, timing, and dynamics while retaining phrase structure.

🎚️ Ambient Pad from Vocal Phrase

Goal: Generate 30-second ambient texture from 5-second vocal phrase

Settings:

  • Source: 5-second vocal phrase
  • Mode: Exploratory
  • Phrases: Silence detection (min_phrase_s=0.3) → ~15 phrases
  • Steps: 120, thin=8, max_var=12
  • Stereo mix output

Result: 12 variations layered in stereo mix, creating evolving ambient texture

🔬 Research: Variation Similarity

Goal: Study how energy weights affect variation distance

Settings:

  • Source: Standard test melody
  • Run multiple chains with different weight sets
  • Compute acoustic features of variations
  • Analyze relationship between energy and perceptual distance

Result: Insight into how musical constraints shape variation space

Troubleshooting Common Issues

Problem: No phrases detected (fallback to equal segments)
Cause: silence_thresh_dB too low or source has no silences
Solution: Adjust silence_thresh_dB (-30 to -40 dB) or accept fallback
Problem: All variations sound identical
Cause: Temperature too low, acceptance rate low, or weights too strict
Solution: Increase start_temp, reduce weights on E1/E2, check acceptance rate
Problem: Variations have clicks/pops
Cause: Crossfade too short or phrase boundaries not faded
Solution: Increase crossfade in rendering (currently 20ms), ensure fades applied
Problem: Rendering very slow
Cause: Many phrases × many variations × long phrases
Solution: Reduce max_variations, increase thinning_interval, reduce nPhrases (adjust min_phrase_s)
Problem: Energy trace flat or increasing
Cause: Temperature too high, acceptance rate too high, or weights misconfigured
Solution: Reduce start_temp, check proposal weights, adjust energy weights

Advanced Techniques

Custom energy weights:

In script, modify w1-w7 to emphasize different musical aspects. Higher w1 (scale) keeps variations tonal; higher w2 (voice leading) creates smoother connections; higher w7 (contour) avoids monotony.

Proposal weight tuning:

Adjust pw1-pw8 to control which transformations are tried more often. For example, increase pw4 (phrase transpose) for more dramatic pitch changes, or pw3 (rubato) for more rhythmic variation, and reduce the other weights so that all eight still sum to 1.0.

Scale customization:

Modify the scale degrees array to use different scales (e.g., minor, pentatonic, whole-tone, octatonic).
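As a sketch of what alternative scale definitions might look like, here are a few standard scales as pitch-class sets in Python, with a circular nearest-degree distance like the one used by E₁. The `SCALES` dict and `min_scale_dist` are hypothetical names, not the script's array; the octatonic scale is given in its half-step/whole-step form. Circular distance matters once a scale has a gap across the octave boundary: the major pentatonic lacks 11, so pitch class 11 is 1 semitone from degree 0.

```python
SCALES = {
    "major": (0, 2, 4, 5, 7, 9, 11),
    "natural_minor": (0, 2, 3, 5, 7, 8, 10),
    "major_pentatonic": (0, 2, 4, 7, 9),
    "whole_tone": (0, 2, 4, 6, 8, 10),
    "octatonic": (0, 1, 3, 4, 6, 7, 9, 10),  # half-step/whole-step ordering
}


def min_scale_dist(pitch_class: int, scale) -> int:
    """Circular distance (in semitones) from a pitch class to the nearest scale degree."""
    pc = pitch_class % 12  # wrap negative offsets into 0..11
    return min(min(abs(pc - d), 12 - abs(pc - d)) for d in scale)


# Pitch classes that would be penalized by E1 under each scale
for name, degrees in SCALES.items():
    off_scale = [p for p in range(12) if min_scale_dist(p, degrees) > 0]
    print(name, off_scale)
```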

Parallel tempering (advanced):

For better exploration, run multiple chains at different temperatures and swap states (not implemented).
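Since parallel tempering is not implemented in the script, the following is only a Python sketch of the standard replica-exchange swap step, assuming each chain targets a distribution proportional to exp(−E/T). The function names are hypothetical. A swap between chains at temperatures T_i and T_j is accepted with probability min(1, exp((E_i − E_j)(1/T_i − 1/T_j))), so a low-energy state found by a hot chain always moves to a colder one.

```python
import math
import random
from typing import List


def swap_accept_prob(e_i: float, e_j: float, t_i: float, t_j: float) -> float:
    """Replica-exchange acceptance: min(1, exp((E_i - E_j) * (1/T_i - 1/T_j)))."""
    x = (e_i - e_j) * (1.0 / t_i - 1.0 / t_j)
    return 1.0 if x >= 0 else math.exp(x)


def attempt_swap(states: List, energies: List[float], temps: List[float], rng: random.Random) -> bool:
    """Try to exchange states between one random pair of neighbouring temperatures (in place)."""
    k = rng.randrange(len(temps) - 1)
    if rng.random() < swap_accept_prob(energies[k], energies[k + 1], temps[k], temps[k + 1]):
        states[k], states[k + 1] = states[k + 1], states[k]
        energies[k], energies[k + 1] = energies[k + 1], energies[k]
        return True
    return False


# A cold chain (T = 0.5) stuck at E = 5 and a hot chain (T = 2.0) that found E = 1
states, energies = ["cold_state", "hot_state"], [5.0, 1.0]
attempt_swap(states, energies, [0.5, 2.0], random.Random(0))
print(states, energies)  # the lower-energy state moves to the cold chain
```

In a full design, each chain would run its own Metropolis steps between swap attempts, and variations would be rendered from the coldest chain.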