MCMC Musical Variation — User Guide

Metropolis-Hastings MCMC chain for musical variation. State = pitch contour + time map + dynamic map per phrase. Proposals operate at phrase/note level. Energy enforces scale conformity, voice leading, rhythmic stability, range, dynamics, phrase integrity, and contour variety.

Author: Shai Cohen
Version: 1.0 (2025)
Technique: Metropolis-Hastings MCMC
Category: Composition / MCMC
Citation: Cohen, S. (2025). Praat AudioTools

Contents:

What this does
Quick start
MCMC & Energy Theory
Aesthetic Modes
Parameters & Controls
Visualization & Analysis
Applications

What this does

This script implements an MCMC (Markov Chain Monte Carlo) Musical Variation engine — a Metropolis-Hastings sampler that explores the space of musical variations on a source sound. The state consists of per-phrase pitch offsets, time stretch ratios, and dynamic scaling factors. Proposals modify these parameters, and an energy function evaluates musicality based on scale conformity, voice leading, rhythmic stability, range, dynamics, phrase integrity, and contour variety. Accepted states are rendered as audio variations.

🧮 What is MCMC for Music?

Markov Chain Monte Carlo is a statistical method for sampling from complex probability distributions. In this musical context:

State: A set of parameters defining a variation (pitch offsets, time ratios, dynamic scales per phrase)
Energy: A function that quantifies "musicality" — lower energy = more musical
Temperature: Controls exploration vs. exploitation (high T = more exploration)
Proposals: Random modifications to the current state
Acceptance: The Metropolis-Hastings criterion always accepts states with lower energy, and accepts worse states with a probability that shrinks as the energy increase grows and as the temperature falls

The chain explores the space of variations, and accepted states are rendered as audio at regular intervals (thinning), producing a set of related but distinct musical variations.

Key Features:

3 Aesthetic Modes — Conservative, Expressive, Exploratory, plus Custom
Phrase Detection — Silence-based segmentation into musical phrases
7 Energy Terms — Scale conformity, voice leading, range, rhythmic stability, dynamics, phrase integrity, contour variety
8 Proposal Types — Pitch nudge, dynamic swell, micro-rubato, phrase transpose, temporal swap, global transpose, tempo warp, dynamic arch
Temperature Annealing — Gradually reduce temperature from start to end
Thinned Rendering — Render variations at regular intervals
Tonal Center Control — Specify tonal center in semitones (0=C, 2=D, 4=E, 5=F, 7=G, 9=A, 11=B)
Comprehensive Visualization — 6-panel display with waveforms, energy trace, pitch heatmap, energy bars, stats
Stereo Mix Output — All variations combined into a multi-channel stereo mix

Technical Implementation:

(1) Phrase Detection: Silences → TextGrid → phrase intervals.
(2) State Initialization: Zero pitch offsets, unity time/dynamic maps.
(3) Energy Computation: Seven weighted terms.
(4) MCMC Loop: Select proposal, compute energy, Metropolis-Hastings acceptance, update temperature.
(5) Rendering: For accepted thinned steps, apply transformations per phrase (tape speed pitch shift, lengthen time stretch, dynamic scaling), concatenate with crossfades.
(6) Visualization: 6-panel display with all data.

Quick start

In Praat, select exactly one Sound object (minimum 1 second, any content).
Run script… → select MCMC_Musical_Variation.praat.
Choose Aesthetic_mode (2-4 for specific strategies, 1 for custom).
Set chain parameters (steps, thinning, temperature start/end, annealing).
Configure phrase detection (silence threshold, minimum phrase duration).
Set tonal center in semitones (0=C, 2=D, 4=E, 5=F, 7=G, 9=A, 11=B).
Set maximum variations and output options.
Enable Draw_visualization for analysis display.
Click OK — engine detects phrases, runs MCMC chain, renders variations, creates stereo mix "source_mcmc_stereo".

Quick tip: Start with Expressive mode on a 10-20 second melodic phrase. Enable visualization — you'll see the original waveform with phrase boundaries (dotted), the first variation, energy trace (red) with temperature overlay (dotted blue), pitch contour heatmap, and energy per variation. Listen to how the variations explore the space around the original, becoming more adventurous as the chain progresses. The output appears as "source_mcmc_stereo" (multi-channel mix) and individual variations if keep_individual_variations=1.

Important:

PHRASE DETECTION is critical: adjust silence_thresh_dB and min_phrase_s for your material. If detection fails, the script falls back to 4 equal segments.
ENERGY WEIGHTS in custom mode require careful tuning (see the theory section).
PROPOSAL WEIGHTS determine how often each proposal type is used and must sum to 1.0.
RENDERING TIME scales with the number of rendered variations × phrases, so it grows with the step count and shrinks as the thinning interval increases. For 60 steps with thin=6 and 8 phrases, expect roughly 10-20 seconds of rendering.
STEREO MIX places each variation in its own channel of a multi-channel Sound, which is then mixed down to stereo.
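The equal-segment fallback can be pictured with a few lines of Python. This is only an illustrative sketch, not the script's Praat code: the function name `equal_segments` is hypothetical, and it assumes the four segments tile the whole source duration without gaps.

```python
def equal_segments(duration, n_segments=4):
    """Split [0, duration] into n_segments equal (start, end) intervals.

    Assumption: the fallback simply tiles the source; the real script
    may trim edges or handle very short sounds differently.
    """
    if duration <= 0 or n_segments < 1:
        raise ValueError("duration must be positive and n_segments >= 1")
    width = duration / n_segments
    return [(i * width, (i + 1) * width) for i in range(n_segments)]
```

For an 8-second source this gives phrases of 2 seconds each, regardless of where the musical phrase boundaries actually fall, which is why tuning the silence threshold is usually preferable.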

MCMC & Energy Theory

The Metropolis-Hastings Algorithm

For each step t with current state θ and energy E(θ):

1. Propose a new state θ' using the proposal distribution q(θ'|θ).
2. Compute the energy E(θ').
3. Compute the acceptance probability α = min(1, exp(-(E(θ') - E(θ)) / T)), where T is the current temperature.
4. Accept θ' with probability α; otherwise keep θ.

Temperature schedule (if annealing): T(t) = T_start + (T_end - T_start) × (t-1)/(steps-1)
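The acceptance rule and the linear temperature schedule above can be sketched in Python as follows. The function names are hypothetical and the script itself is written in Praat; the formulas are the ones stated in this section.

```python
import math
import random


def temperature(step, n_steps, t_start, t_end, anneal=True):
    """Linear schedule: T(t) = T_start + (T_end - T_start) * (t-1)/(steps-1)."""
    if not anneal or n_steps <= 1:
        return t_start
    return t_start + (t_end - t_start) * (step - 1) / (n_steps - 1)


def accept_probability(e_current, e_proposed, temp):
    """alpha = min(1, exp(-(E' - E) / T)); downhill moves always pass."""
    delta = e_proposed - e_current
    if delta <= 0:
        return 1.0
    return math.exp(-delta / temp)


def metropolis_accept(e_current, e_proposed, temp, rng=random):
    """Draw one uniform number and compare it with alpha."""
    return rng.random() < accept_probability(e_current, e_proposed, temp)
```

With the Expressive settings (3.0 → 1.0), an uphill move of ΔE = 1 is accepted with probability exp(-1/3) ≈ 0.72 at the first step and exp(-1) ≈ 0.37 at the last, which is how annealing shifts the chain from exploration toward exploitation.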

State Representation

📊 State Variables per Phrase

For each phrase i (1..nPhrases):

pc[i] — pitch contour offset (semitones, integer)
tm[i] — time map ratio (>0, 1.0 = no change)
dm[i] — dynamic map (amplitude scale, 1.0 = no change)

Global: transpo — global semitone shift (integer)

Total state dimension: 3 × nPhrases + 1
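As a rough picture of this state, here is a small Python container. The class name `VariationState` is hypothetical; the field names follow the variables listed above, and the initial values follow the state initialization described earlier (zero pitch offsets, unity time and dynamic maps).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VariationState:
    pc: List[int]      # per-phrase pitch offsets in semitones
    tm: List[float]    # per-phrase time ratios (1.0 = no change)
    dm: List[float]    # per-phrase amplitude scales (1.0 = no change)
    transpo: int = 0   # global semitone shift

    @classmethod
    def initial(cls, n_phrases):
        """Identity state: the source rendered unchanged."""
        return cls(pc=[0] * n_phrases,
                   tm=[1.0] * n_phrases,
                   dm=[1.0] * n_phrases)

    def dimension(self):
        """3 values per phrase plus the global transposition."""
        return len(self.pc) + len(self.tm) + len(self.dm) + 1
```

For a source split into 8 phrases, the chain therefore moves through a 25-dimensional space.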

Energy Function Components

E = w₁×E₁ + w₂×E₂ + w₃×E₃ + w₄×E₄ + w₅×E₅ + w₆×E₆ + w₇×E₇

E₁ (Scale conformity): Distance from nearest scale degree
pitchClass = (pc[i] + transpo - tonalCenter) mod 12
minDist = min_{d∈scale} |pitchClass - d| (circular)
E₁ = average minDist over phrases

E₂ (Voice leading): Smoothness between adjacent phrases
E₂ = average |leap|, where leaps >12 are penalized extra

E₃ (Range constraint): Penalize total shifts >12 semitones
E₃ = average (|shift| - 12)² for |shift| > 12

E₄ (Rhythmic stability): Variance of tm + duration preservation
tmVar = variance of tm ratios
durRatio = Σ(phraseDur[i] × tm[i]) / srcDur
durPen = (durRatio - 1)² × 8
E₄ = tmVar + durPen

E₅ (Dynamic coherence): Variance of dm + clipping penalties
dmVar = variance of dm
clipPen = Σ (dm[i] > 1.6 ? (dm[i]-1.6)²×3 : 0) + Σ (dm[i] < 0.08 ? (0.08-dm[i])²×5 : 0)
E₅ = dmVar + clipPen

E₆ (Phrase integrity): Penalize phrases becoming too short
newDur = phraseDur[i] × tm[i]
if newDur < 0.25: E₆ += (0.25 - newDur)² × 8 / nPhrases

E₇ (Contour naturalness): Penalize monotone contour
if >75% of moves are in the same direction: E₇ = (monFrac - 0.75)² × 4
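To make a few of these terms concrete, the sketch below computes E₁, E₄ and E₆ and the weighted sum in Python. It follows the formulas given here, but the function names are hypothetical, and the use of population variance (dividing by n) for tmVar is an assumption, since the guide does not say which variance the script uses.

```python
MAJOR_SCALE = (0, 2, 4, 5, 7, 9, 11)


def circular_distance(a, b, modulus=12):
    """Shortest distance between two pitch classes around the octave."""
    d = abs(a - b) % modulus
    return min(d, modulus - d)


def scale_conformity(pc, transpo, tonal_center, scale=MAJOR_SCALE):
    """E1: mean distance of each phrase's pitch class to the nearest degree."""
    total = 0
    for offset in pc:
        pitch_class = (offset + transpo - tonal_center) % 12
        total += min(circular_distance(pitch_class, d) for d in scale)
    return total / len(pc)


def rhythmic_stability(tm, phrase_dur, src_dur):
    """E4: variance of time ratios plus a total-duration penalty.

    Assumption: population variance (divide by n).
    """
    mean = sum(tm) / len(tm)
    tm_var = sum((r - mean) ** 2 for r in tm) / len(tm)
    dur_ratio = sum(d * r for d, r in zip(phrase_dur, tm)) / src_dur
    return tm_var + (dur_ratio - 1) ** 2 * 8


def phrase_integrity(tm, phrase_dur, min_dur=0.25):
    """E6: penalize phrases whose stretched duration drops below min_dur."""
    n = len(tm)
    energy = 0.0
    for d, r in zip(phrase_dur, tm):
        new_dur = d * r
        if new_dur < min_dur:
            energy += (min_dur - new_dur) ** 2 * 8 / n
    return energy


def total_energy(weights, terms):
    """E = sum of w_k * E_k over the supplied terms."""
    return sum(w * e for w, e in zip(weights, terms))
```

Because every pitch class lies at most one semitone from a major-scale degree, E₁ stays between 0 and 1 for the default scale; its influence on the chain comes mainly from the weight w₁.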

Proposal Types

🎲 8 Proposal Distributions

Proposal	Description	Effect
Pitch Nudge	Random phrase ± random(1-2) semitones	Local pitch adjustment
Dynamic Swell	Random phrase ± Gaussian(0,0.12) amplitude	Local dynamic change
Micro Rubato	Random phrase time ratio change, compensate adjacent	Local timing variation
Phrase Transpose	Random phrase ± random(1-5) semitones	Larger local pitch shift
Temporal Swap	Swap time ratios of adjacent phrases	Rhythmic reordering
Global Transpose	Global transpo ± random(1-3) semitones	Overall pitch shift
Tempo Warp	Multiply all time ratios by Gaussian(1.0,0.06)	Global tempo change
Dynamic Arch	Apply arch-shaped dynamic envelope (linear, inverted, or sinusoidal)	Global dynamic shape
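The table can be read as a two-stage draw: first pick a proposal type according to its weight, then apply that move to a copy of the state. The Python sketch below illustrates this for two of the moves. All names are hypothetical, the weights are the Expressive values from this guide, and the exact random draws in the Praat script may differ.

```python
import random

PROPOSALS = ("pitch_nudge", "dynamic_swell", "micro_rubato",
             "phrase_transpose", "temporal_swap", "global_transpose",
             "tempo_warp", "dynamic_arch")

# Expressive-mode proposal weights as listed in this guide.
EXPRESSIVE_WEIGHTS = (0.25, 0.12, 0.20, 0.15, 0.10, 0.08, 0.07, 0.03)


def choose_proposal(weights, rng):
    """Pick one proposal type with probability proportional to its weight."""
    return rng.choices(PROPOSALS, weights=weights, k=1)[0]


def pitch_nudge(pc, rng):
    """Shift one random phrase by 1 or 2 semitones, up or down."""
    new_pc = list(pc)
    i = rng.randrange(len(new_pc))
    new_pc[i] += rng.choice((1, 2)) * rng.choice((-1, 1))
    return new_pc


def temporal_swap(tm, rng):
    """Swap the time ratios of one random pair of adjacent phrases."""
    new_tm = list(tm)
    if len(new_tm) < 2:
        return new_tm  # nothing to swap
    i = rng.randrange(len(new_tm) - 1)
    new_tm[i], new_tm[i + 1] = new_tm[i + 1], new_tm[i]
    return new_tm
```

Returning a modified copy rather than editing the state in place keeps the current state intact, so a rejected proposal can simply be discarded.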

Major Scale Definition

Scale degrees in semitones from tonic:

degree 1 = 0
degree 2 = 2
degree 3 = 4
degree 4 = 5
degree 5 = 7
degree 6 = 9
degree 7 = 11

Tonal center parameter (st): 0 = C, 2 = D, 4 = E, 5 = F, 7 = G, 9 = A, 11 = B

Rendering Pipeline per Phrase

For each phrase i:

1. Extract the segment from the source at phraseStart[i] to phraseEnd[i].
2. Apply tape speed pitch shift: pitchRatio = 2^( (pc[i] + transpo) / 12 ). Override SR × pitchRatio, then resample back to srcSr.
3. Apply time stretch: lenFactor = timeRatio × pitchRatio. Lengthen (overlap-add) by lenFactor.
4. Apply dynamic scaling: Formula: self × dm[i].
5. Apply 5ms fade in/out (click prevention).
6. Concatenate with previous segments (crossfade 20ms).
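The reason lenFactor includes pitchRatio is that the tape-speed shift changes duration as well as pitch: playing a phrase back pitchRatio times faster shortens it by the same factor, so the time stretch has to undo that before applying the phrase's own time ratio. A small Python check of the arithmetic (hypothetical function names, no audio processing involved):

```python
def pitch_ratio(pc_i, transpo):
    """Frequency ratio for a shift of (pc_i + transpo) semitones."""
    return 2.0 ** ((pc_i + transpo) / 12.0)


def lengthen_factor(tm_i, ratio):
    """Stretch factor that cancels the tape-speed change and applies tm_i."""
    return tm_i * ratio


def rendered_duration(phrase_dur, pc_i, transpo, tm_i):
    """Duration after tape-speed shift followed by time stretch."""
    ratio = pitch_ratio(pc_i, transpo)
    after_tape_speed = phrase_dur / ratio   # faster playback = shorter
    return after_tape_speed * lengthen_factor(tm_i, ratio)
```

For a 2.0 s phrase shifted up an octave with tm = 1.1, the tape-speed step yields 1.0 s, lenFactor is 2.2, and the rendered phrase lasts 2.2 s, which is simply phraseDur × tm.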

Aesthetic Modes

Mode 2: Conservative (faithful)

🌱 Faithful Variations

Temperature: 0.8 (fixed) | Weights: E1=2.5, E2=2.0, E3=1.5, E4=2.0, E5=1.0, E6=1.2, E7=0.5

Proposal weights: PitchNudge=0.35, DynSwell=0.15, Rubato=0.25, PhrTrans=0.05, TmSwap=0.08, GlobTrans=0.03, TempoWarp=0.06, DynArch=0.03

Character: Stays close to original — small, local changes only, high scale/voice-leading weights

Use on: Material needing subtle variations, classical music, education

Mode 3: Expressive (reinterpretive)

🎵 Reinterpretive

Temperature: 3.0 → 1.0 (annealed) | Weights: E1=2.0, E2=1.5, E3=1.0, E4=1.5, E5=0.8, E6=1.2, E7=0.7

Proposal weights: PitchNudge=0.25, DynSwell=0.12, Rubato=0.20, PhrTrans=0.15, TmSwap=0.10, GlobTrans=0.08, TempoWarp=0.07, DynArch=0.03

Character: Balanced exploration — moderate changes, some phrase transposition, tempo warp

Use on: Expressive music, jazz, creative reinterpretations

Mode 4: Exploratory (creative)

🚀 Creative Exploration

Temperature: 8.0 → 2.0 (annealed) | Weights: E1=1.5, E2=1.0, E3=0.7, E4=1.0, E5=0.6, E6=0.8, E7=1.2

Proposal weights: PitchNudge=0.18, DynSwell=0.10, Rubato=0.15, PhrTrans=0.18, TmSwap=0.12, GlobTrans=0.12, TempoWarp=0.10, DynArch=0.05

Character: Wide exploration — large pitch shifts, frequent phrase transposition, temporal swaps, high contour variety weight encourages non-monotone lines

Use on: Experimental music, sound design, generative composition

Parameters & Controls

Aesthetic Mode

Parameter	Default	Description
Aesthetic_mode	Expressive	Conservative, Expressive, Exploratory, or Custom

Chain Parameters

Parameter	Default	Description
Mcmc_steps	60	Number of MCMC iterations
Thinning_interval	6	Render variation every N accepted steps
Start_temp	2.0	Initial temperature (higher = more exploration)
End_temp	0.5	Final temperature (lower = more exploitation)
Anneal	1	Gradually reduce temperature
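How Thinning_interval interacts with the acceptance history can be sketched as below. This is an illustration under stated assumptions, not the script's code: the function name is hypothetical, it counts accepted steps only (as the table describes), and it assumes rendering stops once Max_variations renders have been made.

```python
def render_steps(accepted, thinning_interval, max_variations):
    """Return the step numbers (1-based) at which a variation is rendered.

    accepted: sequence of booleans, one per MCMC step.
    Assumption: a render happens on every thinning_interval-th accepted
    step until max_variations renders have been produced.
    """
    renders = []
    n_accepted = 0
    for step, was_accepted in enumerate(accepted, start=1):
        if not was_accepted:
            continue
        n_accepted += 1
        if n_accepted % thinning_interval == 0 and len(renders) < max_variations:
            renders.append(step)
    return renders
```

Under these assumptions, 60 steps with a thinning interval of 6 can yield at most 10 renders, and fewer when many proposals are rejected, so a low acceptance rate can leave you with fewer variations than Max_variations.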

Phrase Detection

Parameter	Default	Description
Silence_thresh_dB	-35.0	Threshold for silence detection (dB)
Min_phrase_s	0.5	Minimum phrase duration (seconds)

Tonal Center

Parameter	Default	Description
Tonal_center_st	0	0=C, 2=D, 4=E, 5=F, 7=G, 9=A, 11=B

Output

Parameter	Default	Description
Max_variations	8	Maximum number of variations to render
Keep_individual_variations	0	Keep separate Sound objects for each variation
Draw_visualization	1	Generate 6-panel analysis display
Play_first	0	Play first variation after generation

Custom Mode Parameters (hidden, editable in script)

Parameter	Default (Custom)	Description
w1-w7	2.0,1.5,1.0,1.5,0.8,1.2,0.7	Energy term weights
pw1-pw8	0.25,0.12,0.20,0.15,0.10,0.08,0.07,0.03	Proposal probabilities (must sum to 1.0)
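Since the proposal probabilities must sum to 1.0, it can help to check a custom set before pasting it into the script. The helper below is a hypothetical convenience written in Python, not part of the Praat script; `normalize` rescales a set of relative weights so that they sum to 1.

```python
import math

# Custom-mode defaults for pw1..pw8 as listed in this guide.
CUSTOM_DEFAULT = [0.25, 0.12, 0.20, 0.15, 0.10, 0.08, 0.07, 0.03]


def check_proposal_weights(pw, tol=1e-6):
    """Return True if pw holds 8 non-negative values summing to 1.0."""
    if len(pw) != 8:
        raise ValueError("expected 8 proposal weights (pw1..pw8)")
    if any(w < 0 for w in pw):
        raise ValueError("proposal weights must be non-negative")
    return math.isclose(sum(pw), 1.0, abs_tol=tol)


def normalize(weights):
    """Rescale relative weights so they sum to 1.0."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]
```

A practical workflow is to write the weights as rough relative preferences, normalize them, round to two decimals, and run the check once more before editing the script.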

Visualization & Analysis

6-Panel Display

MCMC Musical Variation Visualization:

Panel 1: TITLE
• Script name, preset, source name, phrase count, step count, variation count, acceptance rate

Panel 2: ORIGINAL WAVEFORM
• X-axis: Time, Y-axis: Amplitude
• Light gray background with zero line
• Dotted vertical lines = phrase boundaries
• Gray waveform = original
• Title: "Input (dotted = phrase boundaries)"

Panel 3: VARIATION 1 WAVEFORM
• Same axes as original
• Blue waveform = first rendered variation
• Title: "Variation 1 (E=XXX)"

Panel 4: ENERGY TRACE
• X-axis: MCMC step (1 to nSteps)
• Y-axis: Energy value
• Light gray background
• Green shading = accepted steps
• Blue shading = render points (thinned steps)
• Red line = energy trace
• Dotted blue line = temperature (scaled to energy axis)
• Title: "Energy trace (green=accepted blue=render red=energy dotted=T)"

Panel 5: PITCH CONTOUR HEATMAP
• X-axis: Phrase number
• Y-axis: Variation number
• Color-coded cells: blue = low pitch, red = high pitch
• Shows how pitch contour evolves across variations
• Title: "Pitch contour (blue=low red=high)"

Panel 6: ENERGY PER VARIATION BAR CHART
• X-axis: Variation number
• Y-axis: Energy value
• Color-coded bars (green to orange gradient)
• Variation numbers labeled inside bars
• Title: "Energy per rendered variation (lower = more musical)"

Panel 7: STATS PANEL
• Source info, mode, phrase count, temperature range
• Step count, acceptance rate, variation count, thinning
• Energy weights
• Initial/final energy, delta, tonal center

Reading the Energy Trace

What the trace shows:

Green shaded steps: Accepted proposals — energy may go up or down
Blue shaded steps: Render points (thinned accepted steps) — where variations were generated
Red line: Energy value — should trend downward as chain finds better states
Dotted blue line: Temperature — high early (more exploration), lower later (more exploitation)
Sudden jumps up: Occasionally accepting worse states to escape local minima

Reading the Pitch Heatmap

What the heatmap tells you:

Rows: Each variation (1 at bottom to N at top)
Columns: Each phrase (1 to nPhrases)
Color: Blue = low pitch offset (e.g., -12 semitones), red = high pitch offset (e.g., +12)
Patterns: Look for consistent colors across variations (stable pitch choices) or varied colors (exploration)
Vertical stripes: Some phrases consistently high or low across variations
Horizontal gradients: Pitch contour changing across phrases within a variation

Applications

Generative Composition

Use case: Generating families of related variations from a source phrase

Technique: Exploratory mode with 8-12 phrases, 100+ steps, thin=10

Workflow:

Record a 20-30 second melodic phrase with clear phrasing
Adjust silence threshold to get 8-12 phrases
Run with Exploratory mode, max_variations=12
Listen to the stereo mix — variations evolve from near-original to adventurous
Select favorite variations for further processing

Arranging & Orchestration

Use case: Creating multiple instrumental parts from a single melody

Technique: Expressive mode with moderate variation, keep individual variations

Applications:

String quartet: Generate 4 variations, assign to different instruments
Choral writing: Create SATB parts from a cantus firmus
Electronic layering: Layer variations for rich textures

Music Analysis & Pedagogy

Use case: Studying how variations relate to original material

Technique: Conservative mode, enable visualization, examine heatmap

Learning outcomes:

See how pitch contour is preserved or transformed
Understand the role of energy weights in shaping variations
Observe MCMC convergence behavior
Relate temperature to exploration vs. exploitation

Sound Design

Use case: Creating evolving textures from short sounds

Technique: Exploratory mode with short phrases (min_phrase_s = 0.2-0.3)

Examples:

Granular textures: Short phrases become grains, variations create evolving clouds
Rhythmic variations: Tempo warp and micro rubato create complex rhythms
Dynamic layers: Dynamic arch proposals create evolving amplitude shapes

Practical Workflow Examples

🎵 String Quartet from Folk Tune

Goal: Create 4 variations of a folk melody for string quartet

Settings:

Source: 15-second folk melody (clearly phrased)
Mode: Expressive
Steps: 80, thin=10, max_var=4
Keep_individual_variations=1

Result: 4 variations — assign to Violin I, Violin II, Viola, Cello. Each has different pitch contours, timing, and dynamics while retaining phrase structure.

🎚️ Ambient Pad from Vocal Phrase

Goal: Generate 30-second ambient texture from 5-second vocal phrase

Settings:

Source: 5-second vocal phrase
Mode: Exploratory
Phrases: Silence detection (min_phrase_s=0.3) → ~15 phrases
Steps: 120, thin=8, max_var=12
Stereo mix output

Result: 12 variations layered in stereo mix, creating evolving ambient texture

🔬 Research: Variation Similarity

Goal: Study how energy weights affect variation distance

Settings:

Source: Standard test melody
Run multiple chains with different weight sets
Compute acoustic features of variations
Analyze relationship between energy and perceptual distance

Result: Insight into how musical constraints shape variation space

Troubleshooting Common Issues

Problem: No phrases detected (fallback to equal segments)
Cause: silence_thresh_dB too low or source has no silences
Solution: Adjust silence_thresh_dB (-30 to -40 dB) or accept fallback

Problem: All variations sound identical
Cause: Temperature too low, acceptance rate low, or weights too strict
Solution: Increase start_temp, reduce weights on E1/E2, check acceptance rate

Problem: Variations have clicks/pops
Cause: Crossfade too short or phrase boundaries not faded
Solution: Increase crossfade in rendering (currently 20ms), ensure fades applied

Problem: Rendering very slow
Cause: Many phrases × many variations × long phrases
Solution: Reduce max_variations, increase thinning_interval, reduce nPhrases (adjust min_phrase_s)

Problem: Energy trace flat or increasing
Cause: Temperature too high, acceptance rate too high, or weights misconfigured
Solution: Reduce start_temp, check proposal weights, adjust energy weights

Advanced Techniques

Custom energy weights:

In script, modify w1-w7 to emphasize different musical aspects. Higher w1 (scale) keeps variations tonal; higher w2 (voice leading) creates smoother connections; higher w7 (contour) avoids monotony.

Proposal weight tuning:

Adjust pw1-pw8 to control which transformations are tried more often. For example, increase pw4 (phrase transpose) for more dramatic pitch changes, increase pw3 (rubato) for more rhythmic variation.

Scale customization:

Modify the scale degrees array to use different scales (e.g., minor, pentatonic, whole-tone, octatonic).
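For reference, the scales mentioned above can be written as sets of semitone offsets from the tonic, in the same form as the major scale definition. The Python sketch below lists them and shows how the nearest-degree distance changes with the scale. The dictionary and function names are hypothetical, and the octatonic entry is the whole-half form (the half-whole form is the other common choice).

```python
SCALES = {
    "major":            (0, 2, 4, 5, 7, 9, 11),
    "natural_minor":    (0, 2, 3, 5, 7, 8, 10),
    "major_pentatonic": (0, 2, 4, 7, 9),
    "whole_tone":       (0, 2, 4, 6, 8, 10),
    "octatonic":        (0, 2, 3, 5, 6, 8, 9, 11),  # whole-half form
}


def nearest_degree_distance(pitch_class, scale):
    """Circular distance (in semitones) to the closest scale degree."""
    best = 12
    for degree in scale:
        d = abs(pitch_class - degree) % 12
        best = min(best, d, 12 - d)
    return best
```

Sparser scales leave larger gaps, so the scale-conformity term spans a wider range: in the major pentatonic, pitch classes 5 and 11 sit one semitone from the nearest degree, whereas every octatonic degree has a neighbour within a semitone.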

Parallel tempering (advanced):

For better exploration, run multiple chains at different temperatures and swap states (not implemented).
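Because this is not part of the script, the following is only a sketch of the standard replica-exchange swap rule that such an extension would typically use: two chains at temperatures T_i and T_j with energies E_i and E_j exchange states with probability min(1, exp((1/T_i - 1/T_j) × (E_i - E_j))). The function name is hypothetical.

```python
import math


def swap_probability(e_i, t_i, e_j, t_j):
    """Replica-exchange acceptance for swapping the states of two chains."""
    exponent = (1.0 / t_i - 1.0 / t_j) * (e_i - e_j)
    if exponent >= 0:
        return 1.0
    return math.exp(exponent)
```

The rule always hands a lower-energy state to the colder chain when the hotter chain has found one, and only occasionally moves a worse state the other way, which lets the cold chain benefit from the hot chain's wider exploration.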