MCMC Musical Variation — User Guide
Metropolis-Hastings MCMC chain for musical variation. State = pitch contour + time map + dynamic map per phrase. Proposals operate at phrase/note level. Energy enforces scale conformity, voice leading, rhythmic stability, range, dynamics, phrase integrity, and contour variety.
What this does
This script implements an MCMC (Markov Chain Monte Carlo) Musical Variation engine — a Metropolis-Hastings sampler that explores the space of musical variations on a source sound. The state consists of per-phrase pitch offsets, time stretch ratios, and dynamic scaling factors. Proposals modify these parameters, and an energy function evaluates musicality based on scale conformity, voice leading, rhythmic stability, range, dynamics, phrase integrity, and contour variety. Accepted states are rendered as audio variations.
🧮 What is MCMC for Music?
Markov Chain Monte Carlo is a statistical method for sampling from complex probability distributions. In this musical context:
- State: A set of parameters defining a variation (pitch offsets, time ratios, dynamic scales per phrase)
- Energy: A function that quantifies "musicality" — lower energy = more musical
- Temperature: Controls exploration vs. exploitation (high T = more exploration)
- Proposals: Random modifications to the current state
- Acceptance: The Metropolis-Hastings criterion always accepts a better (lower-energy) state and accepts a worse one with a probability that shrinks as the energy increase grows and as the temperature falls
The chain explores the space of variations, and accepted states are rendered as audio at regular intervals (thinning), producing a set of related but distinct musical variations.
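The following sketch shows how thinning could interact with the Max_variations cap. It assumes a render is triggered on every Thinning_interval-th accepted step until Max_variations renders exist. The function name and the list-of-booleans input are hypothetical and are not taken from the script.

```python
def render_points(accepted, thinning_interval, max_variations):
    """Return the 0-based step indices at which a variation would be rendered.

    accepted: one boolean per MCMC step (True = proposal accepted).
    A render happens on every `thinning_interval`-th accepted step,
    and stops once `max_variations` renders have been collected.
    """
    points = []
    n_accepted = 0
    for step, was_accepted in enumerate(accepted):
        if not was_accepted:
            continue
        n_accepted += 1
        if n_accepted % thinning_interval == 0:
            points.append(step)
            if len(points) == max_variations:
                break
    return points


if __name__ == "__main__":
    # 12 steps, every other proposal accepted, render every 2nd accepted step
    flags = [True, False] * 6
    print(render_points(flags, thinning_interval=2, max_variations=8))  # → [2, 6, 10]
```

Under this assumption, the defaults (60 steps, Thinning_interval = 6, Max_variations = 8) only reach the cap of 8 renders if at least 48 of the 60 proposals are accepted.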
Key Features:
- 3 Aesthetic Modes — Conservative, Expressive, Exploratory, plus Custom
- Phrase Detection — Silence-based segmentation into musical phrases
- 7 Energy Terms — Scale conformity, voice leading, range, rhythmic stability, dynamics, phrase integrity, contour variety
- 8 Proposal Types — Pitch nudge, dynamic swell, micro-rubato, phrase transpose, temporal swap, global transpose, tempo warp, dynamic arch
- Temperature Annealing — Gradually reduce temperature from start to end
- Thinned Rendering — Render variations at regular intervals
- Tonal Center Control — Specify tonal center in semitones (0=C, 2=D, 4=E, 5=F, 7=G, 9=A, 11=B)
- Comprehensive Visualization — 6-panel display with waveforms, energy trace, pitch heatmap, energy bars, stats
- Stereo Mix Output — All variations combined into a multi-channel stereo mix
Technical Implementation:
- (1) Phrase Detection: Silences → TextGrid → phrase intervals.
- (2) State Initialization: Zero pitch offsets, unity time/dynamic maps.
- (3) Energy Computation: Seven weighted terms.
- (4) MCMC Loop: Select a proposal, compute its energy, apply the Metropolis-Hastings acceptance test, update the temperature.
- (5) Rendering: At thinned accepted steps, transform each phrase (tape-speed pitch shift, Lengthen time stretch, dynamic scaling), then concatenate the phrases with crossfades.
- (6) Visualization: 6-panel display of the results.
Quick start
- In Praat, select exactly one Sound object (minimum 1 second, any content).
- Run script… → select MCMC_Musical_Variation.praat.
- Choose Aesthetic_mode (1 = Custom, 2 = Conservative, 3 = Expressive, 4 = Exploratory).
- Set chain parameters (steps, thinning, temperature start/end, annealing).
- Configure phrase detection (silence threshold, minimum phrase duration).
- Set tonal center in semitones (0=C, 2=D, 4=E, 5=F, 7=G, 9=A, 11=B).
- Set maximum variations and output options.
- Enable Draw_visualization for analysis display.
- Click OK — engine detects phrases, runs MCMC chain, renders variations, creates stereo mix "source_mcmc_stereo".
MCMC & Energy Theory
The Metropolis-Hastings Algorithm
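The standard temperature-scaled Metropolis rule can be sketched as follows. This sketch assumes symmetric proposals, so the Hastings correction cancels, and an acceptance probability of min(1, exp(-ΔE/T)). The function name is hypothetical, and the script's exact implementation may differ.

```python
import math
import random


def accept(current_energy, proposed_energy, temperature, rng=random):
    """Metropolis acceptance test at a given temperature.

    Lower energy is better. A proposal that lowers (or keeps) the energy
    is always accepted; a worse one is accepted with probability
    exp(-(E_new - E_old) / T).
    """
    delta = proposed_energy - current_energy
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)
```

A worse proposal with ΔE = 1 is accepted with probability exp(-0.5) ≈ 0.61 at T = 2.0, but only exp(-2) ≈ 0.14 at T = 0.5. This is why high temperatures explore more.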
State Representation
📊 State Variables per Phrase
For each phrase i (1..nPhrases):
- pc[i] — pitch contour offset (semitones, integer)
- tm[i] — time map ratio (>0, 1.0 = no change)
- dm[i] — dynamic map (amplitude scale, 1.0 = no change)
Global: transpo — global semitone shift (integer)
Total state dimension: 3 × nPhrases + 1
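The per-phrase state maps naturally onto a small record type. The sketch below is in Python with hypothetical class and method names; the Praat script keeps these values in its own variables. The initial state follows the state-initialization step: zero pitch offsets and unity time and dynamic maps.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VariationState:
    """One MCMC state: per-phrase maps plus a global transposition."""
    pc: List[int]      # pitch contour offset per phrase (semitones)
    tm: List[float]    # time map ratio per phrase (1.0 = no change)
    dm: List[float]    # dynamic map per phrase (1.0 = no change)
    transpo: int = 0   # global semitone shift

    @classmethod
    def initial(cls, n_phrases: int) -> "VariationState":
        # Zero pitch offsets, unity time and dynamic maps, no global shift
        return cls([0] * n_phrases, [1.0] * n_phrases, [1.0] * n_phrases, 0)

    def dimension(self) -> int:
        # 3 values per phrase + 1 global transposition
        return len(self.pc) + len(self.tm) + len(self.dm) + 1

    def copy(self) -> "VariationState":
        # Independent copy, so a proposal can be evaluated without
        # modifying the current state
        return VariationState(list(self.pc), list(self.tm), list(self.dm), self.transpo)
```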
Energy Function Components
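The guide states only that the energy combines seven weighted terms. The formulas below assume a plain weighted sum and the Boltzmann form usually paired with temperature-based Metropolis-Hastings; the individual term definitions are not given here. Per the Advanced Techniques section, E_1 is scale conformity, E_2 voice leading and E_7 contour variety, and w_1 to w_7 are the mode weights.

```latex
E(s) = \sum_{k=1}^{7} w_k \, E_k(s), \qquad
\pi_T(s) \propto \exp\!\left(-\frac{E(s)}{T}\right), \qquad
\alpha(s \to s') = \min\!\left(1, \exp\!\left(-\frac{E(s') - E(s)}{T}\right)\right)
```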
Proposal Types
🎲 8 Proposal Distributions
| Proposal | Description | Effect |
|---|---|---|
| Pitch Nudge | Random phrase ± random(1-2) semitones | Local pitch adjustment |
| Dynamic Swell | Random phrase ± Gaussian(0,0.12) amplitude | Local dynamic change |
| Micro Rubato | Change one random phrase's time ratio and compensate in an adjacent phrase | Local timing variation |
| Phrase Transpose | Random phrase ± random(1-5) semitones | Larger local pitch shift |
| Temporal Swap | Swap time ratios of adjacent phrases | Rhythmic reordering |
| Global Transpose | Global transpo ± random(1-3) semitones | Overall pitch shift |
| Tempo Warp | Multiply all time ratios by Gaussian(1.0,0.06) | Global tempo change |
| Dynamic Arch | Apply arch-shaped dynamic envelope (linear, inverted, or sinusoidal) | Global dynamic shape |
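
Choosing which proposal to try is a draw from a categorical distribution over the eight types, weighted by pw1-pw8. The sketch below does this with Python's random.choices. The proposal names, the plain-list state layout, and the three simple proposals shown are illustrative assumptions that follow the table's descriptions; they are not the script's code.

```python
import random

PROPOSALS = ["pitch_nudge", "dynamic_swell", "micro_rubato", "phrase_transpose",
             "temporal_swap", "global_transpose", "tempo_warp", "dynamic_arch"]


def choose_proposal(pw, rng=random):
    """Draw one proposal name with probabilities pw1..pw8."""
    return rng.choices(PROPOSALS, weights=pw, k=1)[0]


def pitch_nudge(pc, rng=random):
    """Shift one random phrase by +/-1 or +/-2 semitones (returns a new list)."""
    new = list(pc)
    i = rng.randrange(len(new))
    new[i] += rng.choice([-1, 1]) * rng.randint(1, 2)
    return new


def temporal_swap(tm, rng=random):
    """Swap the time ratios of two adjacent phrases (returns a new list)."""
    new = list(tm)
    if len(new) >= 2:
        i = rng.randrange(len(new) - 1)
        new[i], new[i + 1] = new[i + 1], new[i]
    return new


def tempo_warp(tm, rng=random):
    """Scale every time ratio by one factor drawn from Gaussian(1.0, 0.06)."""
    factor = rng.gauss(1.0, 0.06)
    return [r * factor for r in tm]
```

Each proposal returns a modified copy, leaving the current state intact for the case where the proposal is rejected.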
Major Scale Definition
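The major scale is the semitone set {0, 2, 4, 5, 7, 9, 11} above the tonal center. These are the same values the tonal-center control uses for C, D, E, F, G, A and B. The sketch below shows a membership test and a crude out-of-scale count, assuming pitches are given as MIDI-style semitone numbers. The function names and the counting rule are hypothetical stand-ins for the script's scale-conformity term.

```python
MAJOR_SCALE = (0, 2, 4, 5, 7, 9, 11)  # semitones above the tonal center


def in_scale(pitch_st, tonal_center_st=0, scale=MAJOR_SCALE):
    """True if a pitch (in semitones, e.g. a MIDI note number) is a scale degree."""
    return (pitch_st - tonal_center_st) % 12 in scale


def out_of_scale_count(pitches_st, tonal_center_st=0, scale=MAJOR_SCALE):
    """Count pitches that fall outside the scale (a crude conformity penalty)."""
    return sum(not in_scale(p, tonal_center_st, scale) for p in pitches_st)
```

For example, in_scale(66, tonal_center_st=7) is True, because F# (66) is the seventh degree of G major.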
Rendering Pipeline per Phrase
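Rendering each phrase (tape-speed pitch shift, Lengthen time stretch, dynamic scaling) involves some arithmetic, sketched below. The sketch rests on three assumptions that the guide does not state:

- A phrase's total shift is pc[i] + transpo.
- A tape-speed shift of n semitones multiplies the playback rate by 2^(n/12), which also divides the phrase's duration by that rate.
- The Lengthen factor is chosen so the final duration is tm[i] times the original.

The function names are hypothetical.

```python
def phrase_render_factors(pc_i, transpo, tm_i, dm_i):
    """Return (rate, lengthen_factor, gain) for one phrase.

    rate            playback-rate multiplier for a tape-speed pitch shift
    lengthen_factor extra time stretch so the final duration is tm_i x original
    gain            amplitude multiplier (the dynamic map value)
    """
    semitones = pc_i + transpo
    rate = 2.0 ** (semitones / 12.0)       # +12 st -> twice as fast
    # Tape speed shortens the phrase to 1/rate of its length, so stretch it
    # back by rate and then apply the time-map ratio.
    lengthen_factor = tm_i * rate
    gain = dm_i
    return rate, lengthen_factor, gain


def rendered_duration(original_s, pc_i, transpo, tm_i):
    """Duration after the tape-speed shift followed by the Lengthen stretch."""
    rate, lengthen_factor, _ = phrase_render_factors(pc_i, transpo, tm_i, 1.0)
    return original_s / rate * lengthen_factor
```

For example, a +12 semitone shift doubles the playback rate. With tm[i] = 1.0, the phrase must then be lengthened by a factor of 2 to keep its original duration.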
Aesthetic Modes
Mode 2: Conservative (faithful)
🌱 Faithful Variations
Temperature: 0.8 (fixed) | Weights: E1=2.5, E2=2.0, E3=1.5, E4=2.0, E5=1.0, E6=1.2, E7=0.5
Proposal weights: PitchNudge=0.35, DynSwell=0.15, Rubato=0.25, PhrTrans=0.05, TmSwap=0.08, GlobTrans=0.03, TempoWarp=0.06, DynArch=0.03
Character: Stays close to original — small, local changes only, high scale/voice-leading weights
Use on: Material needing subtle variations, classical music, education
Mode 3: Expressive (reinterpretive)
🎵 Reinterpretive
Temperature: 3.0 → 1.0 (annealed) | Weights: E1=2.0, E2=1.5, E3=1.0, E4=1.5, E5=0.8, E6=1.2, E7=0.7
Proposal weights: PitchNudge=0.25, DynSwell=0.12, Rubato=0.20, PhrTrans=0.15, TmSwap=0.10, GlobTrans=0.08, TempoWarp=0.07, DynArch=0.03
Character: Balanced exploration — moderate changes, some phrase transposition, tempo warp
Use on: Expressive music, jazz, creative reinterpretations
Mode 4: Exploratory (creative)
🚀 Creative Exploration
Temperature: 8.0 → 2.0 (annealed) | Weights: E1=1.5, E2=1.0, E3=0.7, E4=1.0, E5=0.6, E6=0.8, E7=1.2
Proposal weights: PitchNudge=0.18, DynSwell=0.10, Rubato=0.15, PhrTrans=0.18, TmSwap=0.12, GlobTrans=0.12, TempoWarp=0.10, DynArch=0.05
Character: Wide exploration — large pitch shifts, frequent phrase transposition, temporal swaps, high contour variety weight encourages non-monotone lines
Use on: Experimental music, sound design, generative composition
Parameters & Controls
Aesthetic Mode
| Parameter | Default | Description |
|---|---|---|
| Aesthetic_mode | Expressive | Conservative, Expressive, Exploratory, or Custom |
Chain Parameters
| Parameter | Default | Description |
|---|---|---|
| Mcmc_steps | 60 | Number of MCMC iterations |
| Thinning_interval | 6 | Render variation every N accepted steps |
| Start_temp | 2.0 | Initial temperature (higher = more exploration) |
| End_temp | 0.5 | Final temperature (lower = more exploitation) |
| Anneal | 1 | If 1, lower the temperature gradually from Start_temp to End_temp |
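
The guide does not say which cooling curve is used when Anneal = 1. Two common choices are sketched below: linear and geometric interpolation from Start_temp to End_temp. The function name and both schedule shapes are assumptions.

```python
def temperature(step, n_steps, start_temp, end_temp, anneal=True, geometric=False):
    """Temperature at a 0-based step of an n_steps chain.

    anneal=False keeps start_temp for the whole chain (a fixed temperature).
    Otherwise interpolate from start_temp at step 0 to end_temp at the
    last step, either linearly or geometrically.
    """
    if not anneal or n_steps <= 1:
        return start_temp
    frac = step / (n_steps - 1)
    if geometric:
        return start_temp * (end_temp / start_temp) ** frac
    return start_temp + (end_temp - start_temp) * frac
```

Halfway through the chain with the default temperatures (2.0 to 0.5), the linear schedule gives 1.25 while the geometric one gives 1.0. A geometric schedule therefore cools faster early and spends more steps at low temperature.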
Phrase Detection
| Parameter | Default | Description |
|---|---|---|
| Silence_thresh_dB | -35.0 | Threshold for silence detection (dB) |
| Min_phrase_s | 0.5 | Minimum phrase duration (seconds) |
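
Phrase detection goes from silences to a TextGrid to phrase intervals. Praat's To TextGrid (silences) command measures its silence threshold relative to the maximum intensity. The sketch below mimics that idea on a precomputed intensity contour in dB. Dropping sounding runs shorter than Min_phrase_s is an assumption about how the minimum is enforced, and the function is hypothetical.

```python
def detect_phrases(intensity_db, frame_s, silence_thresh_db=-35.0, min_phrase_s=0.5):
    """Return (start_s, end_s) intervals of sounding phrases.

    intensity_db: intensity contour (a list), one value per frame of length frame_s.
    A frame counts as sounding if it lies within silence_thresh_db (negative)
    of the maximum. Sounding runs shorter than min_phrase_s are dropped.
    """
    if not intensity_db:
        return []
    limit = max(intensity_db) + silence_thresh_db
    phrases = []
    start = None
    # The -inf sentinel forces the last sounding run to be closed
    for i, value in enumerate(intensity_db + [float("-inf")]):
        sounding = value >= limit
        if sounding and start is None:
            start = i
        elif not sounding and start is not None:
            if (i - start) * frame_s >= min_phrase_s:
                phrases.append((start * frame_s, i * frame_s))
            start = None
    return phrases
```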
Tonal Center
| Parameter | Default | Description |
|---|---|---|
| Tonal_center_st | 0 | 0=C, 2=D, 4=E, 5=F, 7=G, 9=A, 11=B |
Output
| Parameter | Default | Description |
|---|---|---|
| Max_variations | 8 | Maximum number of variations to render |
| Keep_individual_variations | 0 | Keep separate Sound objects for each variation |
| Draw_visualization | 1 | Generate 6-panel analysis display |
| Play_first | 0 | Play first variation after generation |
Custom Mode Parameters (hidden, editable in script)
| Parameter | Default (Custom) | Description |
|---|---|---|
| w1-w7 | 2.0,1.5,1.0,1.5,0.8,1.2,0.7 | Energy term weights |
| pw1-pw8 | 0.25,0.12,0.20,0.15,0.10,0.08,0.07,0.03 | Proposal probabilities (must sum to 1.0) |
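
Because pw1-pw8 must sum to 1.0, it helps to check (or renormalize) a hand-edited set before running. The helper below is hypothetical, and its tolerance value is an assumption.

```python
import math


def check_proposal_weights(pw, tol=1e-6, normalize=False):
    """Validate eight non-negative proposal probabilities.

    Raises ValueError if there are not eight weights, if any is negative,
    or (when normalize is False) if they do not sum to 1.0 within tol.
    With normalize=True, returns the weights rescaled to sum to 1.0.
    """
    if len(pw) != 8:
        raise ValueError(f"expected 8 proposal weights, got {len(pw)}")
    if any(w < 0 for w in pw):
        raise ValueError("proposal weights must be non-negative")
    total = math.fsum(pw)
    if normalize:
        if total <= 0:
            raise ValueError("at least one proposal weight must be positive")
        return [w / total for w in pw]
    if abs(total - 1.0) > tol:
        raise ValueError(f"proposal weights sum to {total}, not 1.0")
    return list(pw)
```

For example, raising pw4 from 0.15 to 0.25 in the Custom defaults makes the total 1.10. Passing normalize=True rescales all eight values proportionally.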
Visualization & Analysis
6-Panel Display
Reading the Energy Trace
- Green shaded steps: Accepted proposals — energy may go up or down
- Blue shaded steps: Render points (thinned accepted steps) — where variations were generated
- Red line: Energy value — should trend downward as chain finds better states
- Dotted blue line: Temperature — high early (more exploration), lower later (more exploitation)
- Sudden jumps up: The chain occasionally accepts a worse state, which helps it escape local minima
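The troubleshooting advice refers to the acceptance rate. Given a per-step record of accepted and rejected proposals (the green shading above), it is simply the accepted fraction. A windowed version shows how the rate changes as the temperature drops. The sketch below uses hypothetical function names.

```python
def acceptance_rate(accepted):
    """Fraction of steps whose proposal was accepted."""
    return sum(accepted) / len(accepted) if accepted else 0.0


def windowed_acceptance(accepted, window=10):
    """Acceptance rate over consecutive non-overlapping windows of steps."""
    return [acceptance_rate(accepted[i:i + window])
            for i in range(0, len(accepted), window)]
```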
Reading the Pitch Heatmap
- Rows: Each variation (1 at bottom to N at top)
- Columns: Each phrase (1 to nPhrases)
- Color: Blue = low pitch offset (e.g., -12 semitones), red = high pitch offset (e.g., +12)
- Patterns: Look for consistent colors across variations (stable pitch choices) or varied colors (exploration)
- Vertical stripes: Some phrases consistently high or low across variations
- Horizontal gradients: Pitch contour changing across phrases within a variation
Applications
Generative Composition
Use case: Generating families of related variations from a source phrase
Technique: Exploratory mode with 8-12 phrases, 100+ steps, thin=10
Workflow:
- Record a 20-30 second melodic phrase with clear phrasing
- Adjust silence threshold to get 8-12 phrases
- Run with Exploratory mode, max_variations=12
- Listen to the stereo mix — variations evolve from near-original to adventurous
- Select favorite variations for further processing
Arranging & Orchestration
Use case: Creating multiple instrumental parts from a single melody
Technique: Expressive mode with moderate variation, keep individual variations
Applications:
- String quartet: Generate 4 variations, assign to different instruments
- Choral writing: Create SATB parts from a cantus firmus
- Electronic layering: Layer variations for rich textures
Music Analysis & Pedagogy
Use case: Studying how variations relate to original material
Technique: Conservative mode, enable visualization, examine heatmap
Learning outcomes:
- See how pitch contour is preserved or transformed
- Understand the role of energy weights in shaping variations
- Observe MCMC convergence behavior
- Relate temperature to exploration vs. exploitation
Sound Design
Use case: Creating evolving textures from short sounds
Technique: Exploratory mode with short phrases (min_phrase_s = 0.2-0.3)
Examples:
- Granular textures: Short phrases become grains, variations create evolving clouds
- Rhythmic variations: Tempo warp and micro rubato create complex rhythms
- Dynamic layers: Dynamic arch proposals create evolving amplitude shapes
Practical Workflow Examples
🎵 String Quartet from Folk Tune
Goal: Create 4 variations of a folk melody for string quartet
Settings:
- Source: 15-second folk melody (clearly phrased)
- Mode: Expressive
- Steps: 80, thin=10, max_var=4
- Keep_individual_variations=1
Result: 4 variations — assign to Violin I, Violin II, Viola, Cello. Each has different pitch contours, timing, and dynamics while retaining phrase structure.
🎚️ Ambient Pad from Vocal Phrase
Goal: Generate 30-second ambient texture from 5-second vocal phrase
Settings:
- Source: 5-second vocal phrase
- Mode: Exploratory
- Phrases: Silence detection (min_phrase_s=0.3) → ~15 phrases
- Steps: 120, thin=8, max_var=12
- Stereo mix output
Result: 12 variations layered in stereo mix, creating evolving ambient texture
🔬 Research: Variation Similarity
Goal: Study how energy weights affect variation distance
Settings:
- Source: Standard test melody
- Run multiple chains with different weight sets
- Compute acoustic features of variations
- Analyze relationship between energy and perceptual distance
Result: Insight into how musical constraints shape variation space
Troubleshooting Common Issues
Problem: Too few phrases detected (or only one)
Cause: silence_thresh_dB too low, or the source has no silences
Solution: Adjust silence_thresh_dB (-30 to -40 dB) or accept the fallback
Problem: Variations barely differ from the original
Cause: Temperature too low, acceptance rate low, or weights too strict
Solution: Increase start_temp, reduce the weights on E1/E2, and check the acceptance rate
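Problem: Clicks or discontinuities at phrase boundaries
Cause: Crossfade too short, or phrase boundaries not faded
Solution: Increase the crossfade duration in the rendering code (currently 20 ms) and make sure fades are applied at every phrase boundary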
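A linear crossfade join, the kind of fade the solution above refers to, can be sketched on plain sample lists. The 20 ms default comes from the text. The linear fade shape, the function name, and working on Python lists rather than Praat Sound objects are assumptions.

```python
def crossfade_concat(a, b, sample_rate, fade_s=0.020):
    """Join two mono sample lists, overlapping the end of a with the start of b.

    The overlap is fade_s seconds (clipped to the shorter signal): a fades
    out linearly while b fades in, so the result is len(a)+len(b)-overlap long.
    """
    n = min(int(round(fade_s * sample_rate)), len(a), len(b))
    if n == 0:
        return list(a) + list(b)
    head = list(a[:len(a) - n])
    overlap = []
    for k in range(n):
        fade_in = (k + 1) / (n + 1)      # strictly between 0 and 1
        overlap.append(a[len(a) - n + k] * (1.0 - fade_in) + b[k] * fade_in)
    return head + overlap + list(b[n:])
```

Because the fade-in and fade-out gains always sum to 1, a constant signal passes through the join unchanged. A longer fade spreads any level or waveform mismatch over more samples, which is what suppresses clicks.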
Problem: Processing is slow or uses a lot of memory
Cause: Many phrases × many variations × long phrases
Solution: Reduce max_variations, increase thinning_interval, or reduce the number of phrases (raise min_phrase_s)
Problem: Variations sound random or unmusical
Cause: Temperature too high, acceptance rate too high, or weights misconfigured
Solution: Reduce start_temp, check the proposal weights, and adjust the energy weights
Advanced Techniques
In the script, modify w1-w7 to emphasize different musical aspects. A higher w1 (scale conformity) keeps variations tonal, a higher w2 (voice leading) produces smoother connections, and a higher w7 (contour variety) discourages monotonous lines.
Adjust pw1-pw8 to control which transformations are tried more often, then rebalance the others so the eight values still sum to 1.0. For example, increase pw4 (phrase transpose) for more dramatic pitch changes, or pw3 (micro rubato) for more rhythmic variation.
Modify the scale degrees array to use different scales (e.g., minor, pentatonic, whole-tone, octatonic).
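The alternative scales mentioned above can be written as semitone offsets above the tonal center, in the same form as the major-scale degrees. The guide does not name the array the script reads or its exact format, so the dictionary below is only a reference. The octatonic scale is shown in its half-step-first form.

```python
SCALES = {
    "major":                (0, 2, 4, 5, 7, 9, 11),
    "natural_minor":        (0, 2, 3, 5, 7, 8, 10),
    "major_pentatonic":     (0, 2, 4, 7, 9),
    "whole_tone":           (0, 2, 4, 6, 8, 10),
    "octatonic_half_whole": (0, 1, 3, 4, 6, 7, 9, 10),
}


def step_pattern(scale):
    """Intervals between successive degrees, wrapping around to the octave."""
    degrees = list(scale) + [scale[0] + 12]
    return [b - a for a, b in zip(degrees, degrees[1:])]
```

Printing step_pattern for each entry is a quick sanity check when typing in a new scale. Every pattern should sum to 12.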
For better exploration, one could run several chains at different temperatures and periodically swap states between them (parallel tempering). This is not implemented in the script.
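The multi-chain idea above is usually called parallel tempering (replica exchange). It is not part of the script, and the code below is a sketch only. The standard swap test between chains i and j, at temperatures T_i and T_j, accepts exchanging their states with probability min(1, exp((E_i - E_j)(1/T_i - 1/T_j))). The function names are hypothetical.

```python
import math
import random


def swap_probability(e_i, t_i, e_j, t_j):
    """Acceptance probability for exchanging the states of chains i and j."""
    exponent = (e_i - e_j) * (1.0 / t_i - 1.0 / t_j)
    return 1.0 if exponent >= 0 else math.exp(exponent)


def maybe_swap(states, energies, temps, i, j, rng=random):
    """Swap the states and energies of chains i and j in place if the test passes."""
    if rng.random() < swap_probability(energies[i], temps[i], energies[j], temps[j]):
        states[i], states[j] = states[j], states[i]
        energies[i], energies[j] = energies[j], energies[i]
        return True
    return False
```

A swap that hands the lower-energy state to the colder chain is always accepted. The reverse move is accepted only occasionally, which lets good states found by hot, exploratory chains move down to the cold chain.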