Latent Barycentric Mutation — User Guide
Trains a VAE on-the-fly from event-level audio patches, then navigates the latent space according to a navigation plan. At each step, K nearest-neighbor events are found and their waveforms are mixed with barycentric (inverse-distance) weights.
What this does
This script implements a Latent Barycentric Mutation engine. It segments the selected sound into events, trains a VAE on the fly from event-level audio patches, and then moves through the latent space according to a navigation plan. At each step, the K nearest-neighbor events are found and their waveforms are mixed with barycentric (inverse-distance) weights, which produces seamless morphs between acoustic identities.
🧠 What is Barycentric Mutation?
This approach combines several advanced concepts:
- VAE encoding: Each event is encoded to a latent mean vector μ via a Variational Autoencoder
- Navigation plan: A sequence of steps with modes (drift, mutate, return, settle) that determine how to move through latent space
- K-nearest neighbors: At each step, find the K events closest to the current latent position
- Barycentric mixing: Mix the K event waveforms with weights inversely proportional to distance
- Pitch preservation: Optional phase-vocoder time-stretching preserves pitch when durations differ
The result is a continuous morph through the acoustic identity space of the source.
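The neighbor search and inverse-distance weighting listed above can be sketched in a few lines of numpy. This is a minimal illustration, not the script's actual code: the function names, the `eps` guard against zero distances, and the toy data are all assumptions, and the events are assumed to have been brought to equal length before mixing.

```python
import numpy as np

def barycentric_weights(distances, eps=1e-8):
    """Inverse-distance weights that sum to 1 (eps avoids division by zero)."""
    inv = 1.0 / (np.asarray(distances, dtype=float) + eps)
    return inv / inv.sum()

def knn_barycentric_mix(position, latents, waveforms, k=4):
    """Blend the k events whose latent means lie closest to `position`.

    latents:   (n_events, latent_dim) array of latent means
    waveforms: (n_events, n_samples) array, one equal-length row per event
    """
    dists = np.linalg.norm(latents - position, axis=1)
    nearest = np.argsort(dists)[:k]          # indices of the k closest events
    weights = barycentric_weights(dists[nearest])
    mixed = weights @ waveforms[nearest]     # weighted sum of the k waveforms
    return mixed, nearest, weights

# Toy data (hypothetical): 5 events in a 2-D latent space, 4 samples each
latents = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, 3.0], [-2.0, 0.5]])
waveforms = np.arange(20, dtype=float).reshape(5, 4)
mixed, nearest, weights = knn_barycentric_mix(
    np.array([0.2, 0.1]), latents, waveforms, k=3
)
```

Closer events receive larger weights, so the blend is dominated by the event nearest to the current latent position and changes gradually as the position moves.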
Key Features:
- 4 Preset Strategies — Gentle Drift to Slow Settle, plus Custom
- 2 Plan Sources — External CSV or Auto-generate with 4 modes
- VAE Encoding — Pure numpy VAE with encoder, decoder, and KL loss
- 4 Navigation Modes — drift (coherent steps), mutate (exploratory), return (pull to anchor), settle (converge)
- Barycentric Mixing — Inverse-distance weighted blend of K nearest events
- 3 Pitch Preservation Modes — off (resample), preserve_f0 (pad/trim), preserve_spectral_envelope (phase vocoder)
- 4 Normalization Modes — none, peak, RMS, loudness (RMS proxy)
- Comprehensive Visualization — 6-panel display with waveforms, spectrograms, navigation phase panel, summary
Technical Implementation:
1. Event Segmentation: Praat segments audio into events.
2. Mel Patches: 40×32 log-mel patches per event.
3. VAE: Train on-the-fly, encode to latent means μ.
4. Navigation Plan: Load from CSV or auto-generate with 4 modes.
5. Execution: For each step, find K nearest events, compute barycentric weights, mix, apply duration/energy scaling, crossfade.
6. Pitch Preservation: Phase vocoder or pad/trim as needed.
7. Normalization: Scale to match input RMS or peak.
8. Visualization.
Quick start
- In Praat, select exactly one Sound object (any duration, any content).
- Run script… → select LatentBarycentric.praat.
- Choose Preset (2-5 for specific strategies, 1 for custom).
- Set latent size and duration (0 = original).
- Choose Plan source (External CSV or Auto-generate).
- If auto-generate, set generator controls (steps, mode, step size, temperature, K, return strength, etc.).
- Select normalization mode and pitch preservation mode.
- Enable Draw_visualization for analysis display.
- Click OK — engine segments, trains VAE, executes navigation plan, reconstructs.
Latent Barycentric Theory
VAE Architecture
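The guide describes a pure numpy VAE with an encoder, a decoder, and a KL loss, trained on 40×32 log-mel patches. As a rough sketch of the encoder side only, the block below shows the standard reparameterization step and the closed-form KL term against a unit Gaussian prior. The single linear layer, the weight names, and the random stand-in data are assumptions for illustration; the script's actual layer sizes and training loop are not documented here.

```python
import numpy as np

rng = np.random.default_rng(42)

def encode(x, W_mu, b_mu, W_lv, b_lv):
    """Linear encoder heads returning latent mean and log-variance."""
    return x @ W_mu + b_mu, x @ W_lv + b_lv

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_divergence(mu, logvar):
    """KL(N(mu, sigma^2) || N(0, I)), summed over latent dims, averaged over batch."""
    per_event = -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1)
    return float(np.mean(per_event))

# Stand-in data: 16 events, each a flattened 40x32 patch; latent size 8
n_events, patch_dim, latent_size = 16, 40 * 32, 8
x = rng.standard_normal((n_events, patch_dim))
W_mu = rng.standard_normal((patch_dim, latent_size)) * 0.01
W_lv = rng.standard_normal((patch_dim, latent_size)) * 0.01
b_mu = np.zeros(latent_size)
b_lv = np.zeros(latent_size)

mu, logvar = encode(x, W_mu, b_mu, W_lv, b_lv)
z = reparameterize(mu, logvar, rng)   # used during training
kl = kl_divergence(mu, logvar)        # regularization term of the loss
```

For navigation, only the latent means `mu` are needed: each event is represented by its mean vector, and the sampled `z` matters during training only.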
Barycentric Mixing
Navigation Modes
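The four modes (drift, mutate, return, settle) differ in how the latent position is updated at each step. The sketch below is one plausible reading of the descriptions in this guide, not the script's actual update rule: the function name `navigate_step`, the velocity state, and the blending constants (0.8, 0.2, 0.5, 0.1) are all hypothetical. The keyword defaults mirror the Plan_step_size, Plan_temperature, and Plan_return_strength defaults.

```python
import numpy as np

def navigate_step(pos, velocity, mode, anchor, rng,
                  step_size=0.35, temperature=0.40, return_strength=0.65):
    """Return (new_pos, new_velocity) for one step. Update rules are illustrative."""
    noise = rng.standard_normal(pos.shape)
    if mode == "drift":
        # Inertia: keep most of the previous direction, add a little noise
        velocity = 0.8 * velocity + 0.2 * temperature * noise
    elif mode == "mutate":
        # Exploratory: direction is set entirely by fresh noise
        velocity = temperature * noise
    elif mode == "return":
        # Pull toward the anchor position
        velocity = return_strength * (anchor - pos)
    elif mode == "settle":
        # Cool down: damp existing motion, weak pull toward the anchor
        velocity = 0.5 * velocity + 0.1 * (anchor - pos)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return pos + step_size * velocity, velocity

# Ten consecutive "return" steps from a point away from the anchor
rng = np.random.default_rng(42)
anchor = np.zeros(8)
pos = np.ones(8)
vel = np.zeros(8)
start_dist = np.linalg.norm(pos - anchor)
for _ in range(10):
    pos, vel = navigate_step(pos, vel, "return", anchor, rng)
end_dist = np.linalg.norm(pos - anchor)
```

Under these assumed rules, each return step shrinks the distance to the anchor by a fixed fraction, which is one way to obtain the tension-and-resolution behavior the guide attributes to return mode.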
Pitch Preservation
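The feature list maps the off mode to resampling and the preserve_f0 mode to pad/trim. The difference between those two ways of fitting an event to a target length can be sketched as follows. The function names are hypothetical and linear interpolation stands in for whatever resampler the script uses; the phase-vocoder path (preserve_spectral_envelope) is not shown.

```python
import numpy as np

def fit_by_resampling(wave, n_out):
    """'off'-style fit: linear-interpolation resample, which shifts pitch."""
    src = np.linspace(0.0, len(wave) - 1, n_out)
    return np.interp(src, np.arange(len(wave)), wave)

def fit_by_pad_trim(wave, n_out):
    """'preserve_f0'-style fit: trim or zero-pad, leaving pitch untouched."""
    if len(wave) >= n_out:
        return wave[:n_out].copy()
    return np.concatenate([wave, np.zeros(n_out - len(wave))])

sr = 16000
t = np.arange(sr // 10) / sr              # 100 ms of time stamps
tone = np.sin(2 * np.pi * 440.0 * t)      # 440 Hz test tone

stretched = fit_by_resampling(tone, 2 * len(tone))  # twice as long, about an octave lower
padded = fit_by_pad_trim(tone, 2 * len(tone))       # original tone followed by silence
```

Resampling changes duration and pitch together, while pad/trim keeps the pitch but either truncates the event or leaves silence after it.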
Normalization Modes
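The four normalization modes (none, peak, rms, loudness) can be read as different ways of computing one gain factor that matches the output level to the input. The sketch below follows that reading. The function names and the `eps` guard are assumptions, and loudness is treated as an RMS proxy, as the feature list states.

```python
import numpy as np

def rms(x):
    """Root-mean-square level of a signal."""
    return float(np.sqrt(np.mean(np.square(x))))

def normalize(output, reference, mode="rms", eps=1e-12):
    """Scale `output` so its level matches `reference` under the chosen mode."""
    if mode == "none":
        return output
    if mode == "peak":
        gain = np.max(np.abs(reference)) / (np.max(np.abs(output)) + eps)
    elif mode in ("rms", "loudness"):     # loudness handled as an RMS proxy
        gain = rms(reference) / (rms(output) + eps)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return output * gain

# Stand-in signals: a quiet output matched to a louder reference
rng = np.random.default_rng(42)
reference = 0.5 * rng.standard_normal(1000)
output = 0.05 * rng.standard_normal(1000)
matched_rms = normalize(output, reference, mode="rms")
matched_peak = normalize(output, reference, mode="peak")
```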
Preset Strategies
Preset 2: Gentle Drift
🌊 Smooth, Coherent
Latent: 6 | Plan: auto (defaults) | Pitch: off
Character: Smooth drift through latent space — gentle, coherent evolution
Use on: Ambient, gradual transformations
Preset 3: Full Mutation Arc
🌀 Exploratory Arc
Latent: 10 | Plan: auto (defaults) | Pitch: off
Character: Full cycle through all modes — drift → mutate → return → settle
Use on: Complete narrative arcs
Preset 4: Return Focus
🎯 Anchor-Focused
Latent: 8 | Plan: auto (defaults) | Pitch: off
Character: Emphasis on return mode — pulls toward anchor positions
Use on: Creating tension and resolution
Preset 5: Slow Settle
🧘 Convergent
Latent: 6 | Plan: auto (defaults) | Pitch: off
Character: Gradual settling into stable region
Use on: Resolution, conclusion
Parameters & Controls
Core Settings
| Parameter | Default | Description |
|---|---|---|
| Latent_size | 8 | VAE latent dimensions (2–32) |
| Duration (0 = original) | 0 | Target output duration (seconds) |
| Seed | 42 | Random seed for reproducibility |
Plan Generator
| Parameter | Default | Description |
|---|---|---|
| Plan_source | External CSV | External CSV or Auto-generate |
| Plan_mode_preset | cycle | drift, mutate, return, cycle |
| Plan_steps | 60 | Number of navigation steps (4–500) |
| Plan_step_size | 0.35 | Movement step size (0–1) |
| Plan_temperature | 0.40 | Noise level (0–1) |
| Plan_k_neighbors | 4 | Number of neighbors for barycentric mix (2–8) |
| Plan_return_strength | 0.65 | Strength of return pull (0–1) |
| Plan_anchor_strategy | center | center, step0, last, periodic |
| Plan_anchor_period | 15 | Steps between anchors (periodic mode) |
| Plan_dur_scale | 1.0 | Duration multiplier per step (0.25–4.0) |
| Plan_dur_jitter | 0.0 | Random variation in duration (±) |
| Plan_eng_scale | 1.0 | Energy multiplier per step (0.1–3.0) |
| Plan_eng_jitter | 0.0 | Random variation in energy (±) |
Output Level
| Parameter | Default | Description |
|---|---|---|
| Normalize_mode | rms | none, peak, rms, loudness |
Pitch Preservation
| Parameter | Default | Description |
|---|---|---|
| Pitch_mode | off | off, preserve_f0, preserve_spectral_envelope |
Output
| Parameter | Default | Description |
|---|---|---|
| Draw_visualization | 1 | Generate 6-panel analysis display |
| Play_result | 1 | Audition after processing |
Visualization & Analysis
6-Panel Display
Reading the Navigation Phase Panel
- Blue (drift): Smooth, coherent movement with inertia — gradual evolution
- Red (mutate): Exploratory jumps with high temperature — sudden changes
- Green (return): Pull toward anchor positions — creates tension and resolution
- Yellow (settle): Cool-down, convergence — stabilization
- The step counts show how many steps the plan spends in each mode (useful for understanding the plan)
Interpreting Output Spectrogram
- Smooth transitions: The barycentric mixing should create seamless morphs between events
- Pitch preservation: With preserve_f0 or preserve_spectral_envelope, pitch should remain stable even as timbre evolves
- Mode changes: In mutate mode, you may hear more abrupt changes; in drift, smoother evolution
Applications
Electroacoustic Composition
Use case: Creating continuous morphs between acoustic identities
Technique: Full Mutation Arc preset with auto-generate cycle
Workflow:
- Select a 20-60 second recording with multiple acoustic states
- Run with Full Mutation Arc preset
- Listen to how the sound drifts, mutates, returns, and settles
- Export and use as narrative movement in larger work
Sound Design for Media
Use case: Creating evolving textures, transitions, risers
Technique: Gentle Drift with preserve_spectral_envelope
Applications:
- Ambient beds: Gentle Drift with low temperature
- Tension cues: Return Focus with strong return pull
- Transitions: Full Mutation Arc creates complete arcs
Music Production
Use case: Creating evolving pads, generative textures
Technique: Custom plan with specific mode distribution
Examples:
- Pad evolution: 80% drift, 20% mutate for variety
- Build-up: Increasing mutate proportion toward climax
- Resolution: Return mode with strong pull to anchor
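The mode distributions in these examples can be prototyped as a list of per-step mode labels before a plan is written out. The sketch below only generates the labels. Both helper functions are hypothetical, and it does not produce the CSV the script expects, because that schema is not described in this guide.

```python
import numpy as np

def mode_schedule(n_steps, proportions, rng):
    """Draw one mode label per step according to `proportions` (must sum to 1)."""
    modes = list(proportions)
    probs = np.array([proportions[m] for m in modes], dtype=float)
    return [str(m) for m in rng.choice(modes, size=n_steps, p=probs)]

def buildup_schedule(n_steps, rng, start_mutate=0.1, end_mutate=0.9):
    """Mutate probability rises linearly from start to end across the plan."""
    p_mutate = np.linspace(start_mutate, end_mutate, n_steps)
    return ["mutate" if rng.random() < p else "drift" for p in p_mutate]

rng = np.random.default_rng(42)
pad_plan = mode_schedule(60, {"drift": 0.8, "mutate": 0.2}, rng)  # pad evolution
build_plan = buildup_schedule(60, rng)                            # build-up
```

Because the labels are sampled, the realized proportions vary from run to run around the requested values; fixing the seed makes a given schedule reproducible.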
Research & Education
Use case: Studying VAE latent spaces, barycentric interpolation, navigation strategies
Technique: Compare plans on same source, examine phase panel
Learning outcomes:
- Understand how VAE latent space organizes acoustic events
- See how different navigation modes affect output
- Explore barycentric mixing and its perceptual effects
- Learn about phase vocoder for pitch-preserving time-stretch
Practical Workflow Examples
Troubleshooting Common Issues
Cause: Python not installed, or packages missing
Solution: Install Python and required packages: pip install numpy soundfile scipy
Cause: Plan source set to External but file missing
Solution: Switch to Auto-generate, or provide valid CSV
Cause: Extreme time-stretching ratios
Solution: Use preserve_f0 mode instead, or reduce dur_scale range
Cause: Crossfade insufficient at splice points
Solution: Increase XFADE_SEC in the Python script (currently 12 ms)
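To see why a longer crossfade helps at splice points, here is a minimal linear crossfade between two segments. Only the name XFADE_SEC and its 12 ms value come from this guide. The function itself and the linear fade shape are assumptions, and the script may use a different curve.

```python
import numpy as np

XFADE_SEC = 0.012  # 12 ms, the value quoted above

def crossfade_concat(a, b, sr, xfade_sec=XFADE_SEC):
    """Join two segments with a linear crossfade over the overlap region."""
    n = min(int(round(xfade_sec * sr)), len(a), len(b))
    if n == 0:
        return np.concatenate([a, b])
    fade_out = np.linspace(1.0, 0.0, n)
    overlap = a[-n:] * fade_out + b[:n] * (1.0 - fade_out)
    return np.concatenate([a[:-n], overlap, b[n:]])

sr = 44100
a = np.ones(sr // 10)   # two 100 ms constant segments
b = np.ones(sr // 10)
joined = crossfade_concat(a, b, sr)
```

The overlap shortens the result by the crossfade length, so raising XFADE_SEC trades a little duration for smoother joins.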
Cause: step_size too small, or temperature too low
Solution: Increase step_size, increase temperature, check plan modes
Advanced Techniques
Generate navigation plans with other AI tools (e.g., latent navigation AI) and load them via External CSV. The CSV schema must match the format the script expects.
In _phase_vocoder_stretch(), modify n_fft, hop_out, and the OLA floor to adjust the quality/performance trade-off.
Experiment with different anchor_strategy values in auto-generate to control return behavior.
The script extracts a mono signal for analysis but preserves stereo through mixing (delayed Haas effect). For true stereo output, provide a stereo input.