The Latent Counterpoint: User Guide
Trains an autoencoder on-the-fly to learn a latent space from event-level audio patches, then deploys multiple agents that navigate the latent space simultaneously with counterpoint forces (attraction, repulsion, inertia, jitter) to produce polyphonic recombination of the input material.
What this does
This script implements The Latent Counterpoint, an AI-powered multi-agent system. An autoencoder trained on the fly learns a latent space from event-level audio patches. Multiple agents then navigate that space simultaneously under counterpoint forces (attraction, repulsion, inertia, jitter), and the events they select are recombined into a polyphonic version of the input material.
What is Latent Counterpoint?
Traditional counterpoint is the art of combining independent melodic lines. This system creates a latent counterpoint:
- Events of 200 ms to 3 s are segmented from the source audio
- An autoencoder learns a latent space in which each event becomes a point
- Agents (Cantus, Florid, Shadow) navigate this space with distinct behaviors
- Forces (attraction to events, repulsion from other agents) create emergent polyphony
- The result is a new composition where multiple "voices" independently recombine the source material
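The 200 ms to 3 s event length can be pictured as a clean-up pass over candidate boundaries. The sketch below is a minimal Python illustration, not the script's Praat code. It assumes candidate boundaries (for example, derived from intensity peaks) arrive as sorted times in seconds. The helper name `enforce_event_bounds` and its merge-then-split rule are hypothetical.

```python
import math


def enforce_event_bounds(boundaries, min_dur=0.2, max_dur=3.0):
    """Hypothetical helper: turn sorted candidate boundary times (seconds,
    including start and end) into (start, end) events within the bounds.
    Short pieces are merged first, then long pieces are split."""
    # Merge: drop interior boundaries that would leave a piece shorter than min_dur.
    kept = [boundaries[0]]
    for t in boundaries[1:-1]:
        if t - kept[-1] >= min_dur:
            kept.append(t)
    end = boundaries[-1]
    if end - kept[-1] < min_dur and len(kept) > 1:
        kept.pop()  # fold a short final piece into its predecessor
    kept.append(end)

    # Split: cut pieces longer than max_dur into equal parts no longer than max_dur.
    events = []
    for start, stop in zip(kept, kept[1:]):
        parts = max(1, math.ceil((stop - start) / max_dur))
        step = (stop - start) / parts
        for k in range(parts):
            a = start + k * step
            b = stop if k == parts - 1 else start + (k + 1) * step
            events.append((a, b))
    return events


events = enforce_event_bounds([0.0, 0.1, 0.5, 4.5, 4.6])
print(events)  # three events; the 4.1 s piece is split into two 2.05 s halves
```

Merging before splitting guarantees that any piece produced by a split is longer than `max_dur / 2`, so splitting never reintroduces events shorter than `min_dur`.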
Key Features:
- 6 Preset Strategies: Duo to Free Scatter, plus Custom
- 2-6 Agents: each with a distinct behavioral profile (Cantus, Florid, Shadow)
- On-the-Fly Autoencoder: pure NumPy MLP trained with Adam on log-mel patches
- Physics Engine: attraction, repulsion, inertia, jitter, LRU memory
- 3 Agent Profiles: Cantus (stable, central), Florid (peripheral, fast), Shadow (lagging mirror)
- Counterpoint Rigidity: controls how strongly agents repel each other
- Speed Control: global agent movement speed
- Comprehensive Visualization: 5-panel display with waveforms, spectrograms, agent profiles, and unison stats
Technical Implementation:
1. Event Segmentation: Praat segments the audio into events of 200 ms to 3 s.
2. Mel Patches: Python extracts a 40×32 log-mel patch for each event.
3. Autoencoder: an MLP with one hidden layer is trained with leaky ReLU activations, denoising, L2 regularization, and the Adam optimizer.
4. Latent Geometry: the center, periphery, and distance matrix of the latent codes are computed.
5. Agent Physics: a multi-agent simulation with forces, inertia, and jitter.
6. Polyphonic Reconstruction: agent outputs are summed with panning and volume scaling.
7. Visualization & Stats.
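Step 3 combines several standard training ingredients. The sketch below shows three of them in plain Python: leaky ReLU, denoising input corruption, and an Adam update with L2 regularization folded into the gradient. It is not the script's NumPy code. The names `leaky_relu`, `add_denoising_noise`, and `Adam`, as well as the default hyperparameters, are assumptions.

```python
import math
import random


def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: positive inputs pass through, negative inputs are scaled by alpha."""
    return x if x > 0 else alpha * x


def add_denoising_noise(patch, sigma, rng):
    """Corrupt a flattened patch with Gaussian noise; the training target stays clean."""
    return [v + rng.gauss(0.0, sigma) for v in patch]


class Adam:
    """Hypothetical helper: Adam optimizer over a flat list of parameters."""

    def __init__(self, n, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, l2=1e-4):
        self.lr, self.beta1, self.beta2, self.eps, self.l2 = lr, beta1, beta2, eps, l2
        self.m = [0.0] * n  # first-moment (mean) estimates
        self.v = [0.0] * n  # second-moment (uncentered variance) estimates
        self.t = 0

    def step(self, params, grads):
        """Return updated parameters; L2 regularization is added to each gradient."""
        self.t += 1
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            g = g + self.l2 * p
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated


# Usage: minimize (w - 3)^2, whose gradient is 2 * (w - 3).
opt = Adam(1, lr=0.1, l2=0.0)
w = [0.0]
for _ in range(500):
    w = opt.step(w, [2 * (w[0] - 3.0)])

noisy = add_denoising_noise([0.5, -0.2, 1.0], sigma=0.1, rng=random.Random(42))
```

In the real autoencoder, the gradients would come from backpropagating the reconstruction error of each clean patch through the MLP, with the noisy patch as input.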
Quick start
- In Praat, select exactly one Sound object (any duration, any content).
- Run script… → select LatentCounterpoint.praat.
- Choose a Preset (2-7 for specific strategies, 1 for custom).
- Set number of agents, latent size, counterpoint rigidity, speed.
- Set target duration (0 = original).
- Enable Draw_visualization for analysis display.
- Click OK: the engine segments the sound, trains the autoencoder, runs the agent physics, and reconstructs the output.
Latent Counterpoint Theory
Agent Profiles
Three Behavioral Archetypes
| Profile | Role | Mass | Max Speed | Jitter | Attraction | Behavior |
|---|---|---|---|---|---|---|
| Cantus | Stable leader | 3.0 | 0.3 | 0.05 | 1.0 | Gravitates to center of gravity, slow, stable |
| Florid | Peripheral wanderer | 0.5 | 1.5 | 0.2 | 0.6 | Attracted to rare/peripheral sounds, fast, exploratory |
| Shadow | Lagging mirror | 2.0 | 0.4 | 0.08 | 0.3 | Mirrors Cantus with 3-step lag + inversion, moderate |
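The table maps onto a small record type. In the sketch below, the values are the table's, but the dataclass and its field names (`mass`, `max_speed`, `jitter`, `attraction`) are illustrative. The `shadow_target` helper is hypothetical. It reads "3-step lag + inversion" as Cantus's position three steps earlier, point-reflected through the latent center. That is one plausible interpretation, not the documented rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Behavioral archetype; field names are illustrative."""
    name: str
    mass: float
    max_speed: float
    jitter: float
    attraction: float


# Values copied from the Agent Profiles table.
PROFILES = {
    "Cantus": Profile("Cantus", mass=3.0, max_speed=0.3, jitter=0.05, attraction=1.0),
    "Florid": Profile("Florid", mass=0.5, max_speed=1.5, jitter=0.2, attraction=0.6),
    "Shadow": Profile("Shadow", mass=2.0, max_speed=0.4, jitter=0.08, attraction=0.3),
}


def shadow_target(cantus_history, center, lag=3):
    """Hypothetical Shadow rule: take Cantus's position `lag` steps ago
    (or its earliest position, early on) and point-reflect it through the center."""
    past = cantus_history[-1 - lag] if len(cantus_history) > lag else cantus_history[0]
    return [2.0 * c - p for p, c in zip(past, center)]


history = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]]
print(shadow_target(history, center=[0.0, 0.0]))  # [-1.0, 0.0]
```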
Physics Engine
Agent Dynamics
At each time step, each agent experiences four influences: attraction toward events, repulsion from the other agents, inertia, and jitter.
LRU memory: each agent remembers its last 5 chosen events and applies a distance penalty to them to avoid repetition.
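The four forces and the LRU memory can be sketched as one update step followed by event selection. The sketch rests on several assumptions:

- attraction is a spring toward a target point
- repulsion is inverse-square and scaled by Counterpoint_rigidity
- inertia is acceleration = force / mass
- jitter is Gaussian
- speed is capped at max_speed × Speed
- selection picks the nearest event plus a memory penalty

The `Agent`, `physics_step`, and `choose_event` names are hypothetical. This guide names only `memory_size` (default 5).

```python
import math
import random
from collections import deque


def _norm(vec):
    return math.sqrt(sum(x * x for x in vec))


class Agent:
    """Hypothetical minimal agent state; the script's Agent class may differ."""

    def __init__(self, pos, mass, max_speed, jitter, attraction, memory_size=5):
        self.pos = list(pos)
        self.vel = [0.0] * len(pos)
        self.mass, self.max_speed = mass, max_speed
        self.jitter, self.attraction = jitter, attraction
        self.memory = deque(maxlen=memory_size)  # last memory_size choices; oldest drops out first


def physics_step(agent, target, others, rigidity, speed, rng):
    """Advance one agent by one time step."""
    # Attraction: spring pull toward the target point in latent space.
    force = [agent.attraction * (t - p) for t, p in zip(target, agent.pos)]
    # Repulsion: inverse-square push away from each other agent, scaled by rigidity.
    for other in others:
        diff = [p - q for p, q in zip(agent.pos, other.pos)]
        dist = _norm(diff) + 1e-9
        force = [f + rigidity * d / dist ** 3 for f, d in zip(force, diff)]
    # Jitter: small Gaussian perturbation.
    force = [f + rng.gauss(0.0, agent.jitter) for f in force]
    # Inertia: heavier agents change velocity more slowly (acceleration = force / mass).
    agent.vel = [v + f / agent.mass for v, f in zip(agent.vel, force)]
    # Speed cap: profile max_speed times the global Speed control.
    vmax = agent.max_speed * speed
    current = _norm(agent.vel)
    if current > vmax:
        agent.vel = [v * vmax / current for v in agent.vel]
    agent.pos = [p + v for p, v in zip(agent.pos, agent.vel)]


def choose_event(agent, events, penalty):
    """Pick the nearest event, adding `penalty` to events still held in memory."""
    best, best_cost = None, math.inf
    for i, e in enumerate(events):
        cost = _norm([a - b for a, b in zip(e, agent.pos)])
        if i in agent.memory:
            cost += penalty
        if cost < best_cost:
            best, best_cost = i, cost
    agent.memory.append(best)
    return best
```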
Latent Geometry
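Step 4 of the pipeline computes a center, a periphery measure, and a distance matrix over the latent codes. The sketch below makes three assumptions: the center is the mean latent vector, distances are Euclidean, and `median_dist` is the median pairwise distance (the quantity the repetition penalty in Advanced Techniques scales by 0.3). The function name `latent_geometry` is hypothetical.

```python
import math
import statistics


def latent_geometry(points):
    """Hypothetical helper: return (center, distance matrix, per-event periphery, median pairwise distance)."""
    n, dim = len(points), len(points[0])
    center = [sum(p[k] for p in points) / n for k in range(dim)]
    dist = [[math.dist(a, b) for b in points] for a in points]
    periphery = [math.dist(p, center) for p in points]  # distance of each event from the center
    pairwise = [dist[i][j] for i in range(n) for j in range(i + 1, n)]
    median_dist = statistics.median(pairwise) if pairwise else 0.0
    return center, dist, periphery, median_dist


codes = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
center, dist, periphery, median_dist = latent_geometry(codes)
print(center, median_dist)  # [1.0, 1.0] 2.0
```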
Polyphonic Reconstruction
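Step 6 sums the agent outputs with panning and volume scaling. The sketch below assumes a constant-power stereo pan law, voices spread evenly from left to right, and a 1/sqrt(N) gain per voice. The script's actual pan law and gain rule are not specified in this guide, and `pan_gains` and `mix_voices` are hypothetical names.

```python
import math


def pan_gains(position):
    """Constant-power pan law: position -1 (left) .. +1 (right) -> (left, right) gains."""
    angle = (position + 1.0) * math.pi / 4.0
    return math.cos(angle), math.sin(angle)


def mix_voices(voices):
    """Hypothetical mixer: sum mono voices into stereo, spread evenly across the field."""
    n = len(voices)
    length = max(len(v) for v in voices)
    left, right = [0.0] * length, [0.0] * length
    gain = 1.0 / math.sqrt(n)  # keeps summed power roughly constant for uncorrelated voices
    for i, voice in enumerate(voices):
        position = 0.0 if n == 1 else -1.0 + 2.0 * i / (n - 1)
        gl, gr = pan_gains(position)
        for t, sample in enumerate(voice):
            left[t] += sample * gl * gain
            right[t] += sample * gr * gain
    return left, right


left, right = mix_voices([[1.0, 0.5], [1.0, 0.5], [1.0]])
```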
Unison Rate
Preset Strategies
Preset 2: Duo (2 voices)
Two-Voice Counterpoint
Agents: 2 | Latent: 6 | Rigidity: 0.4 | Speed: 0.4
Character: Gentle duo (one Cantus, one Florid) with moderate independence
Use on: Simple material, exploring two-voice counterpoint
Preset 3: Trio (3 voices)
Three-Voice Counterpoint
Agents: 3 | Latent: 8 | Rigidity: 0.5 | Speed: 0.5
Character: Balanced trio of Cantus, Florid, and Shadow (the full archetype set)
Use on: General purpose, exploring three-voice interactions
Preset 4: Quartet (4 voices)
Four-Voice Ensemble
Agents: 4 | Latent: 10 | Rigidity: 0.6 | Speed: 0.5
Character: Denser texture; profiles cycle Cantus, Florid, Shadow, Florid
Use on: Richer material, complex counterpoint
Preset 5: Dense Ensemble (5 voices)
Five-Voice Texture
Agents: 5 | Latent: 12 | Rigidity: 0.7 | Speed: 0.6
Character: Dense polyphonic texture with strong repulsion
Use on: Complex material, dense textures
Preset 6: Tight Counterpoint
Highly Independent
Agents: 3 | Latent: 8 | Rigidity: 1.5 | Speed: 0.3
Character: Very strong repulsion; voices stay far apart in latent space, with minimal unison
Use on: Maximizing voice independence
Preset 7: Free Scatter
Loose, Scattered
Agents: 3 | Latent: 10 | Rigidity: 0.1 | Speed: 1.2
Character: Very weak repulsion and high speed; agents wander freely and often overlap
Use on: Loose, overlapping textures
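Each preset amounts to four numbers, so the six presets fit in a lookup table. The values below come from the preset descriptions above. The `resolve` helper and its keyword overrides are assumptions: they mirror the workflow examples that pair a preset with custom values, but this guide does not say how the Praat form combines the two.

```python
# Values copied from the preset descriptions; preset 1 is Custom.
PRESETS = {
    2: {"name": "Duo", "agents": 2, "latent": 6, "rigidity": 0.4, "speed": 0.4},
    3: {"name": "Trio", "agents": 3, "latent": 8, "rigidity": 0.5, "speed": 0.5},
    4: {"name": "Quartet", "agents": 4, "latent": 10, "rigidity": 0.6, "speed": 0.5},
    5: {"name": "Dense Ensemble", "agents": 5, "latent": 12, "rigidity": 0.7, "speed": 0.6},
    6: {"name": "Tight Counterpoint", "agents": 3, "latent": 8, "rigidity": 1.5, "speed": 0.3},
    7: {"name": "Free Scatter", "agents": 3, "latent": 10, "rigidity": 0.1, "speed": 1.2},
}


def resolve(preset, custom=None, **overrides):
    """Hypothetical: preset 1 uses `custom`; other presets can be tweaked via keyword overrides."""
    base = dict(custom or {}) if preset == 1 else dict(PRESETS[preset])
    base.update(overrides)  # copies are modified, so PRESETS stays unchanged
    return base


print(resolve(4, rigidity=0.8, speed=0.4))
```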
Parameters & Controls
Agent Parameters
| Parameter | Default | Description |
|---|---|---|
| Number_of_agents | 3 | 2-6 agents (profiles cycle through Cantus/Florid/Shadow) |
| Latent_size | 8 | Autoencoder latent dimensions (2-32) |
| Counterpoint_rigidity | 0.5 | Repulsion strength between agents (0-2) |
| Speed | 0.5 | Global agent movement speed (0.1-3) |
Duration
| Parameter | Default | Description |
|---|---|---|
| Duration (0 = original) | 0 | Target output duration (seconds) |
Output
| Parameter | Default | Description |
|---|---|---|
| Seed | 42 | Random seed for reproducibility |
| Draw_visualization | 1 | Generate 5-panel analysis display |
| Play_result | 1 | Audition after processing |
Visualization & Analysis
5-Panel Display
Reading Agent Profiles
- Steps: Number of events selected by this agent (should be similar across agents)
- Unique: How many distinct source events the agent used; lower = more repetition
- Rep rate: (steps - unique)/steps; higher = more repetition
- Travel: Average latent distance between consecutive chosen events β higher = more exploratory
- Periphery: Average distance from latent center of chosen events β higher = more peripheral
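The five statistics above can be computed from an agent's sequence of chosen event indices. The sketch below assumes Euclidean latent distances and a precomputed latent center. The name `agent_stats` is hypothetical.

```python
import math


def agent_stats(chosen, latent, center):
    """Hypothetical helper: summary statistics for one agent's chosen event indices."""
    steps = len(chosen)
    if steps == 0:
        return {"steps": 0, "unique": 0, "rep_rate": 0.0, "travel": 0.0, "periphery": 0.0}
    unique = len(set(chosen))
    # Latent distance between each pair of consecutive choices.
    hops = [math.dist(latent[a], latent[b]) for a, b in zip(chosen, chosen[1:])]
    return {
        "steps": steps,
        "unique": unique,
        "rep_rate": (steps - unique) / steps,
        "travel": sum(hops) / len(hops) if hops else 0.0,
        "periphery": sum(math.dist(latent[i], center) for i in chosen) / steps,
    }


latent = [[0.0, 0.0], [3.0, 4.0]]
print(agent_stats([0, 1, 0, 1], latent, center=[0.0, 0.0]))
```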
Interpreting Unison Rates
- Unison rate is the percentage of time two agents select the same event
- High unison (>30%): Voices are highly correlated β may sound like a single line
- Medium unison (10-30%): Some independence, occasional agreements
- Low unison (<10%): Highly independent voices (true counterpoint)
- Adjust counterpoint_rigidity to control unison rates
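The sketch below computes pairwise unison rates and maps them onto the bands above. It assumes agents advance in lockstep, so step t of one agent is compared with step t of another; the script may instead compare by elapsed time. The function names are hypothetical, and placing a rate of exactly 10% or 30% in the medium band is a choice, since the bands do not say.

```python
from itertools import combinations


def unison_rates(selections):
    """Hypothetical: pairwise unison rate in percent; selections[a][t] is agent a's event at step t."""
    rates = {}
    for a, b in combinations(range(len(selections)), 2):
        steps = min(len(selections[a]), len(selections[b]))
        same = sum(1 for t in range(steps) if selections[a][t] == selections[b][t])
        rates[(a, b)] = 100.0 * same / steps if steps else 0.0
    return rates


def describe(rate):
    """Map a unison rate (percent) onto the interpretation bands."""
    if rate > 30:
        return "high: voices highly correlated"
    if rate >= 10:
        return "medium: some independence"
    return "low: highly independent"


rates = unison_rates([[1, 2, 3, 4], [1, 5, 3, 6], [7, 8, 9, 4]])
for pair, rate in rates.items():
    print(pair, rate, describe(rate))
```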
Applications
Electroacoustic Composition
Use case: Creating polyphonic textures from single-source material
Technique: Trio or Quartet presets on varied source material
Workflow:
- Select a 20-60 second recording with diverse timbres
- Run with Quartet preset (4 agents)
- Examine agent profiles and unison rates
- Export and use as polyphonic movement in larger work
Generative Music
Use case: Creating endless variations with different seeds
Technique: Same preset, different seed values
Examples:
- Background textures: Free Scatter with low rigidity, for loose, overlapping voices
- Canon-like structures: Tight Counterpoint with high rigidity, for independent voices
- Evolving polyphony: Trio with moderate settings
Sound Design for Media
Use case: Creating layered, evolving textures
Technique: Dense Ensemble on appropriate sources
Applications:
- Horror: High rigidity creates dissonant independence between voices
- Ambient: Low rigidity, slow speed creates gentle overlapping
- Tension: Medium settings with moderate unison rates
Research & Education
Use case: Studying emergent polyphony, agent-based models
Technique: Compare presets on same source, examine agent trajectories
Learning outcomes:
- Understand how repulsion creates independence between voices
- See how different agent profiles produce distinct behaviors
- Observe relationship between latent space and event selection
- Explore autoencoder learning and latent space geometry
Practical Workflow Examples
Film Scene: Multiple Personalities
Goal: Create 60-second cue representing multiple characters from a single voice
Settings:
- Source: 30-second vocal improvisation
- Preset: Quartet
- Custom: rigidity=0.8 (strong independence), speed=0.4, duration=60
Result: Four independent vocal streams, each with distinct character (Cantus, Florid, Shadow, Florid)
Electronic Music: Polyphonic Pad
Goal: Create evolving pad with multiple layers
Settings:
- Source: 8-second synth pad
- Preset: Trio
- Custom: speed=0.3 (slow), rigidity=0.3 (loose), duration=24 (3× the source length)
Result: 24-second evolving pad with three overlapping voices
Voice Processing: Choral Effect
Goal: Create choral texture from solo voice
Settings:
- Source: 10-second vocal phrase
- Preset: Dense Ensemble (5 voices)
- Custom: rigidity=0.5, speed=0.5
Result: Five-voice polyphony from a single voice, creating a choral effect
Troubleshooting Common Issues
Cause: Python not installed, or packages missing
Solution: Install Python and required packages: pip install numpy soundfile scipy
Cause: Source has few intensity peaks, or segmentation parameters inappropriate
Solution: Use source with more dynamic variation, or adjust min/max event duration in script
Cause: Too few steps, too small latent size, or data too complex
Solution: Increase learning_steps, increase latent_size, or use simpler source
Cause: counterpoint_rigidity too low, or speed too high causing convergence
Solution: Increase rigidity, reduce speed, check agent profiles
Cause: Crossfade insufficient at splice points
Solution: Increase XFADE_SEC in the Python script (currently 8 ms)
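The sketch below shows a splice with an 8 ms crossfade. It assumes an equal-power (cosine/sine) fade, since the script's fade shape is not documented here, and `splice` is a hypothetical helper. Raising `xfade_sec` lengthens the overlap, which is the fix suggested above.

```python
import math

XFADE_SEC = 0.008  # mirrors the 8 ms value named above


def splice(a, b, sr, xfade_sec=XFADE_SEC):
    """Hypothetical helper: join two mono clips, overlapping them with an equal-power crossfade."""
    n = min(int(round(xfade_sec * sr)), len(a), len(b))
    if n == 0:
        return list(a) + list(b)
    out = list(a[:len(a) - n])
    for i in range(n):
        t = (i + 0.5) / n
        fade_out = math.cos(t * math.pi / 2)  # clip a fades out
        fade_in = math.sin(t * math.pi / 2)   # clip b fades in
        out.append(a[len(a) - n + i] * fade_out + b[i] * fade_in)
    out.extend(b[n:])
    return out


joined = splice([1.0] * 20, [0.0] * 20, sr=1000)  # 8-sample overlap at 1 kHz
```

Equal-power fades keep the level steady across uncorrelated material. For near-identical material, a linear fade avoids a slight bump at the midpoint.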
Advanced Techniques
In the Python script, modify mass, max_speed, jitter_scale, and attraction_weight in Agent.__init__() to create new behavioral types.
Change a.memory_size (default 5) to control how many recent events each agent avoids.
Modify the penalty scaling (currently median_dist * 0.3) to control the strength of repetition avoidance.
The script outputs stereo. For multi-channel output, modify reconstruct_polyphonic() to output N channels with custom panning.
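For multi-channel output, one option is pairwise constant-power panning along a line of speakers. The sketch below assumes voices are spread evenly from the first channel to the last. `channel_gains` and `mix_multichannel` are hypothetical; they show the kind of change that could go inside `reconstruct_polyphonic()`, whose internals this guide does not show.

```python
import math


def channel_gains(position, n_channels):
    """Hypothetical: pairwise constant-power gains for a position in [0, n_channels - 1]."""
    gains = [0.0] * n_channels
    position = min(max(position, 0.0), n_channels - 1)
    lo = int(math.floor(position))
    hi = min(lo + 1, n_channels - 1)
    frac = position - lo
    if hi == lo:
        gains[lo] = 1.0  # exactly on the last speaker
    else:
        gains[lo] = math.cos(frac * math.pi / 2)
        gains[hi] = math.sin(frac * math.pi / 2)
    return gains


def mix_multichannel(voices, n_channels):
    """Hypothetical: sum mono voices into n_channels outputs, spreading voices evenly."""
    n = len(voices)
    length = max(len(v) for v in voices)
    out = [[0.0] * length for _ in range(n_channels)]
    scale = 1.0 / math.sqrt(n)  # same per-voice volume scaling idea as the stereo mix
    for i, voice in enumerate(voices):
        pos = 0.0 if n == 1 else i * (n_channels - 1) / (n - 1)
        g = channel_gains(pos, n_channels)
        for ch in range(n_channels):
            if g[ch]:
                for t, sample in enumerate(voice):
                    out[ch][t] += sample * g[ch] * scale
    return out


quad = mix_multichannel([[1.0]] * 4, 4)  # four voices, one per speaker
```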