Latent Spat — Agent-Based Spatialization — User Guide
Extends Latent Counterpoint with physical space: each agent's latent position maps to spatial coordinates via VBAP panning. Latent X → azimuth (position around listener), Latent Y → distance (amplitude, filtering, reverb). When agents repel in timbre, they move to opposite sides of the room. The counterpoint becomes spatial.
What this does
This script implements Latent Spat — an agent-based spatialization engine that extends the Latent Counterpoint concept into physical space. Multiple agents navigate a latent space learned from audio events; their latent positions are mapped to spatial coordinates (azimuth and distance) and rendered into multi-channel audio using VBAP (Vector Base Amplitude Panning).
🎧 What is Agent-Based Spatialization?
This approach combines three powerful concepts:
- Latent Counterpoint: Agents (Cantus, Florid, Shadow) navigate a learned latent space, selecting events based on their position and interactions
- Spatial Mapping: Each agent's latent trajectory is projected to 2D and mapped to azimuth (angle around listener) and distance
- VBAP Rendering: For each step, the agent's azimuth determines gains to surrounding speakers; distance controls amplitude, low-pass filtering (proximity effect), and reverb
The result is a spatial composition where timbral relationships become spatial relationships — agents that repel in timbre move to opposite sides of the room, creating a true fusion of counterpoint and space.
Key Features:
- 6 Preset Strategies — Stereo Duo to Close-Mic Trio, plus Custom
- 2-6 Agents — Each with distinct behavioral profile (Cantus, Florid, Shadow)
- 4 Spatial Formats — Stereo, Quad, 5.1, Octophonic
- 3 Distance Models — Amplitude only, +Low-pass (proximity effect), +Reverb
- VBAP Panning — Vector Base Amplitude Panning for accurate spatial placement
- Agent Profiles — Cantus (stable center), Florid (peripheral wanderer), Shadow (lagging mirror)
- Counterpoint Rigidity — Controls how strongly agents repel each other
- Comprehensive Visualization — Spatial field diagram, agent profiles, separation stats
Technical Implementation (pipeline, in order):
- Event Segmentation: Praat segments audio into events.
- Mel Patches: 40×32 log-mel patches per event.
- Autoencoder: Train on-the-fly, encode to latent space.
- Agent Physics: Multi-agent simulation with profiles, repulsion, attraction.
- Spatial Mapping: PCA of agent trajectories → 2D → azimuth & distance.
- VBAP Rendering: Per-step gains for each speaker, distance processing, multi-channel output.
Quick start
- In Praat, select exactly one Sound object (any duration, any content).
- Run script… → select LatentSpat.praat.
- Choose Preset (2-7 for specific strategies, 1 for custom).
- Set number of agents, latent size, counterpoint rigidity, speed.
- Choose spatial format and distance model.
- Adjust reverb amount and target duration (0 = original).
- Enable Draw_visualization for analysis display.
- Click OK — engine segments, trains autoencoder, runs agent physics, renders spatial audio.
Agent-Based Spatialization Theory
Agent Profiles
🎭 Three Behavioral Archetypes
| Profile | Role | Mass | Speed | Attraction | Spatial Tendency |
|---|---|---|---|---|---|
| Cantus | Stable leader | 3.0 | 0.3 | 1.0 | Center, stable azimuth |
| Florid | Peripheral wanderer | 0.5 | 1.5 | 0.6 | Wide azimuth range, periphery |
| Shadow | Lagging mirror | 2.0 | 0.4 | 0.3 | Opposite Cantus, delayed |
Repulsion: Agents repel each other with force ∝ rigidity × median_dist² / distance². This creates spatial separation when they disagree timbrally.
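The repulsion rule can be sketched in a few lines of Python. This is a minimal illustration, not the script's actual code: the function name `repulsion_force`, the `eps` guard, and the reading of `median_dist` as a precomputed median distance in latent space are all assumptions.

```python
import math

def repulsion_force(pos_a, pos_b, rigidity, median_dist, eps=1e-6):
    """Force vector pushing agent A away from agent B.

    Magnitude follows the rule quoted in the guide:
        rigidity * median_dist**2 / distance**2
    eps (an assumption) avoids division by zero when agents coincide;
    in that case the direction is undefined and the force is zero.
    """
    delta = [a - b for a, b in zip(pos_a, pos_b)]
    dist = max(math.sqrt(sum(d * d for d in delta)), eps)
    magnitude = rigidity * median_dist ** 2 / dist ** 2
    # Unit vector from B to A, scaled by the magnitude.
    return [magnitude * d / dist for d in delta]

# Doubling the distance divides the force by four.
near = repulsion_force([1.0, 0.0], [0.0, 0.0], rigidity=0.5, median_dist=2.0)
far = repulsion_force([2.0, 0.0], [0.0, 0.0], rigidity=0.5, median_dist=2.0)
```

Because the force falls off with the square of the distance, agents push hard only when they crowd the same region of the latent space, which is what drives them toward different sides of the room.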
Spatial Mapping
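As described in the overview, the first projected axis (latent X) drives azimuth and the second (latent Y) drives distance. The guide does not state how the axes are scaled, so the sketch below assumes min-max normalization over the trajectory; `latent_to_spatial` is a hypothetical name, and the input is assumed to be already projected to 2D.

```python
def latent_to_spatial(points_2d):
    """Map 2D projected latent points to (azimuth_deg, distance) pairs.

    Assumptions (not confirmed by the guide):
    - X is min-max normalized and spread over the full circle [0, 360)
    - Y is min-max normalized to a distance in [0, 1]
    """
    xs = [p[0] for p in points_2d]
    ys = [p[1] for p in points_2d]
    x_min, x_span = min(xs), (max(xs) - min(xs)) or 1.0
    y_min, y_span = min(ys), (max(ys) - min(ys)) or 1.0
    mapped = []
    for x, y in points_2d:
        # The largest X wraps back to 0 degrees, because the circle closes.
        azimuth = ((x - x_min) / x_span * 360.0) % 360.0
        distance = (y - y_min) / y_span
        mapped.append((azimuth, distance))
    return mapped

coords = latent_to_spatial([(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)])
```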
VBAP (Vector Base Amplitude Panning)
📐 2D Circular Panning
For an azimuth angle θ and N speakers at angles φ₁...φ_N:
- Find the two speakers that bracket θ (considering wrap-around)
- Compute the angular fraction: α = (θ - φ_left) / (φ_right - φ_left), taking both differences modulo 360° so the pair that spans 0° is handled correctly
- Equal-power gains: g_left = cos(α × π/2), g_right = sin(α × π/2)
- Other speakers get zero gain
Speaker layouts:
- Stereo: 330° (L), 30° (R)
- Quad: 315° (FL), 45° (FR), 225° (RL), 135° (RR)
- 5.1: 330° (L), 30° (R), 0° (C), 0° (LFE), 240° (LS), 120° (RS)
- Octophonic: 0°, 45°, 90°, 135°, 180°, 225°, 270°, 315°
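The pairwise panning steps above can be sketched as follows. This is an illustrative implementation under stated assumptions: the names `SPEAKERS_QUAD` and `pan_gains` are hypothetical, and the layout is passed as a dictionary of label to angle in degrees.

```python
import math

# Quad layout from the list above (label -> angle in degrees).
SPEAKERS_QUAD = {"FL": 315.0, "FR": 45.0, "RL": 225.0, "RR": 135.0}

def pan_gains(azimuth_deg, speakers):
    """Return {label: gain} for a source at azimuth_deg.

    Equal-power panning between the two speakers that bracket the
    source; every other speaker gets zero gain.
    """
    theta = azimuth_deg % 360.0
    ring = sorted(speakers.items(), key=lambda kv: kv[1])  # ascending angle
    gains = {label: 0.0 for label in speakers}
    for i, (left_label, left_angle) in enumerate(ring):
        right_label, right_angle = ring[(i + 1) % len(ring)]
        span = (right_angle - left_angle) % 360.0  # modulo handles wrap-around
        offset = (theta - left_angle) % 360.0
        if span > 0 and offset <= span:
            alpha = offset / span
            gains[left_label] = math.cos(alpha * math.pi / 2)
            gains[right_label] = math.sin(alpha * math.pi / 2)
            return gains
    raise ValueError("no bracketing speaker pair found")

front = pan_gains(0.0, SPEAKERS_QUAD)      # halfway between FL and FR
on_speaker = pan_gains(45.0, SPEAKERS_QUAD)  # exactly on FR
```

The squared gains always sum to 1, so perceived loudness stays constant as an agent moves around the ring. In the 5.1 layout, C and LFE share 0°, so this sketch assumes the LFE is removed from the dictionary before panning.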
Distance Processing
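The three distance models are cumulative: amplitude only, then amplitude plus low-pass, then amplitude plus low-pass plus reverb. The guide does not give the actual curves, so every constant below (the attenuation slope, the 16 kHz and 2 kHz cutoffs) is an illustrative assumption, and `distance_params` is a hypothetical helper rather than a function from the script.

```python
def distance_params(distance, model="Amp+LP+Reverb", reverb_amount=0.4):
    """Return processing parameters for a normalized distance in [0, 1].

    All curves are assumptions chosen for illustration:
    - gain falls from 1.0 (center) to 1/3 (edge)
    - low-pass cutoff falls linearly from 16 kHz to 2 kHz
    - reverb wet level rises linearly up to reverb_amount
    """
    d = min(max(distance, 0.0), 1.0)
    params = {"gain": 1.0 / (1.0 + 2.0 * d), "cutoff_hz": None, "reverb_wet": 0.0}
    if model in ("Amp+LP", "Amp+LP+Reverb"):
        params["cutoff_hz"] = 16000.0 * (1.0 - d) + 2000.0 * d
    if model == "Amp+LP+Reverb":
        params["reverb_wet"] = reverb_amount * d
    return params

close = distance_params(0.0)
far_away = distance_params(1.0)
dry = distance_params(1.0, model="Amplitude only")
```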
Latent Counterpoint Physics
Preset Strategies
Preset 2: Stereo Duo
🎼 2-Agent Stereo
Agents: 2 | Latent: 6 | Rigidity: 0.4 | Speed: 0.4
Format: Stereo | Distance: Amp+LP | Reverb: 0.2
Character: Gentle duo in stereo field — one Cantus, one Florid
Use on: Simple material, stereo listening
Preset 3: Quad Trio
🔊 3-Agent Quadraphonic
Agents: 3 | Latent: 8 | Rigidity: 0.5 | Speed: 0.5
Format: Quad | Distance: Amp+LP+Reverb | Reverb: 0.4
Character: Full trio (Cantus, Florid, Shadow) in four corners
Use on: Quadraphonic installations, immersive experiments
Preset 4: Surround Ensemble
🌍 4-Agent 5.1
Agents: 4 | Latent: 10 | Rigidity: 0.6 | Speed: 0.5
Format: 5.1 | Distance: Amp+LP+Reverb | Reverb: 0.5
Character: Four agents (Cantus, Florid, Shadow, Florid) in full surround
Use on: 5.1 systems, cinematic material
Preset 5: Octophonic Scatter
🌀 5-Agent 8-Channel
Agents: 5 | Latent: 12 | Rigidity: 0.8 | Speed: 0.8
Format: Octophonic | Distance: Amp+LP+Reverb | Reverb: 0.5
Character: Five agents scattering across 8 speakers — dense spatial texture
Use on: Octophonic installations, immersive art
Preset 6: Immersive Drift
🌫️ 3-Agent Drifting
Agents: 3 | Latent: 10 | Rigidity: 0.3 | Speed: 0.3
Format: Octophonic | Distance: Amp+LP+Reverb | Reverb: 0.7
Character: Slow drift through octophonic space with heavy reverb
Use on: Ambient, meditative spatial textures
Preset 7: Close-Mic Trio
🎙️ Dry, Close Trio
Agents: 3 | Latent: 8 | Rigidity: 0.6 | Speed: 0.5
Format: Quad | Distance: Amplitude only | Reverb: 0.0
Character: Close, dry spatialization — agents move but no filtering/reverb
Use on: Dry recordings, clarity-focused material
Parameters & Controls
Agent Parameters
| Parameter | Default | Description |
|---|---|---|
| Number_of_agents | 3 | 2-6 agents (profiles cycle through Cantus/Florid/Shadow) |
| Latent_size | 8 | Autoencoder latent dimensions (2–32) |
| Counterpoint_rigidity | 0.5 | Repulsion strength between agents (0–2) |
| Speed | 0.5 | Overall agent movement speed (0.1–3) |
Spatial Parameters
| Parameter | Default | Description |
|---|---|---|
| Spatial_format | Stereo | Stereo, Quad, 5.1, Octophonic |
| Distance_model | Amp+LP+Reverb | Amplitude only, +Low-pass, +Reverb |
| Reverb_amount | 0.4 | Strength of distance-dependent reverb (0–1) |
| Duration (0 = original) | 0 | Target output duration (0 = original length) |
| Seed | 42 | Random seed for reproducibility |
Output
| Parameter | Default | Description |
|---|---|---|
| Draw_visualization | 1 | Generate 5-panel analysis display |
| Play_result | 1 | Audition after processing |
Visualization & Analysis
5-Panel Display
Reading the Spatial Field Diagram
- Gray squares: Speaker positions — labeled according to format (L, R, FL, FR, etc.)
- Colored circles: Agents, each drawn at its mean position over the whole trajectory
- Letters inside circles: C = Cantus, F = Florid, S = Shadow — identifies agent profile
- Gray circles: Reference distances at radius 0.5 and 1.0 — agents near center are closer (louder, brighter), near edge are farther (quieter, more filtered)
- Color consistency: Each agent has a consistent color across all visualizations
Interpreting Spatial Separation
- Spatial separation (degrees): Average angular distance between two agents over time — higher means they stay apart spatially
- Unison rate: How often agents choose the same event — low unison means they're exploring different material
- Azimuth range: Total angular span covered by an agent — wide range = wandering, narrow = stable
- Azimuth travel: Total angular distance traveled — high = active, low = stationary
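The four statistics above can be computed from per-step azimuths and event indices. The sketch below is one plausible reading of the definitions, not the script's own code; the function names are hypothetical, and the span calculation in `azimuth_range` (unwrapping the path, capped at 360°) is an assumption.

```python
def angular_diff(a, b):
    """Smallest absolute angle between two azimuths, in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def spatial_separation(az_a, az_b):
    """Mean angular distance between two agents over time."""
    return sum(angular_diff(a, b) for a, b in zip(az_a, az_b)) / len(az_a)

def unison_rate(events_a, events_b):
    """Fraction of steps on which both agents chose the same event."""
    same = sum(1 for a, b in zip(events_a, events_b) if a == b)
    return same / len(events_a)

def azimuth_travel(az):
    """Total angular distance traveled, summing shortest-path steps."""
    return sum(angular_diff(a, b) for a, b in zip(az, az[1:]))

def azimuth_range(az):
    """Angular span covered, measured on the unwrapped path (max 360)."""
    pos, low, high = 0.0, 0.0, 0.0
    for a, b in zip(az, az[1:]):
        pos += (b - a + 180.0) % 360.0 - 180.0  # signed shortest step
        low, high = min(low, pos), max(high, pos)
    return min(high - low, 360.0)

cantus = [350.0, 10.0, 30.0]
shadow = [170.0, 190.0, 210.0]
separation = spatial_separation(cantus, shadow)
```

Here the two agents stay diametrically opposed, so the separation is 180°, while each one travels only 40° in total.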
Applications
Electroacoustic Composition
Use case: Creating spatial compositions where timbral relationships become spatial
Technique: Quad Trio or Octophonic Scatter presets on varied source material
Workflow:
- Select a 20-60 second recording with diverse timbres
- Run with Quad Trio preset for 4-channel output
- Examine agent profiles and spatial separation stats
- Export and use as movement in multi-channel installation
Sound Design for Immersive Media
Use case: Creating spatial textures for VR, 360 video, or multi-channel installations
Technique: Immersive Drift or Surround Ensemble on appropriate sources
Applications:
- VR environments: Agents drift through space, creating evolving sound field
- Cinema: 5.1 output with agents moving around the theater
- Installation art: Octophonic scatter creates dense spatial texture
Music Production
Use case: Creating spatialized arrangements from stem material
Technique: Close-Mic Trio for clarity, or Quad Trio for a more immersive result
Examples:
- Stems: Each agent could represent an instrument, moving through space
- Ambient music: Immersive Drift creates slowly evolving spatial pads
- Electronic music: Stereo Duo with fast speed creates dynamic stereo movement
Research & Education
Use case: Studying spatial perception, agent-based models, VBAP
Technique: Compare presets on same source, examine agent trajectories
Learning outcomes:
- Understand how latent space maps to spatial parameters
- See how agent repulsion creates spatial separation
- Explore distance processing models (amplitude, filtering, reverb)
- Learn VBAP and multi-channel rendering
Practical Workflow Examples
🎬 Film Scene: Spatial Tension
Goal: Create 60-second spatial tension cue for 5.1 surround
Settings:
- Source: 30-second drone with subtle variations
- Preset: Surround Ensemble
- Custom: rigidity=0.8 (strong repulsion), reverb=0.6
Result: Four agents repelling in timbre → spatial separation around the room, creating tension
🎚️ Electronic Music: Evolving Pads
Goal: Create evolving pad texture in quadraphonic
Settings:
- Source: 8-second synth pad
- Preset: Quad Trio
- Custom: speed=0.3 (slow), distance=Amp+LP (filtered)
Result: 24-second evolving pad with agents slowly drifting through quad space
🎙️ Voice Spatialization
Goal: Spatialize solo voice across octophonic array
Settings:
- Source: 10-second vocal phrase
- Preset: Octophonic Scatter
- Custom: rigidity=0.4 (gentle repulsion), reverb=0.5
Result: Voice fragmented across 8 speakers, creating immersive choral effect
Troubleshooting Common Issues
Cause: Python not installed, or packages missing
Solution: Install Python and required packages: pip install numpy soundfile scipy
Cause: Crossfade insufficient at splice points
Solution: Increase XFADE_SEC in the Python script (currently 8 ms)
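To see what a longer crossfade does at a splice point, here is a minimal sketch that joins two mono segments with a linear crossfade. Only the name `XFADE_SEC` and its 8 ms value come from the guide; the `splice` function and the linear ramp are assumptions, and the script's own crossfade shape may differ.

```python
XFADE_SEC = 0.008  # 8 ms, the value quoted in the guide

def splice(a, b, sample_rate, xfade_sec=XFADE_SEC):
    """Join two mono sample lists, overlapping them by xfade_sec seconds."""
    n = min(int(round(xfade_sec * sample_rate)), len(a), len(b))
    if n == 0:
        return list(a) + list(b)
    overlap = []
    for i in range(n):
        w = (i + 1) / (n + 1)  # fade-in weight for b, strictly between 0 and 1
        overlap.append(a[len(a) - n + i] * (1.0 - w) + b[i] * w)
    return list(a[:len(a) - n]) + overlap + list(b[n:])

# Two 100-sample segments at 1 kHz overlap by 8 samples.
joined = splice([1.0] * 100, [1.0] * 100, sample_rate=1000)
```

A larger `XFADE_SEC` widens the overlap, which smooths discontinuities at the cost of blurring event onsets.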
Cause: Low speed, or latent space too constrained
Solution: Increase speed, reduce rigidity, check agent profiles
Cause: Agents attracted to same events, or repulsion too weak
Solution: Increase counterpoint_rigidity, check agent profiles
Cause: Playback system not configured for multi-channel
Solution: Use appropriate software/hardware, or downmix to stereo
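If no multi-channel playback is available, a stereo downmix can be derived from the speaker angles. The sketch below is one simple approach, not something the script provides: it assumes azimuth increases clockwise (30° is to the right, as in the stereo layout) and pans each speaker feed into left and right with equal-power gains. The function names are hypothetical.

```python
import math

def stereo_downmix_gains(speaker_angles):
    """Return {label: (left_gain, right_gain)} for each speaker angle."""
    gains = {}
    for label, angle in speaker_angles.items():
        pan = math.sin(math.radians(angle))  # -1 = hard left, +1 = hard right
        phi = (pan + 1.0) * math.pi / 4.0    # 0 .. pi/2
        gains[label] = (math.cos(phi), math.sin(phi))
    return gains

def downmix_frame(frame, gains):
    """Mix one multi-channel sample frame {label: value} down to (left, right)."""
    left = sum(frame[label] * g[0] for label, g in gains.items())
    right = sum(frame[label] * g[1] for label, g in gains.items())
    return left, right

ring = {"front": 0.0, "right": 90.0, "left": 270.0}
ring_gains = stereo_downmix_gains(ring)
left, right = downmix_frame({"front": 0.0, "right": 0.0, "left": 1.0}, ring_gains)
```

Front and rear speakers collapse onto the same stereo position, and summed channels can clip, so the result may need to be scaled down before export.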
Advanced Techniques
In the Python script, modify SPEAKER_LAYOUTS and SPEAKER_LABELS to implement custom speaker configurations (e.g., 7.1, ambisonic, irregular arrays).
Modify the mass, speed, and attraction weights in Agent.__init__() to create new behavioral archetypes.
Adjust distance_attenuation, distance_lowpass, and distance_reverb functions to implement different spatial models.
For true multi-channel monitoring, use a DAW with multi-channel output routing (such as Reaper or Logic Pro) or a patching environment such as Max/MSP.