Latent Spat — Agent-Based Spatialization — User Guide

Extends Latent Counterpoint with physical space: each agent's latent position maps to spatial coordinates, which are rendered with VBAP panning. Latent X → azimuth (position around listener), Latent Y → distance (amplitude, filtering, reverb). When agents repel in timbre, they move to opposite sides of the room. The counterpoint becomes spatial.

Author: Shai Cohen
Affiliation: Department of Music, Bar-Ilan University, Israel
Version: 1.0 (2025)
License: MIT License
Citation: Cohen, S. (2025). Praat AudioTools
Repo: https://github.com/ShaiCohen-ops/Praat-plugin_AudioTools

Contents:

What this does
Quick start
Agent-Based Spatialization Theory
Preset Strategies
Parameters & Controls
Visualization & Analysis
Applications

What this does

This script implements Latent Spat — an agent-based spatialization engine that extends the Latent Counterpoint concept into physical space. Multiple agents navigate a latent space learned from audio events; their latent positions are mapped to spatial coordinates (azimuth and distance) and rendered into multi-channel audio using VBAP (Vector Base Amplitude Panning).

🎧 What is Agent-Based Spatialization?

This approach combines three powerful concepts:

Latent Counterpoint: Agents (Cantus, Florid, Shadow) navigate a learned latent space, selecting events based on their position and interactions
Spatial Mapping: Each agent's latent trajectory is projected to 2D and mapped to azimuth (angle around listener) and distance
VBAP Rendering: For each step, the agent's azimuth determines gains to surrounding speakers; distance controls amplitude, low-pass filtering (proximity effect), and reverb

The result is a spatial composition where timbral relationships become spatial relationships — agents that repel in timbre move to opposite sides of the room, creating a true fusion of counterpoint and space.

Key Features:

6 Preset Strategies: Stereo Duo to Close-Mic Trio, plus Custom
2-6 Agents — Each with distinct behavioral profile (Cantus, Florid, Shadow)
4 Spatial Formats — Stereo, Quad, 5.1, Octophonic
3 Distance Models — Amplitude only, +Low-pass (proximity effect), +Reverb
VBAP Panning — Vector Base Amplitude Panning for accurate spatial placement
Agent Profiles — Cantus (stable center), Florid (peripheral wanderer), Shadow (lagging mirror)
Counterpoint Rigidity — Controls how strongly agents repel each other
Comprehensive Visualization — Spatial field diagram, agent profiles, separation stats

Technical Implementation:

(1) Event Segmentation: Praat segments audio into events.
(2) Mel Patches: 40×32 log-mel patches per event.
(3) Autoencoder: Train on-the-fly, encode to latent space.
(4) Agent Physics: Multi-agent simulation with profiles, repulsion, attraction.
(5) Spatial Mapping: PCA of agent trajectories → 2D → azimuth & distance.
(6) VBAP Rendering: Per-step gains for each speaker, distance processing, multi-channel output.

Quick start

In Praat, select exactly one Sound object (any duration, any content).
Run script… → select LatentSpat.praat.
Choose Preset (2-7 for specific strategies, 1 for custom).
Set number of agents, latent size, counterpoint rigidity, speed.
Choose spatial format and distance model.
Adjust reverb amount and target duration (0 = original).
Enable Draw_visualization for analysis display.
Click OK — engine segments, trains autoencoder, runs agent physics, renders spatial audio.

Quick tip: Start with Quad Trio preset on a 10-20 second recording with varied texture. Enable visualization — you'll see the spatial field diagram with agent positions (colored circles), agent profiles, and spatial separation stats. Listen in quadraphonic setup (or simulated via headphones) to hear agents moving around the space. The output appears as "source_spat" in the Objects window.

Important:

PYTHON DEPENDENCIES: Requires numpy, soundfile, scipy (no scikit-learn).
AUTOENCODER TRAINING: Happens on the fly and may take 30-60 seconds.
SPATIAL FORMATS: Require appropriate monitoring. For 5.1/octophonic, you'll need a multi-channel interface or software.
SPEAKER LAYOUT: VBAP assumes speakers are evenly spaced. For custom layouts, modify the speaker_azimuths arrays in Python.

Agent-Based Spatialization Theory

Agent Profiles

🎭 Three Behavioral Archetypes

Profile	Role	Mass	Speed	Attraction	Spatial Tendency
Cantus	Stable leader	3.0	0.3	1.0	Center, stable azimuth
Florid	Peripheral wanderer	0.5	1.5	0.6	Wide azimuth range, periphery
Shadow	Lagging mirror	2.0	0.4	0.3	Opposite Cantus, delayed

Repulsion: Agents repel each other with force ∝ rigidity × median_dist² / distance². This creates spatial separation when they disagree timbrally.

Spatial Mapping

For each agent's latent position p(t) at step t:

1. Project to 2D via PCA on all event latents:
   (x, y) = (p(t) - μ) · V₂
2. Normalize across all agent positions:
   x_norm = (x - x_min) / (x_max - x_min)
   y_norm = (y - y_min) / (y_max - y_min)
3. Map to spatial parameters:
   azimuth = x_norm × 360° (0° = front)
   distance = 0.1 + 0.9 × y_norm (0.1 = close, 1.0 = far)
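The three mapping steps can be sketched in a few lines of NumPy. This is an illustrative sketch, not the script's own code: the function name map_latents_to_space and the array shapes are assumptions, and PCA is done here with a plain SVD of the centered event latents.

```python
import numpy as np

def map_latents_to_space(event_latents, agent_positions):
    """Hypothetical helper: latents -> (azimuth in degrees, distance).

    event_latents:   (n_events, d) latent vectors of all events
    agent_positions: (n_steps, d) latent positions visited by the agents
    """
    # Step 1: PCA on the event latents (mean mu, top-2 axes V2 via SVD).
    mu = event_latents.mean(axis=0)
    _, _, vt = np.linalg.svd(event_latents - mu, full_matrices=False)
    v2 = vt[:2].T                      # shape (d, 2)
    xy = (agent_positions - mu) @ v2   # shape (n_steps, 2)

    # Step 2: min-max normalize each axis across all agent positions.
    lo = xy.min(axis=0)
    span = xy.max(axis=0) - lo
    span[span == 0] = 1.0              # guard: avoid division by zero
    norm = (xy - lo) / span

    # Step 3: map to azimuth (0 = front) and distance (0.1 close, 1.0 far).
    azimuth_deg = norm[:, 0] * 360.0
    distance = 0.1 + 0.9 * norm[:, 1]
    return azimuth_deg, distance
```

Note that x_norm = 1 gives 360°, which is the same direction as 0°; a renderer would typically wrap the angle before panning.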

VBAP (Vector Base Amplitude Panning)

📐 2D Circular Panning

For an azimuth angle θ and N speakers at angles φ₁...φ_N:

Find the two speakers that bracket θ (considering wrap-around)
Compute angular fractions: α = (θ - φ_left) / (φ_right - φ_left)
Equal-power gains: g_left = cos(α × π/2), g_right = sin(α × π/2)
Other speakers get zero gain

Speaker layouts:

Stereo: 330° (L), 30° (R)
Quad: 315° (FL), 45° (FR), 225° (RL), 135° (RR)
5.1: 330°(L), 30°(R), 0°(C), 0°(LFE), 240°(LS), 120°(RS)
Octophonic: 0°,45°,90°,135°,180°,225°,270°,315°
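A minimal sketch of the pairwise panning steps above, using the speaker angles listed in this guide. The names LAYOUTS_DEG and vbap_gains are hypothetical and may not match the script. The 5.1 layout is left out because C and LFE share 0° and this guide does not say how the script handles them.

```python
import math

# Speaker azimuths in degrees (0 = front), copied from the layouts above.
# Hypothetical name; channel order follows the order given in this guide.
LAYOUTS_DEG = {
    "Stereo": [330.0, 30.0],
    "Quad": [315.0, 45.0, 225.0, 135.0],
    "Octophonic": [0.0, 45.0, 90.0, 135.0, 180.0, 225.0, 270.0, 315.0],
}

def vbap_gains(theta_deg, speaker_deg):
    """Return one gain per speaker for a source at azimuth theta_deg."""
    theta = theta_deg % 360.0
    n = len(speaker_deg)
    # Visit speakers in order of increasing angle so neighbours form pairs.
    order = sorted(range(n), key=lambda i: speaker_deg[i] % 360.0)
    gains = [0.0] * n
    for k in range(n):
        left, right = order[k], order[(k + 1) % n]   # wraps around 360
        start = speaker_deg[left] % 360.0
        arc = (speaker_deg[right] - speaker_deg[left]) % 360.0
        if arc == 0.0:
            continue                                  # coincident speakers
        offset = (theta - start) % 360.0
        if offset <= arc:                             # this pair brackets theta
            alpha = offset / arc
            gains[left] = math.cos(alpha * math.pi / 2)   # equal-power law
            gains[right] = math.sin(alpha * math.pi / 2)
            break
    return gains
```

For example, vbap_gains(0.0, LAYOUTS_DEG["Stereo"]) gives equal gains to L and R, and a source exactly at a speaker's angle gets gain 1 on that speaker alone. With only two speakers, rear azimuths fall on the long arc between R and L, so they are still panned between the same pair.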

Distance Processing

Three distance models (increasing complexity):

Amplitude only: gain = 1 / (1 + 3·(d - 0.1)) [clamped to 0.15–1.0]
Amp+LowPass: apply first-order IIR low-pass with cutoff = 20000 × (1 - 0.9·d) Hz (500–20000 Hz)
Amp+LP+Reverb: add distance-dependent reverb tail with delay = 30 + 40·d ms, feedback = 0.4·d·reverb_amount
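The formulas above translate directly into small helper functions. The sketch below uses hypothetical names, and two details are assumptions: the 500–20000 Hz range is treated as a clamp, and the one-pole coefficient a = 1 - exp(-2π·fc/sr) is one common first-order design, since the guide only says "first-order IIR".

```python
import math
import numpy as np

def distance_gain(d):
    """Amplitude model: 1 / (1 + 3*(d - 0.1)), clamped to 0.15..1.0."""
    g = 1.0 / (1.0 + 3.0 * (d - 0.1))
    return min(1.0, max(0.15, g))

def lowpass_cutoff_hz(d):
    """Cutoff falls with distance; clamp to 500..20000 Hz (assumed)."""
    fc = 20000.0 * (1.0 - 0.9 * d)
    return min(20000.0, max(500.0, fc))

def one_pole_lowpass(x, cutoff_hz, sr):
    """First-order IIR low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sr)  # assumed design
    y = np.empty(len(x), dtype=float)
    state = 0.0
    for i, sample in enumerate(x):
        state += a * (sample - state)
        y[i] = state
    return y

def reverb_params(d, reverb_amount):
    """Return (delay in ms, feedback) for the distance-dependent tail."""
    return 30.0 + 40.0 * d, 0.4 * d * reverb_amount
```

With these formulas an agent at d = 0.1 plays at full gain, while an agent at d = 1.0 is attenuated to about 0.27 and filtered at roughly 2 kHz.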

Latent Counterpoint Physics

At each step, each agent experiences forces:

1. Attraction to nearest event in latent space (profile-weighted)
2. Repulsion from other agents: F_rep = rigidity × median_dist² / distance²
3. Profile-specific forces (Cantus: center pull, Florid: periphery, Shadow: lagging mirror of Cantus)
4. Jitter (profile-dependent scale)

Velocity updated with inertia (0.85 damping), clamped to max_speed.
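One update step could look roughly like the sketch below. It covers attraction, repulsion, jitter, damping and the speed clamp, and leaves out the profile-specific forces. How the script actually combines the forces (for example dividing by mass, or the order of operations) is not stated in this guide, so the function step_agent and all of its parameters are assumptions.

```python
import numpy as np

def step_agent(pos, vel, others, events, *, mass, attraction, rigidity,
               median_dist, max_speed, jitter, rng):
    """Hypothetical single physics step for one agent in latent space."""
    # 1. Attraction toward the nearest event, weighted by the profile.
    nearest = events[np.argmin(np.linalg.norm(events - pos, axis=1))]
    force = attraction * (nearest - pos)

    # 2. Repulsion from every other agent: rigidity * median_dist^2 / dist^2,
    #    directed away from that agent.
    for other in others:
        diff = pos - other
        dist = max(float(np.linalg.norm(diff)), 1e-6)  # avoid divide-by-zero
        force = force + rigidity * median_dist**2 / dist**2 * (diff / dist)

    # 4. Jitter with a profile-dependent scale.
    force = force + rng.normal(scale=jitter, size=pos.shape)

    # Inertia: keep 85% of the old velocity, add acceleration (assumed F/m).
    vel = 0.85 * vel + force / mass

    # Clamp speed to max_speed.
    speed = float(np.linalg.norm(vel))
    if speed > max_speed:
        vel = vel * (max_speed / speed)
    return pos + vel, vel
```

A heavy agent such as Cantus (mass 3.0) reacts slowly to the same force that sends a light Florid agent (mass 0.5) across the space, which is consistent with the roles described in the profile table.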

Preset Strategies

Preset 2: Stereo Duo

🎼 2-Agent Stereo

Agents: 2 | Latent: 6 | Rigidity: 0.4 | Speed: 0.4

Format: Stereo | Distance: Amp+LP | Reverb: 0.2

Character: Gentle duo in stereo field — one Cantus, one Florid

Use on: Simple material, stereo listening

Preset 3: Quad Trio

🔊 3-Agent Quadraphonic

Agents: 3 | Latent: 8 | Rigidity: 0.5 | Speed: 0.5

Format: Quad | Distance: Amp+LP+Reverb | Reverb: 0.4

Character: Full trio (Cantus, Florid, Shadow) in four corners

Use on: Quadraphonic installations, immersive experiments

Preset 4: Surround Ensemble

🌍 4-Agent 5.1

Agents: 4 | Latent: 10 | Rigidity: 0.6 | Speed: 0.5

Format: 5.1 | Distance: Amp+LP+Reverb | Reverb: 0.5

Character: Four agents (Cantus, Florid, Shadow, Florid) in full surround

Use on: 5.1 systems, cinematic material

Preset 5: Octophonic Scatter

🌀 5-Agent 8-Channel

Agents: 5 | Latent: 12 | Rigidity: 0.8 | Speed: 0.8

Format: Octophonic | Distance: Amp+LP+Reverb | Reverb: 0.5

Character: Five agents scattering across 8 speakers — dense spatial texture

Use on: Octophonic installations, immersive art

Preset 6: Immersive Drift

🌫️ 3-Agent Drifting

Agents: 3 | Latent: 10 | Rigidity: 0.3 | Speed: 0.3

Format: Octophonic | Distance: Amp+LP+Reverb | Reverb: 0.7

Character: Slow drift through octophonic space with heavy reverb

Use on: Ambient, meditative spatial textures

Preset 7: Close-Mic Trio

🎙️ Dry, Close Trio

Agents: 3 | Latent: 8 | Rigidity: 0.6 | Speed: 0.5

Format: Quad | Distance: Amplitude only | Reverb: 0.0

Character: Close, dry spatialization — agents move but no filtering/reverb

Use on: Dry recordings, clarity-focused material
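For reference, the six presets can be collected in one lookup table. The values below are copied from the preset descriptions above; the Preset class, the PRESETS dictionary and the channel counts helper are hypothetical names for illustration and are not taken from the script.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Preset:
    name: str
    agents: int
    latent_size: int
    rigidity: float
    speed: float
    spatial_format: str
    distance_model: str
    reverb: float

# Keys are the preset numbers used in the dialog (1 = Custom, not listed).
PRESETS = {
    2: Preset("Stereo Duo", 2, 6, 0.4, 0.4, "Stereo", "Amp+LP", 0.2),
    3: Preset("Quad Trio", 3, 8, 0.5, 0.5, "Quad", "Amp+LP+Reverb", 0.4),
    4: Preset("Surround Ensemble", 4, 10, 0.6, 0.5, "5.1", "Amp+LP+Reverb", 0.5),
    5: Preset("Octophonic Scatter", 5, 12, 0.8, 0.8, "Octophonic", "Amp+LP+Reverb", 0.5),
    6: Preset("Immersive Drift", 3, 10, 0.3, 0.3, "Octophonic", "Amp+LP+Reverb", 0.7),
    7: Preset("Close-Mic Trio", 3, 8, 0.6, 0.5, "Quad", "Amplitude only", 0.0),
}

# Output channel count per format, from the speaker layouts in this guide.
CHANNELS = {"Stereo": 2, "Quad": 4, "5.1": 6, "Octophonic": 8}

def output_channels(preset_number):
    """Number of output channels a preset renders to."""
    return CHANNELS[PRESETS[preset_number].spatial_format]
```

Seen side by side, rigidity and speed rise together from Immersive Drift (0.3, 0.3) to Octophonic Scatter (0.8, 0.8), which is the main axis between calm and busy spatial motion.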

Parameters & Controls

Agent Parameters

Parameter	Default	Description
Number_of_agents	3	2-6 agents (profiles cycle through Cantus/Florid/Shadow)
Latent_size	8	Autoencoder latent dimensions (2–32)
Counterpoint_rigidity	0.5	Repulsion strength between agents (0–2)
Speed	0.5	Overall agent movement speed (0.1–3)

Spatial Parameters

Parameter	Default	Description
Spatial_format	Stereo	Stereo, Quad, 5.1, Octophonic
Distance_model	Amp+LP+Reverb	Amplitude only, +Low-pass, +Reverb
Reverb_amount	0.4	Strength of distance-dependent reverb (0–1)
Duration (0 = original)	0	Target output duration (0 = original length)
Seed	42	Random seed for reproducibility

Output

Parameter	Default	Description
Draw_visualization	1	Generate the multi-panel analysis display
Play_result	1	Audition after processing

Visualization & Analysis

7-Panel Display

Latent Spat Visualization:

Panel 1: TITLE
• Script name, source name, preset, spatial format, channels, distance model

Panel 2: INPUT WAVEFORM
• Gray waveform with red event boundaries
• Title: "Original (N events)"

Panel 3: OUTPUT WAVEFORM (Channel 1)
• Blue waveform of first output channel
• Title: "Ch 1"
• X-axis: Time (s)

Panel 4: SPATIAL FIELD DIAGRAM
• Top-down view of listener space (-1.5 to +1.5)
• Gray circles at radius 0.5 and 1.0
• Black "+" at listener position
• Gray squares = speaker positions (labeled)
• Colored circles = agent positions with:
  - Agent 0 (blue), Agent 1 (red), Agent 2 (green), Agent 3 (purple), etc.
  - Center letters: C=Cantus, F=Florid, S=Shadow
• Title: "Spatial field (top-down) C=Cantus F=Florid S=Shadow"

Panel 5: AGENT SPATIAL PROFILES
• For each agent: profile, azimuth range, travel distance, mean distance, steps, unique events
• Color-coded by agent color

Panel 6: COUNTERPOINT & SEPARATION
• Spatial separation between agent pairs (in degrees)
• Unison rates (how often agents select same event)

Panel 7: SUMMARY PANEL
• Format, channels, distance model, reverb
• Events, unique used, mean event duration
• Autoencoder loss, latent size, seed
• Duration in/out

Reading the Spatial Field Diagram

What the diagram shows:

Gray squares: Speaker positions — labeled according to format (L, R, FL, FR, etc.)
Colored circles: Agents. Each circle marks the agent's mean position in space, averaged over its whole trajectory
Letters inside circles: C = Cantus, F = Florid, S = Shadow — identifies agent profile
Gray circles: Reference distances at radius 0.5 and 1.0 — agents near center are closer (louder, brighter), near edge are farther (quieter, more filtered)
Color consistency: Each agent has a consistent color across all visualizations

Interpreting Spatial Separation

What the numbers mean:

Spatial separation (degrees): Average angular distance between two agents over time — higher means they stay apart spatially
Unison rate: How often agents choose the same event — low unison means they're exploring different material
Azimuth range: Total angular span covered by an agent — wide range = wandering, narrow = stable
Azimuth travel: Total angular distance traveled — high = active, low = stationary
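To make these statistics concrete, here is one way they could be computed from per-step azimuths and event choices. The function names are hypothetical, and measuring angles along the shortest arc (so that 350° and 10° are 20° apart) is an assumption; the script may define the statistics differently.

```python
import numpy as np

def _shortest_arc(delta_deg):
    """Fold an angular difference into the range 0..180 degrees."""
    delta = np.abs(delta_deg) % 360.0
    return np.minimum(delta, 360.0 - delta)

def angular_separation_deg(az_a, az_b):
    """Mean angular distance between two agents over all steps."""
    diff = np.asarray(az_a, dtype=float) - np.asarray(az_b, dtype=float)
    return float(np.mean(_shortest_arc(diff)))

def unison_rate(events_a, events_b):
    """Fraction of steps on which both agents selected the same event."""
    return float(np.mean(np.asarray(events_a) == np.asarray(events_b)))

def azimuth_travel_deg(az):
    """Total angular distance travelled by one agent."""
    steps = np.diff(np.asarray(az, dtype=float))
    return float(np.sum(_shortest_arc(steps)))
```

Two agents that sit on opposite sides of the listener throughout a piece would score close to 180° separation, the maximum under this definition.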

Applications

Electroacoustic Composition

Use case: Creating spatial compositions where timbral relationships become spatial

Technique: Quad Trio or Octophonic Scatter presets on varied source material

Workflow:

Select a 20-60 second recording with diverse timbres
Run with Quad Trio preset for 4-channel output
Examine agent profiles and spatial separation stats
Export and use as movement in multi-channel installation

Sound Design for Immersive Media

Use case: Creating spatial textures for VR, 360 video, or multi-channel installations

Technique: Immersive Drift or Surround Ensemble on appropriate sources

Applications:

VR environments: Agents drift through space, creating evolving sound field
Cinema: 5.1 output with agents moving around the theater
Installation art: Octophonic scatter creates dense spatial texture

Music Production

Use case: Creating spatialized arrangements from stem material

Technique: Close-Mic Trio for clarity, or Quad Trio for immersive

Examples:

Stems: Each agent could represent an instrument, moving through space
Ambient music: Immersive Drift creates slowly evolving spatial pads
Electronic music: Stereo Duo with fast speed creates dynamic stereo movement

Research & Education

Use case: Studying spatial perception, agent-based models, VBAP

Technique: Compare presets on same source, examine agent trajectories

Learning outcomes:

Understand how latent space maps to spatial parameters
See how agent repulsion creates spatial separation
Explore distance processing models (amplitude, filtering, reverb)
Learn VBAP and multi-channel rendering

Practical Workflow Examples

🎬 Film Scene: Spatial Tension

Goal: Create 60-second spatial tension cue for 5.1 surround

Settings:

Source: 30-second drone with subtle variations
Preset: Surround Ensemble
Custom: rigidity=0.8 (strong repulsion), reverb=0.6, duration=60 s

Result: Four agents repelling in timbre → spatial separation around the room, creating tension

🎚️ Electronic Music: Evolving Pads

Goal: Create evolving pad texture in quadraphonic

Settings:

Source: 8-second synth pad
Preset: Quad Trio
Custom: speed=0.3 (slow), distance=Amp+LP (filtered), duration=24 s

Result: 24-second evolving pad with agents slowly drifting through quad space

🎙️ Voice Spatialization

Goal: Spatialize solo voice across octophonic array

Settings:

Source: 10-second vocal phrase
Preset: Octophonic Scatter
Custom: rigidity=0.4 (gentle repulsion), reverb=0.5

Result: Voice fragmented across 8 speakers, creating immersive choral effect

Troubleshooting Common Issues

Problem: Python not found or missing packages
Cause: Python not installed, or packages missing
Solution: Install Python and required packages: pip install numpy soundfile scipy
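Before running the script, a quick check like the following can confirm that the required packages are importable by the Python interpreter in use. The helper name missing_packages is hypothetical; importlib.util.find_spec is part of the standard library.

```python
import importlib.util

def missing_packages(names=("numpy", "soundfile", "scipy")):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All required packages are available.")
```

If Praat calls a different Python installation than the one on your shell path, run the check with that interpreter, since packages are installed per interpreter.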

Problem: Output has clicks
Cause: Crossfade insufficient at splice points
Solution: Increase XFADE_SEC in Python script (currently 8ms)
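To see why a longer crossfade reduces clicks, consider the sketch below, which joins two segments with an overlapping equal-power fade. XFADE_SEC is the constant named above; the splice function and the overlap-based approach are assumptions about how such a crossfade might be applied, not a copy of the script.

```python
import numpy as np

XFADE_SEC = 0.008  # 8 ms, the current value mentioned above

def splice(a, b, sr, xfade_sec=XFADE_SEC):
    """Join mono segments a and b, overlapping them by xfade_sec seconds."""
    n = min(int(round(xfade_sec * sr)), len(a), len(b))
    if n == 0:
        return np.concatenate([a, b])          # nothing to fade
    t = np.linspace(0.0, 1.0, n, endpoint=False)
    fade_out = np.cos(t * np.pi / 2)           # equal-power curves
    fade_in = np.sin(t * np.pi / 2)
    overlap = a[-n:] * fade_out + b[:n] * fade_in
    return np.concatenate([a[:-n], overlap, b[n:]])
```

A longer fade smooths larger discontinuities between events but also blurs their onsets, so values much above a few tens of milliseconds may soften percussive material.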

Problem: Agents not moving spatially
Cause: Low speed, or latent space too constrained
Solution: Increase speed, reduce rigidity, check agent profiles

Problem: Spatial separation stats low
Cause: Agents attracted to same events, or repulsion too weak
Solution: Increase counterpoint_rigidity, check agent profiles

Problem: Multi-channel output not playing correctly
Cause: Playback system not configured for multi-channel
Solution: Use appropriate software/hardware, or downmix to stereo
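If no multi-channel playback is available, a simple stereo fold-down can be computed from the speaker angles. The sketch below is one possible approach, not a feature of the script: each channel is placed in the stereo image according to the sine of its azimuth (0° = front, 90° = right, as in the layouts listed earlier), so front/back information is lost. The function name downmix_to_stereo is hypothetical.

```python
import numpy as np

def downmix_to_stereo(multi, speaker_deg):
    """Fold a (n_samples, n_channels) array down to (n_samples, 2).

    speaker_deg gives the azimuth of each channel in degrees.
    """
    az = np.radians(np.asarray(speaker_deg, dtype=float))
    pan = np.sin(az)                      # -1 = hard left, +1 = hard right
    angle = (pan + 1.0) * np.pi / 4.0     # 0 .. pi/2 for equal-power panning
    weights = np.stack([np.cos(angle), np.sin(angle)], axis=1)  # (n_ch, 2)
    stereo = np.asarray(multi, dtype=float) @ weights
    peak = np.max(np.abs(stereo))
    if peak > 1.0:
        stereo = stereo / peak            # normalize only if it would clip
    return stereo
```

For a quad file, the angles 315, 45, 225, 135 from the Quad layout would be passed as speaker_deg, in the same order as the channels.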

Advanced Techniques

Custom speaker layouts:

In the Python script, modify SPEAKER_LAYOUTS and SPEAKER_LABELS to implement custom speaker configurations (e.g., 7.1 or irregular arrays).

Agent profile tuning:

Modify mass, speed, attraction weights in Agent.__init__() to create new behavioral archetypes.

Distance processing customization:

Adjust distance_attenuation, distance_lowpass, and distance_reverb functions to implement different spatial models.

Multi-channel monitoring:

For true multi-channel monitoring, use a DAW with multi-channel output routing (such as Reaper or Logic Pro) or a patching environment such as Max/MSP.