Latent Spat — Agent-Based Spatialization — User Guide

Extends Latent Counterpoint with physical space: each agent's latent position maps to spatial coordinates, which are rendered with VBAP. Latent X → azimuth (position around listener), Latent Y → distance (amplitude, filtering, reverb). When agents repel in timbre, they move to opposite sides of the room. The counterpoint becomes spatial.

Author: Shai Cohen
Affiliation: Department of Music, Bar-Ilan University, Israel
Version: 1.0 (2025)
License: MIT License
Citation: Cohen, S. (2025). Praat AudioTools
Repo: https://github.com/ShaiCohen-ops/Praat-plugin_AudioTools

What this does

This script implements Latent Spat — an agent-based spatialization engine that extends the Latent Counterpoint concept into physical space. Multiple agents navigate a latent space learned from audio events; their latent positions are mapped to spatial coordinates (azimuth and distance) and rendered into multi-channel audio using VBAP (Vector Base Amplitude Panning).

🎧 What is Agent-Based Spatialization?

This approach combines three powerful concepts:

  • Latent Counterpoint: Agents (Cantus, Florid, Shadow) navigate a learned latent space, selecting events based on their position and interactions
  • Spatial Mapping: Each agent's latent trajectory is projected to 2D and mapped to azimuth (angle around listener) and distance
  • VBAP Rendering: For each step, the agent's azimuth determines gains to surrounding speakers; distance controls amplitude, low-pass filtering (proximity effect), and reverb

The result is a spatial composition where timbral relationships become spatial relationships — agents that repel in timbre move to opposite sides of the room, creating a true fusion of counterpoint and space.

Key Features:

Technical Implementation:

  1. Event Segmentation: Praat segments audio into events.
  2. Mel Patches: 40×32 log-mel patches per event.
  3. Autoencoder: Train on-the-fly, encode to latent space.
  4. Agent Physics: Multi-agent simulation with profiles, repulsion, attraction.
  5. Spatial Mapping: PCA of agent trajectories → 2D → azimuth & distance.
  6. VBAP Rendering: Per-step gains for each speaker, distance processing, multi-channel output.

Quick start

  1. In Praat, select exactly one Sound object (any duration, any content).
  2. Run script… → select LatentSpat.praat.
  3. Choose Preset (2-7 for specific strategies, 1 for custom).
  4. Set number of agents, latent size, counterpoint rigidity, speed.
  5. Choose spatial format and distance model.
  6. Adjust reverb amount and target duration (0 = original).
  7. Enable Draw_visualization for analysis display.
  8. Click OK — engine segments, trains autoencoder, runs agent physics, renders spatial audio.

Quick tip: Start with the Quad Trio preset on a 10-20 second recording with varied texture. Enable visualization to see the spatial field diagram with agent positions (colored circles), agent profiles, and spatial separation stats. Listen on a quadraphonic setup (or simulated via headphones) to hear agents moving around the space. The output appears as "source_spat" in the Objects window.

Important:

  • Python dependencies: requires numpy, soundfile, and scipy (no scikit-learn).
  • Autoencoder training happens on the fly and may take 30-60 seconds.
  • Spatial formats require appropriate monitoring: for 5.1 or octophonic output you'll need a multi-channel interface or software.
  • VBAP assumes speakers are evenly spaced; for custom layouts, modify the speaker_azimuths arrays in Python.

Agent-Based Spatialization Theory

Agent Profiles

🎭 Three Behavioral Archetypes

Profile | Role | Mass | Speed | Attraction | Spatial Tendency
Cantus | Stable leader | 3.0 | 0.3 | 1.0 | Center, stable azimuth
Florid | Peripheral wanderer | 0.5 | 1.5 | 0.6 | Wide azimuth range, periphery
Shadow | Lagging mirror | 2.0 | 0.4 | 0.3 | Opposite Cantus, delayed

Repulsion: Agents repel each other with force ∝ rigidity × median_dist² / distance². This creates spatial separation when they disagree timbrally.

Spatial Mapping

For each agent's latent position p(t) at step t:

  1. Project to 2D via PCA on all event latents: (x, y) = (p(t) - μ) · V₂
  2. Normalize across all agent positions: x_norm = (x - x_min) / (x_max - x_min), y_norm = (y - y_min) / (y_max - y_min)
  3. Map to spatial parameters: azimuth = x_norm × 360° (0° = front), distance = 0.1 + 0.9 × y_norm (0.1 = close, 1.0 = far)
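
The three mapping steps can be sketched in a few lines of numpy. This is a minimal illustration and not the script's actual code: the function name latent_to_spatial, the use of SVD to obtain the two principal axes, and the guard against a zero-width range are all assumptions.

```python
import numpy as np

def latent_to_spatial(event_latents, agent_positions):
    """Map latent agent positions to (azimuth in degrees, distance).

    event_latents:   (n_events, latent_dim) array used to fit the PCA.
    agent_positions: (n_points, latent_dim) array of all agent positions.
    """
    mu = event_latents.mean(axis=0)
    # PCA via SVD of the centred event latents; rows of vt are the components.
    _, _, vt = np.linalg.svd(event_latents - mu, full_matrices=False)
    v2 = vt[:2].T                                # (latent_dim, 2)
    xy = (agent_positions - mu) @ v2             # step 1: project to 2D

    # Step 2: min-max normalise each axis across all agent positions.
    lo, hi = xy.min(axis=0), xy.max(axis=0)
    span = np.where(hi - lo > 0, hi - lo, 1.0)   # assumed guard for a flat axis
    norm = (xy - lo) / span

    # Step 3: map to spatial parameters.
    azimuth = norm[:, 0] * 360.0                 # 0 deg = front
    distance = 0.1 + 0.9 * norm[:, 1]            # 0.1 = close, 1.0 = far
    return azimuth, distance
```

Because the normalization uses the extremes of the agent positions themselves, at least one step always lands at distance 0.1 and one at 1.0, whatever the scale of the latent space.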

VBAP (Vector Base Amplitude Panning)

📐 2D Circular Panning

For an azimuth angle θ and N speakers at angles φ₁...φ_N:

  1. Find the two speakers that bracket θ (considering wrap-around)
  2. Compute angular fractions: α = (θ - φ_left) / (φ_right - φ_left)
  3. Equal-power gains: g_left = cos(α × π/2), g_right = sin(α × π/2)
  4. Other speakers get zero gain

Speaker layouts:

  • Stereo: 330° (L), 30° (R)
  • Quad: 315° (FL), 45° (FR), 225° (RL), 135° (RR)
  • 5.1: 330°(L), 30°(R), 0°(C), 0°(LFE), 240°(LS), 120°(RS)
  • Octophonic: 0°,45°,90°,135°,180°,225°,270°,315°
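
The pairwise panning steps above can be sketched as follows, using the quad layout as the example. This is an illustrative sketch, not the script's implementation: the function name pan_gains is hypothetical, and the code assumes at least two speakers at distinct angles.

```python
import math

# Speaker azimuths in degrees, in channel order, from the quad layout above.
QUAD = [315.0, 45.0, 225.0, 135.0]   # FL, FR, RL, RR

def pan_gains(theta, speaker_az):
    """Return one gain per speaker for a source at azimuth theta (degrees)."""
    theta %= 360.0
    n = len(speaker_az)
    # Sort speaker indices by angle so neighbours are adjacent on the circle.
    order = sorted(range(n), key=lambda i: speaker_az[i] % 360.0)
    gains = [0.0] * n
    for k in range(n):
        left, right = order[k], order[(k + 1) % n]
        arc = (speaker_az[right] - speaker_az[left]) % 360.0  # width of this pair's arc
        offset = (theta - speaker_az[left]) % 360.0           # source position inside it
        if arc == 0.0:
            continue  # duplicated angle: skip this pair (assumption)
        if offset <= arc:
            alpha = offset / arc
            gains[left] = math.cos(alpha * math.pi / 2)   # equal-power pair
            gains[right] = math.sin(alpha * math.pi / 2)
            break
    return gains
```

With this sketch, pan_gains(0.0, QUAD) gives equal gains on FL and FR and zero on the rear pair, and the squared gains sum to 1 at every angle.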

Distance Processing

Three distance models (increasing complexity):

  • Amplitude only: gain = 1 / (1 + 3·(d - 0.1)) [clamped to 0.15–1.0]
  • Amp+LowPass: apply first-order IIR low-pass with cutoff = 20000 × (1 - 0.9·d) Hz (500–20000 Hz)
  • Amp+LP+Reverb: add distance-dependent reverb tail, delay = 30 + 40·d ms, feedback = 0.4·d·reverb_amount
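
As a worked illustration of these formulas, the sketch below computes the per-step parameters for a given distance and applies a one-pole low-pass. The names distance_params and one_pole_lowpass are hypothetical. Reading "(500–20000 Hz)" as a clamp on the cutoff is an assumption, and so is the filter coefficient formula, which is one common choice for a first-order IIR low-pass.

```python
import math

def distance_params(d, reverb_amount=0.4):
    """Processing parameters for a normalised distance d in [0.1, 1.0]."""
    gain = 1.0 / (1.0 + 3.0 * (d - 0.1))
    gain = min(1.0, max(0.15, gain))                 # clamp to 0.15-1.0
    cutoff_hz = 20000.0 * (1.0 - 0.9 * d)
    cutoff_hz = min(20000.0, max(500.0, cutoff_hz))  # assumed clamp, 500-20000 Hz
    delay_ms = 30.0 + 40.0 * d                       # reverb tail delay
    feedback = 0.4 * d * reverb_amount               # reverb feedback
    return {"gain": gain, "cutoff_hz": cutoff_hz,
            "delay_ms": delay_ms, "feedback": feedback}

def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """First-order IIR low-pass: y[n] = (1 - a) * x[n] + a * y[n-1]."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)  # assumed coefficient
    out, prev = [], 0.0
    for x in samples:
        prev = (1.0 - a) * x + a * prev
        out.append(prev)
    return out
```

At the closest position (d = 0.1) the gain is 1.0 and the cutoff is 18200 Hz; at the farthest (d = 1.0) the gain falls to about 0.27 and the cutoff to 2000 Hz.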

Latent Counterpoint Physics

At each step, each agent experiences forces:

  1. Attraction to nearest event in latent space (profile-weighted)
  2. Repulsion from other agents: F_rep = rigidity × median_dist² / distance²
  3. Profile-specific forces (Cantus: center pull, Florid: periphery, Shadow: lagging mirror of Cantus)
  4. Jitter (profile-dependent scale)

Velocity is updated with inertia (0.85 damping) and clamped to max_speed.
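
A single simulation step might look like the sketch below, which covers only attraction, repulsion, damping, and the speed clamp. Profile-specific forces and jitter are left out. The function name step_agent, the way force is divided by mass, and treating median_dist as a precomputed input are assumptions, not details taken from the script.

```python
import numpy as np

def step_agent(pos, vel, others, events, *, rigidity, median_dist,
               attraction, mass, max_speed, damping=0.85, eps=1e-9):
    """Advance one agent by one step; returns (new_pos, new_vel)."""
    # 1. Attraction toward the nearest event in latent space.
    nearest = events[np.argmin(np.linalg.norm(events - pos, axis=1))]
    force = attraction * (nearest - pos)

    # 2. Repulsion from every other agent, magnitude
    #    rigidity * median_dist^2 / distance^2, directed away from it.
    for other in others:
        diff = pos - other
        dist = max(float(np.linalg.norm(diff)), eps)
        force = force + (rigidity * median_dist**2 / dist**2) * diff / dist

    # Inertia with damping, then clamp the speed (integration rule is assumed).
    vel = damping * vel + force / mass
    speed = float(np.linalg.norm(vel))
    if speed > max_speed:
        vel = vel * (max_speed / speed)
    return pos + vel, vel
```

Dividing by mass is what would make a heavy profile such as Cantus respond sluggishly to the same force that sends a light Florid agent wandering.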

Preset Strategies

Preset 2: Stereo Duo

🎼 2-Agent Stereo

Agents: 2 | Latent: 6 | Rigidity: 0.4 | Speed: 0.4

Format: Stereo | Distance: Amp+LP | Reverb: 0.2

Character: Gentle duo in stereo field — one Cantus, one Florid

Use on: Simple material, stereo listening

Preset 3: Quad Trio

🔊 3-Agent Quadraphonic

Agents: 3 | Latent: 8 | Rigidity: 0.5 | Speed: 0.5

Format: Quad | Distance: Amp+LP+Reverb | Reverb: 0.4

Character: Full trio (Cantus, Florid, Shadow) in four corners

Use on: Quadraphonic installations, immersive experiments

Preset 4: Surround Ensemble

🌍 4-Agent 5.1

Agents: 4 | Latent: 10 | Rigidity: 0.6 | Speed: 0.5

Format: 5.1 | Distance: Amp+LP+Reverb | Reverb: 0.5

Character: Four agents (Cantus, Florid, Shadow, Florid) in full surround

Use on: 5.1 systems, cinematic material

Preset 5: Octophonic Scatter

🌀 5-Agent 8-Channel

Agents: 5 | Latent: 12 | Rigidity: 0.8 | Speed: 0.8

Format: Octophonic | Distance: Amp+LP+Reverb | Reverb: 0.5

Character: Five agents scattering across 8 speakers — dense spatial texture

Use on: Octophonic installations, immersive art

Preset 6: Immersive Drift

🌫️ 3-Agent Drifting

Agents: 3 | Latent: 10 | Rigidity: 0.3 | Speed: 0.3

Format: Octophonic | Distance: Amp+LP+Reverb | Reverb: 0.7

Character: Slow drift through octophonic space with heavy reverb

Use on: Ambient, meditative spatial textures

Preset 7: Close-Mic Trio

🎙️ Dry, Close Trio

Agents: 3 | Latent: 8 | Rigidity: 0.6 | Speed: 0.5

Format: Quad | Distance: Amplitude only | Reverb: 0.0

Character: Close, dry spatialization — agents move but no filtering/reverb

Use on: Dry recordings, clarity-focused material

Parameters & Controls

Agent Parameters

Parameter | Default | Description
Number_of_agents | 3 | 2-6 agents (profiles cycle through Cantus/Florid/Shadow)
Latent_size | 8 | Autoencoder latent dimensions (2–32)
Counterpoint_rigidity | 0.5 | Repulsion strength between agents (0–2)
Speed | 0.5 | Overall agent movement speed (0.1–3)

Spatial Parameters

Parameter | Default | Description
Spatial_format | Stereo | Stereo, Quad, 5.1, Octophonic
Distance_model | Amp+LP+Reverb | Amplitude only, +Low-pass, +Reverb
Reverb_amount | 0.4 | Strength of distance-dependent reverb (0–1)
Duration (0 = original) | 0 | Target output duration (0 = original length)
Seed | 42 | Random seed for reproducibility

Output

Parameter | Default | Description
Draw_visualization | 1 | Generate 5-panel analysis display
Play_result | 1 | Audition after processing

Visualization & Analysis

5-Panel Display

Latent Spat Visualization:

Panel 1: TITLE
  • Script name, source name, preset, spatial format, channels, distance model

Panel 2: INPUT WAVEFORM
  • Gray waveform with red event boundaries
  • Title: "Original (N events)"

Panel 3: OUTPUT WAVEFORM (Channel 1)
  • Blue waveform of first output channel
  • Title: "Ch 1"
  • X-axis: Time (s)

Panel 4: SPATIAL FIELD DIAGRAM
  • Top-down view of listener space (-1.5 to +1.5)
  • Gray circles at radius 0.5 and 1.0
  • Black "+" at listener position
  • Gray squares = speaker positions (labeled)
  • Colored circles = agent positions with:
      - Agent 0 (blue), Agent 1 (red), Agent 2 (green), Agent 3 (purple), etc.
      - Center letters: C=Cantus, F=Florid, S=Shadow
  • Title: "Spatial field (top-down) C=Cantus F=Florid S=Shadow"

Panel 5: AGENT SPATIAL PROFILES
  • For each agent: profile, azimuth range, travel distance, mean distance, steps, unique events
  • Color-coded by agent color

Panel 6: COUNTERPOINT & SEPARATION
  • Spatial separation between agent pairs (in degrees)
  • Unison rates (how often agents select same event)

Panel 7: SUMMARY PANEL
  • Format, channels, distance model, reverb
  • Events, unique used, mean event duration
  • Autoencoder loss, latent size, seed
  • Duration in/out

Reading the Spatial Field Diagram

What the diagram shows:
  • Gray squares: Speaker positions — labeled according to format (L, R, FL, FR, etc.)
  • Colored circles: Agents; each circle marks the agent's position averaged over its whole trajectory
  • Letters inside circles: C = Cantus, F = Florid, S = Shadow — identifies agent profile
  • Gray circles: Reference distances at radius 0.5 and 1.0 — agents near center are closer (louder, brighter), near edge are farther (quieter, more filtered)
  • Color consistency: Each agent has a consistent color across all visualizations

Interpreting Spatial Separation

What the numbers mean:
  • Spatial separation (degrees): Average angular distance between two agents over time — higher means they stay apart spatially
  • Unison rate: How often agents choose the same event — low unison means they're exploring different material
  • Azimuth range: Total angular span covered by an agent — wide range = wandering, narrow = stable
  • Azimuth travel: Total angular distance traveled — high = active, low = stationary
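
One plausible way to compute three of these statistics from per-step azimuths and event indices is sketched below. The function names are hypothetical and the script's exact definitions may differ. The sketch measures angles along the shorter way around the circle, so 350° and 10° count as 20° apart.

```python
def angular_distance(a, b):
    """Shortest angular distance between two azimuths, in degrees (0-180)."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)

def spatial_separation(az_a, az_b):
    """Mean angular distance between two agents over their shared steps."""
    pairs = list(zip(az_a, az_b))
    return sum(angular_distance(a, b) for a, b in pairs) / len(pairs)

def unison_rate(events_a, events_b):
    """Fraction of steps on which both agents selected the same event."""
    pairs = list(zip(events_a, events_b))
    return sum(1 for a, b in pairs if a == b) / len(pairs)

def azimuth_travel(az):
    """Total angular distance covered by one agent across consecutive steps."""
    return sum(angular_distance(a, b) for a, b in zip(az, az[1:]))
```

For example, two agents at azimuths [0, 350] and [180, 10] are 180° and 20° apart on the two steps, giving a spatial separation of 100°.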

Applications

Electroacoustic Composition

Use case: Creating spatial compositions where timbral relationships become spatial

Technique: Quad Trio or Octophonic Scatter presets on varied source material

Workflow:

Sound Design for Immersive Media

Use case: Creating spatial textures for VR, 360 video, or multi-channel installations

Technique: Immersive Drift or Surround Ensemble on appropriate sources

Applications:

Music Production

Use case: Creating spatialized arrangements from stem material

Technique: Close-Mic Trio for clarity, or Quad Trio for immersive

Examples:

Research & Education

Use case: Studying spatial perception, agent-based models, VBAP

Technique: Compare presets on same source, examine agent trajectories

Learning outcomes:

Practical Workflow Examples

🎬 Film Scene: Spatial Tension

Goal: Create 60-second spatial tension cue for 5.1 surround

Settings:

  • Source: 30-second drone with subtle variations
  • Preset: Surround Ensemble
  • Custom: rigidity=0.8 (strong repulsion), reverb=0.6

Result: Four agents repelling in timbre → spatial separation around the room, creating tension

🎚️ Electronic Music: Evolving Pads

Goal: Create evolving pad texture in quadraphonic

Settings:

  • Source: 8-second synth pad
  • Preset: Quad Trio
  • Custom: speed=0.3 (slow), distance=Amp+LP (filtered)

Result: 24-second evolving pad with agents slowly drifting through quad space

🎙️ Voice Spatialization

Goal: Spatialize solo voice across octophonic array

Settings:

  • Source: 10-second vocal phrase
  • Preset: Octophonic Scatter
  • Custom: rigidity=0.4 (gentle repulsion), reverb=0.5

Result: Voice fragmented across 8 speakers, creating immersive choral effect

Troubleshooting Common Issues

Problem: Python not found or missing packages
Cause: Python not installed, or packages missing
Solution: Install Python and required packages: pip install numpy soundfile scipy

Problem: Output has clicks
Cause: Crossfade insufficient at splice points
Solution: Increase XFADE_SEC in the Python script (currently 8 ms)

Problem: Agents not moving spatially
Cause: Low speed, or latent space too constrained
Solution: Increase speed, reduce rigidity, check agent profiles

Problem: Spatial separation stats low
Cause: Agents attracted to same events, or repulsion too weak
Solution: Increase counterpoint_rigidity, check agent profiles

Problem: Multi-channel output not playing correctly
Cause: Playback system not configured for multi-channel
Solution: Use appropriate software/hardware, or downmix to stereo
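
For the first problem, a quick way to see which of the required packages are missing is to run a short check with the same Python interpreter that Praat calls. The helper name missing_packages is hypothetical; the check uses only the standard library.

```python
import importlib.util

# Packages named in the dependency note of this guide.
REQUIRED = ["numpy", "soundfile", "scipy"]

def missing_packages(names=REQUIRED):
    """Return the subset of names that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All required packages found.")
```

If the check passes in a terminal but the script still fails from Praat, the two are probably using different Python installations.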

Advanced Techniques

Custom speaker layouts:

In the Python script, modify SPEAKER_LAYOUTS and SPEAKER_LABELS to implement custom speaker configurations (e.g., 7.1, ambisonic, irregular arrays).
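
The sketch below shows one way such an edit could look. The two names come from the script, but their structure here (dictionaries keyed by format name, with azimuths in degrees in channel order) is an assumption, as are the helper add_layout and the example "Penta" array. Check the actual definitions in the Python script before adapting this.

```python
# Assumed structure: format name -> per-channel azimuths / labels.
SPEAKER_LAYOUTS = {
    "Quad": [315.0, 45.0, 225.0, 135.0],
}
SPEAKER_LABELS = {
    "Quad": ["FL", "FR", "RL", "RR"],
}

def add_layout(name, azimuths_deg, labels):
    """Register a custom layout after basic consistency checks."""
    if len(azimuths_deg) != len(labels):
        raise ValueError("need exactly one label per speaker")
    # Wrap every angle into the 0-360 degree range used by the other layouts.
    SPEAKER_LAYOUTS[name] = [a % 360.0 for a in azimuths_deg]
    SPEAKER_LABELS[name] = list(labels)

# Illustrative five-speaker ring; the angles are chosen for the example only.
add_layout("Penta", [0.0, 72.0, 144.0, 216.0, 288.0],
           ["C", "R", "RS", "LS", "L"])
```

A new layout would also need a matching option in the Spatial_format menu of the Praat form before it could be selected.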

Agent profile tuning:

Modify mass, speed, attraction weights in Agent.__init__() to create new behavioral archetypes.

Distance processing customization:

Adjust distance_attenuation, distance_lowpass, and distance_reverb functions to implement different spatial models.

Multi-channel monitoring:

For true multi-channel monitoring, use a DAW with multi-channel output routing (such as Reaper or Logic Pro) or an environment such as Max/MSP.