The Latent Counterpoint — User Guide


Author: Shai Cohen
Affiliation: Department of Music, Bar-Ilan University, Israel
Version: 1.0 (2025)
License: MIT License
Citation: Cohen, S. (2025). Praat AudioTools
Repo: https://github.com/ShaiCohen-ops/Praat-plugin_AudioTools

What this does

This script implements The Latent Counterpoint — an AI-powered multi-agent system that learns a latent space from event-level audio patches using an on-the-fly autoencoder, then deploys multiple agents that navigate this space simultaneously with counterpoint forces (attraction, repulsion, inertia, jitter) to produce polyphonic recombination of the input material.

🎵 What is Latent Counterpoint?

Traditional counterpoint is the art of combining independent melodic lines. This system creates a latent counterpoint:

  • Events are segmented from the source audio (200ms–3s)
  • Autoencoder learns a latent space where each event becomes a point
  • Agents (Cantus, Florid, Shadow) navigate this space with distinct behaviors
  • Forces (attraction to events, repulsion from other agents) create emergent polyphony
  • The result is a new composition where multiple "voices" independently recombine the source material

Key Features:

Technical Implementation:

  1. Event Segmentation: Praat segments audio into 200ms–3s events.
  2. Mel Patches: Python extracts 40×32 log-mel patches per event.
  3. Autoencoder: Trains an MLP with one hidden layer, leaky ReLU, denoising, L2 regularization, and Adam.
  4. Latent Geometry: Computes the latent center, periphery scores, and distance matrix.
  5. Agent Physics: Runs a multi-agent simulation with forces, inertia, and jitter.
  6. Polyphonic Reconstruction: Sums agent outputs with panning and volume scaling.
  7. Visualization & Stats.
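The training step can be sketched roughly as follows. This is a minimal NumPy toy, not the script's actual code: it uses plain gradient descent instead of Adam, a reduced input dimensionality for speed, and random data in place of real log-mel patches; all names and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

def train_autoencoder(X, latent=8, steps=300, lr=0.005, noise=0.1, l2=1e-4):
    """Single-hidden-layer denoising autoencoder: X (n, d) -> Z (n, latent)."""
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, latent))          # encoder weights
    b1 = np.zeros(latent)
    W2 = rng.normal(0, 0.1, (latent, d))          # linear decoder weights
    b2 = np.zeros(d)
    leaky = lambda a: np.where(a > 0, a, 0.01 * a)
    dleaky = lambda a: np.where(a > 0, 1.0, 0.01)
    for _ in range(steps):
        Xn = X + rng.normal(0, noise, X.shape)    # denoising: corrupt the input
        A = Xn @ W1 + b1
        Z = leaky(A)
        Xhat = Z @ W2 + b2
        err = Xhat - X                            # ...but reconstruct the clean input
        gW2 = Z.T @ err / n + l2 * W2             # L2-regularized gradients
        gb2 = err.mean(axis=0)
        dZ = (err @ W2.T) * dleaky(A)
        gW1 = Xn.T @ dZ / n + l2 * W1
        gb1 = dZ.mean(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    Z = leaky(X @ W1 + b1)                        # final latent codes (clean input)
    return Z, float((err ** 2).mean())

X = rng.normal(0, 1, (30, 64))   # 30 fake "events"; real patches would be 40x32 flattened
Z, loss = train_autoencoder(X)
print(Z.shape)  # (30, 8)
```

With real material, each row of X would presumably hold one flattened 40×32 log-mel patch, and Z would supply the latent points the agents navigate.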

Quick start

  1. In Praat, select exactly one Sound object (any duration, any content).
  2. Run script… → select LatentCounterpoint.praat.
  3. Choose Preset (2-7 for specific strategies, 1 for custom).
  4. Set number of agents, latent size, counterpoint rigidity, speed.
  5. Set target duration (0 = original).
  6. Enable Draw_visualization for analysis display.
  7. Click OK — the engine segments the audio, trains the autoencoder, runs the agent physics, and reconstructs the output.

Quick tip: Start with the Trio preset on a 10-20 second recording with varied texture. Enable visualization — you'll see event boundaries (red lines) on the input waveform, agent profiles with stats, and unison rates. Listen to how the three voices create independent yet related lines, each navigating the latent space differently. The output appears as "source_cp" in the Objects window.

Important:
  • Python dependencies: requires numpy, soundfile, and scipy (no scikit-learn).
  • Autoencoder training happens on-the-fly and may take 30-60 seconds.
  • Event segmentation uses intensity peaks — if your material has few dynamic changes, consider a different source.
  • Latent size affects the representation — too small may lose detail, too large may overfit.
  • Counterpoint rigidity controls agent repulsion — higher = more independent voices.

Latent Counterpoint Theory

Agent Profiles

🎭 Three Behavioral Archetypes

Profile   Role                  Mass   Max Speed   Jitter   Attraction   Behavior
Cantus    Stable leader         3.0    0.3         0.05     1.0          Gravitates to the center of gravity; slow, stable
Florid    Peripheral wanderer   0.5    1.5         0.2      0.6          Attracted to rare/peripheral sounds; fast, exploratory
Shadow    Lagging mirror        2.0    0.4         0.08     0.3          Mirrors Cantus with 3-step lag + inversion; moderate

Physics Engine

βš™οΈ Agent Dynamics

At each time step, each agent experiences:

Attraction force:
  • Cantus: to the center + mild pull to the nearest event
  • Florid: to peripheral events (weighted by periphery score)
  • Shadow: to the mirrored position of the lagged Cantus + mild event pull

Repulsion force: F_rep = rigidity × median_dist² / d² (capped at speed × 3)

Jitter: deterministic pseudo-random noise scaled by profile

Velocity update:
  a = F / mass
  v = 0.85·v + a
  speed = min(||v||, max_speed × speed_global × median_dist)
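As a rough illustration of the velocity update, here is a NumPy sketch; function and parameter names are illustrative, not the script's actual identifiers:

```python
import numpy as np

def step_agent(pos, vel, force, mass, max_speed, speed_global, median_dist):
    """One physics step: inertia-damped velocity update with a speed cap."""
    accel = force / mass
    vel = 0.85 * vel + accel                 # inertia term
    speed = np.linalg.norm(vel)
    cap = max_speed * speed_global * median_dist
    if speed > cap:                          # clamp to the profile's speed limit
        vel = vel / speed * cap
    return pos + vel, vel

# A heavy Cantus-like agent (mass 3.0, max_speed 0.3) pulled by a constant force:
pos, vel = np.zeros(8), np.zeros(8)          # 8-D latent space
force = np.full(8, 0.5)
pos, vel = step_agent(pos, vel, force, mass=3.0, max_speed=0.3,
                      speed_global=0.5, median_dist=1.0)
print(round(float(np.linalg.norm(vel)), 3))  # 0.15: clamped to the cap
```

Note how the heavy Cantus profile resists acceleration (force divided by mass) while the cap keeps even a large accumulated velocity within its slow, stable character.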

LRU memory: Each agent remembers last 5 chosen events and applies distance penalty to avoid repetition.
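A sketch of how such an LRU distance penalty might work; the 0.3 scaling echoes the value quoted in Advanced Techniques, but all names here are illustrative:

```python
import numpy as np
from collections import deque

def penalized_dists(agent_pos, Z, memory, median_dist, penalty=0.3):
    """Agent-to-event distances, with recently used events pushed away."""
    d = np.linalg.norm(Z - agent_pos, axis=1)
    for idx in memory:                       # LRU memory of recent choices
        d[idx] += median_dist * penalty
    return d

Z = np.array([[0.0, 0.0],                    # event 0: nearest, but just used
              [0.1, 0.0],                    # event 1: nearly as close
              [2.0, 2.0]])                   # event 2: far away
memory = deque([0], maxlen=5)                # remembers the last 5 choices
d = penalized_dists(np.zeros(2), Z, memory, median_dist=1.0)
print(int(d.argmin()))  # 1: the penalty steers the agent away from event 0
```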

Latent Geometry

For latent vectors Z[1..n_events]:

  center = mean(Z)
  periphery[i] = ||Z[i] - center|| / max(||Z - center||)
  dists[i,j] = ||Z[i] - Z[j]||
  median_dist = median(dists where i ≠ j)

These provide the spatial reference for agent movement.
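These quantities are straightforward to compute with NumPy; a sketch with toy data (identifiers illustrative):

```python
import numpy as np

def latent_geometry(Z):
    """Center, periphery scores, pairwise distance matrix, median distance."""
    center = Z.mean(axis=0)
    radial = np.linalg.norm(Z - center, axis=1)
    periphery = radial / radial.max()                  # 1.0 = most peripheral
    dists = np.linalg.norm(Z[:, None] - Z[None, :], axis=2)
    offdiag = dists[~np.eye(len(Z), dtype=bool)]       # exclude i == j
    return center, periphery, dists, float(np.median(offdiag))

# Four toy latent points; the last one is an outlier:
Z = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, 3.0]])
center, periphery, dists, median_dist = latent_geometry(Z)
print(int(periphery.argmax()))  # 3: the outlier scores highest
```

High-periphery events like this are exactly what the Florid profile seeks out.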

Polyphonic Reconstruction

Each agent produces a mono sequence by concatenating its chosen events:

  layer_i = concatenate(clips[history_i]) with crossfades

Panning and volume by profile:
  • Cantus: vol=0.6, pan=0.5 (center)
  • Florid: vol=0.4, pan=0.75 (right-of-center)
  • Shadow: vol=0.35, pan=0.25 (left-of-center)

Output = Σ layer_i × gain_i × pan_i (stereo)
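A sketch of the concatenate-with-crossfade and panning steps, assuming a linear crossfade and linear panning law (the script's exact curves may differ, and all names are illustrative):

```python
import numpy as np

def concat_xfade(clips, xfade=64):
    """Concatenate mono clips with a short linear crossfade at each splice."""
    out = clips[0].astype(float).copy()
    ramp = np.linspace(0.0, 1.0, xfade)
    for clip in clips[1:]:
        clip = clip.astype(float)
        out[-xfade:] = out[-xfade:] * (1.0 - ramp) + clip[:xfade] * ramp
        out = np.concatenate([out, clip[xfade:]])
    return out

def pan_stereo(layer, vol, pan):
    """Linear pan: pan=0 is hard left, pan=1 is hard right."""
    return np.stack([layer * vol * (1.0 - pan), layer * vol * pan], axis=1)

# Two toy "event" clips for one agent, mixed Florid-style (vol 0.4, pan 0.75):
clips = [np.ones(200), 0.5 * np.ones(300)]
layer = concat_xfade(clips)
stereo = pan_stereo(layer, vol=0.4, pan=0.75)
print(layer.shape, stereo.shape)  # (436,) (436, 2)
```

The final stereo output would then be the sample-wise sum of each agent's panned layer.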

Unison Rate

For agents i and j at step s:

  unison(s) = 1 if history_i[s] == history_j[s] else 0
  unison_rate = Σ unison(s) / total_steps

Lower unison rates indicate more independent voices — true counterpoint.
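Computing the unison rate from two agents' event histories is simple; a sketch with made-up histories:

```python
def unison_rate(history_i, history_j):
    """Fraction of steps at which two agents chose the same event."""
    same = sum(a == b for a, b in zip(history_i, history_j))
    return same / len(history_i)

# Hypothetical event histories for two agents over 8 steps:
cantus = [0, 1, 2, 1, 0, 3, 2, 1]
florid = [4, 1, 5, 6, 0, 7, 5, 6]
print(unison_rate(cantus, florid))  # 0.25: the voices coincide at steps 1 and 4
```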

Preset Strategies

Preset 2: Duo (2 voices)

🎼 Two-Voice Counterpoint

Agents: 2 | Latent: 6 | Rigidity: 0.4 | Speed: 0.4

Character: Gentle duo — one Cantus, one Florid, moderate independence

Use on: Simple material, exploring two-voice counterpoint

Preset 3: Trio (3 voices)

🎵 Three-Voice Counterpoint

Agents: 3 | Latent: 8 | Rigidity: 0.5 | Speed: 0.5

Character: Balanced trio — Cantus, Florid, Shadow — full archetype set

Use on: General purpose, exploring three-voice interactions

Preset 4: Quartet (4 voices)

🎻 Four-Voice Ensemble

Agents: 4 | Latent: 10 | Rigidity: 0.6 | Speed: 0.5

Character: Denser texture — profiles cycle: Cantus, Florid, Shadow, Florid

Use on: Richer material, complex counterpoint

Preset 5: Dense Ensemble (5 voices)

🌫️ Five-Voice Texture

Agents: 5 | Latent: 12 | Rigidity: 0.7 | Speed: 0.6

Character: Dense polyphonic texture with strong repulsion

Use on: Complex material, dense textures

Preset 6: Tight Counterpoint

🔗 Highly Independent

Agents: 3 | Latent: 8 | Rigidity: 1.5 | Speed: 0.3

Character: Very strong repulsion — voices stay far apart in latent space, minimal unison

Use on: Maximizing voice independence

Preset 7: Free Scatter

🌀 Loose, Scattered

Agents: 3 | Latent: 10 | Rigidity: 0.1 | Speed: 1.2

Character: Very weak repulsion, high speed — agents wander freely, overlapping often

Use on: Loose, overlapping textures

Parameters & Controls

Agent Parameters

Parameter               Default   Description
Number_of_agents        3         2-6 agents (profiles cycle through Cantus/Florid/Shadow)
Latent_size             8         Autoencoder latent dimensions (2–32)
Counterpoint_rigidity   0.5       Repulsion strength between agents (0–2)
Speed                   0.5       Global agent movement speed (0.1–3)

Duration

Parameter                 Default   Description
Duration (0 = original)   0         Target output duration (seconds)

Output

Parameter            Default   Description
Seed                 42        Random seed for reproducibility
Draw_visualization   1         Generate the multi-panel analysis display
Play_result          1         Audition after processing

Visualization & Analysis

Multi-Panel Display

The Latent Counterpoint visualization:

Panel 1: TITLE
  • Script name, source name, preset, agent count, rigidity

Panel 2: INPUT WAVEFORM
  • Gray waveform with red dotted lines = event boundaries
  • Title: "Original (N events)"

Panel 3: OUTPUT WAVEFORM
  • Purple waveform = stereo counterpoint output
  • Title: "Counterpoint"
  • X-axis: Time (s)

Panel 4: ORIGINAL SPECTROGRAM
  • 0-5000 Hz spectrogram of the original
  • Title: "Original spectrogram"

Panel 5: OUTPUT SPECTROGRAM
  • 0-5000 Hz spectrogram of the counterpoint (L channel)
  • Title: "Counterpoint spectrogram (L channel)"

Panel 6: AGENT PROFILES
  • For each agent, color-coded by agent ID: Agent 0 (blue), Agent 1 (red), Agent 2 (green), Agent 3 (purple), etc.
  • Profile name (Cantus/Florid/Shadow)
  • Steps, unique events, repetition rate
  • Average travel distance in latent space
  • Title: "Agent Profiles:"

Panel 7: COUNTERPOINT
  • Unison rates for each agent pair (lower = more independent)
  • Title: "Counterpoint (unison rates — lower = more independent):"

Panel 8: SUMMARY
  • Event count, total unique events used, mean event duration
  • Autoencoder loss (initial → final), latent size, seed
  • Duration in/out, RMS comparison
  • Warnings, if any

Reading Agent Profiles

What the agent stats mean:
  • Steps: Number of events selected by this agent (should be similar across agents)
  • Unique: How many distinct source events the agent used — lower = more repetition
  • Rep rate: (steps - unique)/steps — higher = more repetition
  • Travel: Average latent distance between consecutive chosen events — higher = more exploratory
  • Periphery: Average distance of chosen events from the latent center — higher = more peripheral

Interpreting Unison Rates

What the numbers mean:
  • Unison rate is the percentage of time two agents select the same event
  • High unison (>30%): Voices are highly correlated — may sound like a single line
  • Medium unison (10-30%): Some independence, occasional agreements
  • Low unison (<10%): Highly independent voices — true counterpoint
  • Adjust counterpoint_rigidity to control unison rates

Applications

Electroacoustic Composition

Use case: Creating polyphonic textures from single-source material

Technique: Trio or Quartet presets on varied source material

Workflow:

Generative Music

Use case: Creating endless variations with different seeds

Technique: Same preset, different seed values

Examples:

Sound Design for Media

Use case: Creating layered, evolving textures

Technique: Dense Ensemble on appropriate sources

Applications:

Research & Education

Use case: Studying emergent polyphony, agent-based models

Technique: Compare presets on same source, examine agent trajectories

Learning outcomes:

Practical Workflow Examples

🎬 Film Scene: Multiple Personalities

Goal: Create 60-second cue representing multiple characters from a single voice

Settings:

  • Source: 30-second vocal improvisation
  • Preset: Quartet
  • Custom: rigidity=0.8 (strong independence), speed=0.4

Result: Four independent vocal streams, each with distinct character (Cantus, Florid, Shadow, Florid)

🎚️ Electronic Music: Polyphonic Pad

Goal: Create evolving pad with multiple layers

Settings:

  • Source: 8-second synth pad
  • Preset: Trio
  • Custom: speed=0.3 (slow), rigidity=0.3 (loose)

Result: 24-second evolving pad with three overlapping voices

πŸŽ™οΈ Voice Processing: Choral Effect

Goal: Create choral texture from solo voice

Settings:

  • Source: 10-second vocal phrase
  • Preset: Dense Ensemble (5 voices)
  • Custom: rigidity=0.5, speed=0.5

Result: Five-voice polyphony from a single voice — creates a choral effect

Troubleshooting Common Issues

Problem: Python not found or missing packages
Cause: Python not installed, or packages missing
Solution: Install Python and the required packages: pip install numpy soundfile scipy

Problem: Too few events detected
Cause: Source has few intensity peaks, or segmentation parameters are inappropriate
Solution: Use a source with more dynamic variation, or adjust the min/max event duration in the script

Problem: Autoencoder loss not decreasing
Cause: Too few steps, too small a latent size, or data too complex
Solution: Increase learning_steps, increase latent_size, or use a simpler source

Problem: All agents sound similar (high unison rate)
Cause: counterpoint_rigidity too low, or speed too high causing convergence
Solution: Increase rigidity, reduce speed, check agent profiles

Problem: Output has clicks
Cause: Crossfade insufficient at splice points
Solution: Increase XFADE_SEC in the Python script (currently 8 ms)

Advanced Techniques

Custom agent profiles:

In Python script, modify mass, max_speed, jitter_scale, and attraction_weight in Agent.__init__() to create new behavioral types.

Memory size adjustment:

Change a.memory_size (default 5) to control how strongly agents avoid recent events.

LRU penalty tuning:

Modify penalty scaling (currently median_dist * 0.3) to control repetition avoidance strength.

Multi-channel output:

Script outputs stereo. For multi-channel, modify reconstruct_polyphonic() to output N channels with custom panning.