KD-Tree Timbral Counterpoint — Neural Counterpoint Engine
Creates N contrapuntal layers from a target sound by searching a corpus for grains at different timbral distances using a KD-Tree for efficient N-dimensional nearest-neighbour lookup across MFCC, pitch, centroid, intensity, HNR, and ZCR features.
What this does
This script implements KD-Tree Timbral Counterpoint — a computational approach to generating polyphonic textures by searching a corpus of audio grains for matches at different timbral distances. A target sound is divided into overlapping grains, each described by 11 acoustic features (6 MFCCs, pitch, spectral centroid, intensity, HNR, ZCR). A KD-Tree provides efficient N-dimensional nearest-neighbour lookup across a corpus of WAV files. Each voice is assigned a different neighbour rank, producing layers ranging from close imitation to distant timbral echo.
Key Features:
- 11-Dimensional Feature Space — 6 MFCCs + pitch + spectral centroid + intensity + HNR + ZCR
- KD-Tree Nearest Neighbour — Efficient lookup across entire corpus (supports thousands of grains)
- 4 Contrapuntal Voices — Each with distinct neighbour rank, pan position, and delay
- 5 Presets — Strict Doppelgänger, Spectral Counterpoint, Ghost Choir, Orchestral Shadow, Noise Doppelgänger
- Adjustable Weights — Control importance of MFCC, pitch, centroid, intensity, HNR in distance calculation
- Repetition Penalty — Prevents repeating the same corpus grain within recent history
- Envelope Shaping — Optionally follow target's pauses and/or amplitude envelope
Quick start
- In Praat, select exactly one Sound object (the target).
- Prepare a corpus folder containing .wav files (any length, any sample rate).
- Run script… → KDTreeTimbralCounterpoint.praat.
- Enter Corpus folder path (use forward slashes).
- Set Grain size (ms) and Overlap (%) — smaller grains = more granular detail.
- Choose a preset or configure custom weights and neighbour ranks.
- Adjust Envelope shaping (Off / Pauses only / Amplitude envelope only / Both).
- Click OK — script extracts features, runs KD-Tree matching, and resynthesises polyphony.
Install the Python dependencies with pip install numpy scipy (scipy provides cKDTree). The corpus folder must contain at least one .wav file, and the grain size should not exceed the shortest corpus sound. Processing time depends on corpus size: a few hundred grains take seconds, thousands may take minutes. Pause detection in envelope shaping uses an intensity threshold of -35 dB; adjust it if the target has low-level noise.
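As a rough illustration of how Grain size (ms) and Overlap (%) interact, the sketch below computes grain onset times for a sound of a given duration. The function name `grain_starts` and the rule that only full-length grains are kept are assumptions for illustration, not taken from the script.

```python
def grain_starts(duration_s: float, grain_ms: float, overlap_pct: float) -> list:
    """Return grain onset times in seconds (hypothetical helper, full grains only)."""
    if not 0 <= overlap_pct < 100:
        raise ValueError("overlap_pct must be in [0, 100)")
    grain_s = grain_ms / 1000.0
    if grain_s <= 0 or grain_s > duration_s:
        raise ValueError("grain size must be positive and no longer than the sound")
    # The hop is the part of each grain that does not overlap the next one.
    hop_s = grain_s * (1.0 - overlap_pct / 100.0)
    # Small epsilon guards against floating-point error in the division.
    n_grains = int((duration_s - grain_s) / hop_s + 1e-9) + 1
    return [i * hop_s for i in range(n_grains)]
```

Under these assumptions, a 1-second target with 100 ms grains and 50% overlap yields 19 grains, while the same target with 250 ms grains and no overlap yields 4, which is why smaller grains and higher overlap increase processing time.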
4 Contrapuntal Voices — Built-in Separation Rules
Voice 1 — Close
Neighbour rank: 1 (closest match)
Pan: Centre (0.0)
Delay: 0 ms
Gain: 0.9
Character: Almost identical to target — close imitation, like a canon at the unison.
Voice 2 — Shadow
Neighbour rank: 3
Pan: Alternating ±0.5
Delay: 20 ms
Gain: 0.65
Character: Slightly different timbre, stereo separation, short echo.
Voice 3 — Cousin
Neighbour rank: 8
Pan: Alternating ±0.8
Delay: 45 ms
Gain: 0.45
Character: Distantly related timbre, wide stereo, noticeable delay.
Voice 4 — Ghost
Neighbour rank: 20
Pan: Random ±1.0
Delay: 80 ms
Gain: 0.30
Character: Very different timbre, unpredictable panning, long echo — like a ghost of the original.
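The four voice rules above can be written down as a small table-driven structure. This is a sketch only: the `VoiceRule` class, the `pan_mode` labels, and the choice to flip sides on every grain for "alternating" pan are assumptions made for illustration; the values themselves are those listed above.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceRule:
    name: str
    rank: int          # neighbour rank (1 = closest match)
    pan_width: float   # maximum absolute pan position
    pan_mode: str      # "centre", "alternating", or "random" (hypothetical labels)
    delay_ms: float
    gain: float


VOICE_RULES = [
    VoiceRule("Close", 1, 0.0, "centre", 0.0, 0.90),
    VoiceRule("Shadow", 3, 0.5, "alternating", 20.0, 0.65),
    VoiceRule("Cousin", 8, 0.8, "alternating", 45.0, 0.45),
    VoiceRule("Ghost", 20, 1.0, "random", 80.0, 0.30),
]


def pan_for_grain(rule, grain_index, rng=random):
    """Return a pan position in [-1, 1] for one grain of the given voice."""
    if rule.pan_mode == "alternating":
        # Assumption: even grains go left, odd grains go right.
        return -rule.pan_width if grain_index % 2 == 0 else rule.pan_width
    if rule.pan_mode == "random":
        return rng.uniform(-rule.pan_width, rule.pan_width)
    return 0.0  # centre
```

Keeping the rules in one list makes it straightforward to add a fifth voice or to change a rank without touching the selection logic.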
Presets
| Preset | Ranks | Weights (MFCC/Pitch/Cent/Int/HNR) | Randomness | Character |
|---|---|---|---|---|
| Strict Doppelgänger | 1,2,3,4 | MFCC=1.5, Int=1.0, others=0.8 | 0.05 | Very close imitation — almost identical layers, subtle stereo spread. |
| Spectral Counterpoint | 1,3,8,20 | Balanced (1.0,0.8,0.5,0.5,0.5) | 0.2 | Standard preset — clear hierarchy from close to distant. |
| Ghost Choir | 5,12,25,50 | Balanced | 0.5 | All voices distant — ethereal, choir-like texture with unpredictable timbres. |
| Orchestral Shadow | 1,3,8,20 | Pitch=1.5, Centroid=1.2, others=1.0 | 0.2 | Emphasises pitch and spectral shape — good for melodic material. |
| Noise Doppelgänger | 1,3,8,20 | HNR=2.0, Centroid=1.5, others=1.0 | 0.2 | Emphasises noise content — transforms pitched sounds into noisy textures. |
| Custom | user-defined | user-defined | user-defined | Full manual control over all parameters. |
distance = sqrt( Σ w_i × ((f_i - μ_i)/σ_i)² )
Where:
- f_i = feature value (MFCC1..6, centroid, pitch, intensity, HNR, ZCR)
- μ_i, σ_i = mean and standard deviation computed from all corpus grains
- w_i = user weight per feature group (MFCCs share the same weight)
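One way to read the formula is: standardise each feature with the corpus mean and standard deviation, then take a weighted Euclidean distance between the target grain and a corpus grain. The sketch below follows that reading; the function names are hypothetical, and the guard against zero standard deviation is an added assumption.

```python
import numpy as np


def standardize(features, mean, std):
    """Z-score a feature vector with corpus statistics."""
    safe_std = np.where(std > 0, std, 1.0)  # assumption: leave constant features unscaled
    return (features - mean) / safe_std


def weighted_distance(target, candidate, weights, mean, std):
    """Weighted Euclidean distance between two standardised feature vectors."""
    z_target = standardize(target, mean, std)
    z_candidate = standardize(candidate, mean, std)
    return float(np.sqrt(np.sum(weights * (z_target - z_candidate) ** 2)))
```

Setting a weight to 0 removes that feature from the comparison entirely, and doubling a weight doubles that feature's contribution to the squared distance.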
Feature Space — 11 Dimensions
| Feature | Description | Weight group |
|---|---|---|
| MFCC 1–6 | Mel-frequency cepstral coefficients — spectral envelope shape (timbre). | mfcc_weight (default 1.0) |
| Pitch (F0) | Fundamental frequency in Hz (0 for unvoiced). | pitch_weight (default 0.8) |
| Spectral centroid | Centre of gravity of the spectrum (brightness). | spectral_centroid_weight (default 0.5) |
| Intensity | RMS energy in dB. | intensity_weight (default 0.5) |
| HNR | Harmonics-to-Noise Ratio — periodicity vs. noise content. | hnr_weight (default 0.5) |
| ZCR | Zero-crossing rate — correlates with noisiness/spectral slope. | hnr_weight (same as HNR) |
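The weight groups in the table expand to one weight per dimension. The sketch below builds that 11-element vector with the defaults listed above. The helper name `feature_weights` is hypothetical, and the feature order (MFCC 1 to 6, centroid, pitch, intensity, HNR, ZCR) is an assumption borrowed from the list under the distance formula.

```python
import numpy as np

N_MFCC = 6


def feature_weights(mfcc_weight=1.0, pitch_weight=0.8,
                    spectral_centroid_weight=0.5, intensity_weight=0.5,
                    hnr_weight=0.5):
    """Expand the five weight groups into one weight per feature dimension."""
    return np.array(
        [mfcc_weight] * N_MFCC          # all six MFCCs share one weight
        + [spectral_centroid_weight,
           pitch_weight,
           intensity_weight,
           hnr_weight,                  # HNR
           hnr_weight],                 # ZCR shares the HNR weight
        dtype=float,
    )
```

For example, the Orchestral Shadow row of the preset table would correspond to `feature_weights(1.0, 1.5, 1.2, 1.0, 1.0)` under this ordering.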
KD-Tree indexing
The script builds a scipy.spatial.cKDTree from all corpus grains after standardization and weighting. Query time is O(log N) on average, which stays efficient even for large corpora (tens of thousands of grains). For each target grain and each voice, the tree is queried for k = rank + 30 neighbours, and the rank-th neighbour is then selected, subject to randomness and the repetition penalty.
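The indexing and rank lookup can be sketched as follows. This is a minimal illustration under stated assumptions, not the script's code: the function names are hypothetical, each standardised feature is scaled by the square root of its weight so that plain Euclidean distance in the tree matches the weighted distance, the repetition penalty is modelled as simply skipping recently used grains, and the randomness step is left out.

```python
import numpy as np

try:
    from scipy.spatial import cKDTree
except ImportError:  # fall back to brute force, as the script is described to do
    cKDTree = None


def build_index(corpus, weights):
    """Standardise and weight corpus features, then index them."""
    mean = corpus.mean(axis=0)
    std = corpus.std(axis=0)
    std[std == 0] = 1.0                      # assumption: avoid division by zero
    scale = np.sqrt(weights) / std           # sqrt so squared distances carry w_i
    points = (corpus - mean) * scale
    tree = cKDTree(points) if cKDTree is not None else None
    return tree, points, mean, scale


def ranked_neighbours(index, target, k):
    """Return indices of the k nearest corpus grains, closest first."""
    tree, points, mean, scale = index
    query = (target - mean) * scale
    k = min(k, len(points))                  # never ask for more points than exist
    if tree is not None:
        _, idx = tree.query(query, k=k)
        return np.atleast_1d(idx)
    dists = np.linalg.norm(points - query, axis=1)
    return np.argsort(dists)[:k]


def pick_grain(index, target, rank, recent=(), extra=30):
    """Pick the rank-th neighbour, skipping grains in `recent`."""
    candidates = ranked_neighbours(index, target, rank + extra)
    start = min(rank, len(candidates)) - 1   # rank is 1-based
    for i in candidates[start:]:
        if int(i) not in recent:
            return int(i)
    return int(candidates[start])            # everything was recent: reuse
```

Querying rank + 30 candidates leaves headroom, so a voice can move further down the list when its preferred grain was used recently.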
Visualization (Praat Picture)
When Draw_visualization is enabled, the script generates an 8×8 cm picture with a title bar and six panels:
| Panel | Content |
|---|---|
| Title bar | Script name, sound name, voice count, ranks, preset name. |
| Target waveform | Target sound with vertical lines at each grain boundary. |
| Output mix waveform | Final stereo mix after all voices, trimming, and fade-out. |
| Target spectrogram | 0–5000 Hz, Gaussian window — left side of canvas. |
| Output spectrogram | Right side of canvas — compare spectral transformation. |
| Contrapuntal voice timeline | Colour-coded horizontal bars for each voice (V1–V4). Each bar = one corpus grain placed in time. Gaps = silence between grains (preserves target rhythm). |
| Summary panel | Voices, ranks, randomness, grain parameters, file counts, durations, weights, envelope shaping mode. |
Applications
Generative Polyphonic Texture
Use case: Transform a monophonic recording into a rich, evolving polyphonic texture with 4 distinct timbral layers.
Technique: Use Spectral Counterpoint preset. Target can be voice, instrument, or field recording. Corpus should contain sounds with complementary timbres (e.g., same instrument different articulations, or cross-synthesis between voice and synthesisers).
Timbral Canon / Echo
Use case: Create a canon where each voice echoes the target's structure but with different timbres, like a "sound canon".
Technique: Strict Doppelgänger with ranks 1,2,3,4, short grain size (50–100 ms), low randomness. The result: four nearly identical layers with slight timbral variations, panned across stereo field.
Corpus Exploration & Composition
Use case: Discover unexpected relationships within a corpus. Use a silent target (or constant tone) to "scan" the corpus at different ranks.
Technique: Target a simple sine wave or white noise. The result is a composition where grains are selected based on timbral proximity to the target — revealing the corpus's internal structure.
Experimental Music & Sound Art
Use case: Generate impossible textures — sounds that are simultaneously identical in rhythm but wildly different in timbre.
Technique: Ghost Choir preset with large grain size (500–1000 ms), high randomness (0.5–0.8), envelope shaping = "Pauses only" to preserve target rhythm while letting timbres drift.
Workflow: Voice → Percussion Ghost Choir
Target: Spoken phrase (3–5 seconds).
Corpus: Folder of drum loops, cymbals, foley sounds.
Settings: Ghost Choir preset, grain=150 ms, overlap=50%, envelope shaping = "Pauses only".
Result: The spoken rhythm is preserved, but each syllable triggers drum/percussion grains at different timbral distances — a percussive "ghost choir".
Workflow: Piano → Orchestral Shadow
Target: Piano melody.
Corpus: Orchestral samples (strings, woodwinds, brass, pizzicato).
Settings: Orchestral Shadow preset (pitch_weight=1.5, centroid_weight=1.2).
Result: Each piano note is echoed by orchestral grains with similar pitch and brightness — a spectral canon where piano becomes the conductor of an imaginary orchestra.
• SciPy not installed: Run pip install scipy. Without it, the script falls back to brute-force search, which scans every corpus grain for each query (O(N) per query rather than O(log N)), and warns about performance.
• No .wav files in corpus: Ensure the corpus folder contains WAV files (mono or stereo, any sample rate; the script resamples to 44.1 kHz).
• Grain size too large: If the grain size exceeds the shortest corpus sound, extraction fails. Use smaller grains (e.g., 100–300 ms).
• Output silence: Check that corpus grains have reasonable amplitude. Increase final_gain in the voice rules (edit the script) or use envelope shaping = "Amplitude envelope only" to boost soft grains.
• KD-Tree query slow: Reduce corpus size or increase grain size (fewer grains). For huge corpora (10k+ grains), consider subsetting.
• Envelope shaping ignores low-level content: Adjust the silence threshold in the To TextGrid (silences) command (currently -35 dB); lower it for quiet sources.
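To see why the threshold matters for quiet sources, the sketch below marks frames as pauses when their level falls more than a threshold below the loudest frame. It is a NumPy illustration of intensity thresholding in general, not a reimplementation of Praat's To TextGrid (silences) command; the function name, the 10 ms frame length, and the use of frame RMS are all assumptions.

```python
import numpy as np


def pause_mask(samples, sample_rate, frame_ms=10.0, threshold_db=-35.0):
    """Return one boolean per frame: True where the frame counts as a pause."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000.0))
    n_frames = len(samples) // frame_len
    if n_frames == 0:
        return np.zeros(0, dtype=bool)
    frames = np.asarray(samples[: n_frames * frame_len], dtype=float)
    frames = frames.reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    peak = rms.max()
    if peak == 0:
        return np.ones(n_frames, dtype=bool)  # entirely silent input
    # Level of each frame in dB relative to the loudest frame.
    level_db = 20.0 * np.log10(np.maximum(rms, 1e-12) / peak)
    return level_db < threshold_db
```

With a threshold of -35 dB, material 20 dB below the peak still counts as sound, while material 60 dB below the peak is treated as a pause. Lowering the threshold (for example to -45 dB) keeps more low-level content.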
Advanced: Custom Voice Separation
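In kd_tree_timbral_counterpoint.py:
    # v_idx is the zero-based voice index; pan_val is defined elsewhere in the script.
    # Each rule sets gain (linear), pan (-1.0 to 1.0), and delay (ms).
    if v_idx == 0:    # Voice 1: Close
        gain, pan, delay = 0.9, 0.0, 0.0
    elif v_idx == 1:  # Voice 2: Shadow
        gain, pan, delay = 0.65, pan_val, 20.0
    elif v_idx == 2:  # Voice 3: Cousin
        gain, pan, delay = 0.45, pan_val, 45.0
    else:             # Voice 4+: Ghost (requires "import random")
        gain, pan, delay = 0.30, random.uniform(-1.0, 1.0), 80.0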