KL Divergence Corpus Resynthesis — Information-Theoretic Mosaicking
Corpus A defines a target timbral distribution. Corpus B provides source grains. The script greedily builds an output by selecting grains that minimise KL divergence between the reference distribution and the growing output distribution — actively driving synthesis, not just analysing afterwards.
What this does
This script implements information-theoretic corpus mosaicking. Two audio corpora are analysed: Corpus A (REFERENCE) defines a target timbral distribution; Corpus B (SOURCE) provides the building blocks (grains). The script greedily builds an output sound by repeatedly choosing the Corpus B grain that, when added, minimises the KL divergence between the reference distribution P and the running output distribution Q. KL actively drives synthesis — it is not computed after the fact.
Key Features:
- 5 presets: Coarse/fast, Balanced, Fine match, Dense mosaic, Custom
- Feature space: MFCC c1..c13 + RMS (dB) + spectral centroid + spectral spread (16 dims total)
- Distribution model: per-dimension normalised histograms (nBins = 16–48, user-settable)
- KL modes: Forward D(P||Q), Reverse D(Q||P), Symmetric (mean of the forward and reverse divergences)
- Greedy selection: per step, evaluates a candidate pool (default 40 grains) via fast vectorised KL
- Overlap-add resynthesis: grains placed at hop intervals, with a crossfade between adjacent segments
- Visualisation: histogram comparison showing reference P vs. achieved Q for a chosen feature dimension
Quick start
- Prepare two folders: Corpus A (reference) and Corpus B (source) containing audio files (WAV, AIFF, MP3).
- In Praat, run KL_Divergence_Corpus_Resynthesis.praat.
- Enter folder paths (or leave blank for chooser dialogs).
- Choose a preset (Coarse/fast, Balanced, Fine match, Dense mosaic, or Custom).
- Adjust analysis parameters: Frame length (default 46.4 ms), Overlap ratio (1:1 to 1:8), Number of MFCCs, Histogram bins.
- Select KL mode (Forward / Reverse / Symmetric).
- Set Output_duration (seconds) and Crossfade (overlap between grains).
- Click OK — script analyses both corpora, builds reference distribution, runs greedy selection, synthesises output.
The script converts every file to target_sample_rate (default 44.1 kHz) and normalises its peak to 0.99 before analysis.
5 Presets
| Preset | Histogram Bins | Candidate Pool | KL Mode | Overlap | Character |
|---|---|---|---|---|---|
| Coarse / fast | 16 | 20 | Forward D(P \|\| Q) | 1:1 (no overlap) | Fastest, coarser distribution matching, minimal smoothing. |
| Balanced | 32 | 40 | Symmetric | 1:2 (50% overlap) | Good quality/time trade-off; recommended default. |
| Fine match | 48 | 80 | Symmetric | 1:2 (50% overlap) | Higher precision, slower, more accurate distribution matching. |
| Dense mosaic | 32 | 60 | Symmetric | 1:4 (75% overlap) | Denser grain placement, smoother temporal evolution. |
KL Divergence & Greedy Selection
Kullback-Leibler divergence
Forward KL: D_KL(P || Q) = Σ_x P(x) · log(P(x) / Q(x))
Measures how much information is lost when Q approximates P. Tends to be mass-covering: it penalises bins where P has mass but Q has little, so Q spreads to cover P's support.
Reverse KL: D_KL(Q || P) = Σ_x Q(x) · log(Q(x) / P(x))
Tends to be mode-seeking: it penalises Q for placing mass where P has little, so Q concentrates on the high-probability regions of P.
Symmetric KL: (D_KL(P || Q) + D_KL(Q || P)) / 2
Balances both behaviours; often the best choice for timbral matching.
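The three modes can be illustrated with a minimal Python sketch operating on histogram bin counts. This is not the Praat script's code: the function names and the smoothing constant `EPS` are assumptions made for illustration, since the script's actual eps value is not stated here.

```python
import math

EPS = 1e-9  # assumed smoothing constant; the script's actual eps is not stated


def normalise(counts, eps=EPS):
    """Turn raw bin counts into an eps-smoothed probability distribution."""
    smoothed = [c + eps for c in counts]
    total = sum(smoothed)
    return [s / total for s in smoothed]


def kl(p, q):
    """D_KL(p || q) for two distributions defined over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def divergence(p, q, mode="symmetric"):
    """Forward, reverse, or symmetric (mean of both directions) divergence."""
    if mode == "forward":
        return kl(p, q)
    if mode == "reverse":
        return kl(q, p)
    if mode == "symmetric":
        return 0.5 * (kl(p, q) + kl(q, p))
    raise ValueError(f"unknown mode: {mode}")


p = normalise([8, 4, 2, 2])  # reference histogram (toy counts)
q = normalise([4, 4, 4, 4])  # output histogram (toy counts)
forward = divergence(p, q, "forward")
reverse = divergence(p, q, "reverse")
symmetric = divergence(p, q, "symmetric")
print(forward, reverse, symmetric)
```

Smoothing before normalising keeps every bin strictly positive, so the logarithm is always defined even when the output histogram has empty bins.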
Greedy selection
- Compute the reference distribution P from Corpus A (per-dimension histograms, eps-smoothed).
- Initialise the output distribution Q with zero counts.
- For each output step (number of steps = duration / hop_size):
  - Precompute the base KL (without adding any grain) and the per-bin corrections.
  - Sample a candidate pool (size = candidate_pool) from the Corpus B grains.
  - For each candidate, compute KL = (1/featDim) × Σ_d (base_d + corr_d[addBin]), where addBin is the bin the candidate occupies in dimension d.
  - Select the grain with the lowest KL and commit it: update the Q counts and append the grain to the output.
- Resynthesise the output by overlap-add of the selected grains (crossfade length = Crossfade, in seconds).
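The selection loop can be sketched in Python as follows. This is a simplified illustration, not the script's implementation: it rescores each candidate from scratch with forward KL instead of using the precomputed base and per-bin correction terms, and the names `greedy_select`, `grain_bins`, and `EPS` are hypothetical.

```python
import math
import random

EPS = 1e-6  # assumed smoothing constant


def smoothed(counts):
    """Eps-smoothed, normalised version of a list of bin counts."""
    total = sum(counts) + EPS * len(counts)
    return [(c + EPS) / total for c in counts]


def forward_kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def mean_kl(p_hists, q_counts):
    """Forward KL averaged over all feature dimensions."""
    per_dim = [forward_kl(p, smoothed(q)) for p, q in zip(p_hists, q_counts)]
    return sum(per_dim) / len(per_dim)


def greedy_select(p_hists, grain_bins, n_steps, candidate_pool, rng):
    """grain_bins[g][d] is the histogram bin of grain g in dimension d."""
    n_bins = len(p_hists[0])
    q_counts = [[0] * n_bins for _ in p_hists]
    chosen = []
    for _ in range(n_steps):
        pool_size = min(candidate_pool, len(grain_bins))
        pool = rng.sample(range(len(grain_bins)), pool_size)
        best, best_kl = None, math.inf
        for g in pool:
            # Tentatively add the grain, score the result, then undo.
            for d, b in enumerate(grain_bins[g]):
                q_counts[d][b] += 1
            score = mean_kl(p_hists, q_counts)
            for d, b in enumerate(grain_bins[g]):
                q_counts[d][b] -= 1
            if score < best_kl:
                best, best_kl = g, score
        # Commit the winning grain to the running output distribution.
        for d, b in enumerate(grain_bins[best]):
            q_counts[d][b] += 1
        chosen.append(best)
    return chosen, q_counts


p_hists = [[0.75, 0.25]]   # one feature dimension, two bins
grain_bins = [[0], [1]]    # grain 0 falls in bin 0, grain 1 in bin 1
chosen, q_counts = greedy_select(p_hists, grain_bins, n_steps=8,
                                 candidate_pool=2, rng=random.Random(0))
print(q_counts)  # [[6, 2]]: the 3:1 ratio of the reference is reproduced
```

Rescoring from scratch costs one full KL evaluation per candidate; the base-plus-correction formulation in the steps above avoids that by only accounting for the one bin per dimension that changes.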
Feature Space (16 dimensions)
MFCC c1–c13 (13 dims)
Mel-frequency cepstral coefficients: spectral envelope shape (timbre). c1 = overall spectral tilt, c2–c13 = finer formant structure.
RMS (dB) — 1 dim
Root-mean-square amplitude converted to decibels. Captures loudness/dynamics.
Spectral centroid — 1 dim
Centre of gravity of the spectrum (brightness). Higher = brighter timbre.
Spectral spread — 1 dim
Standard deviation of the spectrum around the centroid (bandwidth). Captures noisiness/roughness.
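As a rough illustration of the three non-MFCC features, the sketch below computes RMS level, spectral centroid, and spectral spread for a single frame. It assumes magnitude weighting of the spectrum and a dB reference of 1.0 (full scale); the script relies on Praat's own analysis, whose weighting and reference may differ, and the function names are hypothetical.

```python
import math


def rms_db(samples, floor=1e-12):
    """RMS level of a frame in dB relative to full scale (1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(max(mean_square, floor))


def centroid_and_spread(freqs, mags):
    """Magnitude-weighted mean frequency and standard deviation around it."""
    total = sum(mags)
    centroid = sum(f * m for f, m in zip(freqs, mags)) / total
    variance = sum(m * (f - centroid) ** 2 for f, m in zip(freqs, mags)) / total
    return centroid, math.sqrt(variance)


level = rms_db([0.5, -0.5, 0.5, -0.5])
centroid, spread = centroid_and_spread([100.0, 200.0, 300.0], [1.0, 2.0, 1.0])
print(level, centroid, spread)
```

In this toy spectrum the energy is symmetric around 200 Hz, so the centroid is 200 Hz and the spread is the square root of 5000, about 70.7 Hz.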
Applications
Timbral transfer / style emulation
Use case: Corpus A = recordings of a specific instrument (e.g., cello), Corpus B = a different instrument (e.g., voice). The output resynthesises the voice such that its timbral distribution matches the cello — a form of timbre transfer without machine learning.
Settings: Balanced preset, Symmetric KL. Output duration = length of desired output.
Corpus summarisation / interpolation
Use case: Corpus A = long recording, Corpus B = short fragments from the same recording. The output selects grains that approximate the long recording's distribution — effectively a "timbre summary".
Settings: Fine match preset, output duration shorter than original.
Experimental / generative music
Use case: Corpus A = desired timbral "target" (e.g., a specific synthetic texture), Corpus B = a large database of found sounds. The output is a mosaic that sounds like it's made from Corpus B but with the distribution of Corpus A.
Settings: Custom preset with a high histogram bin count (48), a large candidate pool (80), and Reverse KL, which concentrates Q on the high-probability regions of P (extreme timbral focus).
Workflow: Voice → cello timbre transfer
Corpus A (reference): 5 minutes of solo cello (sustained tones, pizzicato, arco).
Corpus B (source): 2 minutes of spoken voice (vowels, consonants, breaths).
Settings: Balanced preset, Symmetric KL, output_duration = 30 s.
Result: The output sounds like voice fragments but with the timbral distribution of cello — vocal formants shaped like cello spectral envelopes.
Workflow: Drum loop → ambient texture
Corpus A (reference): 10-minute ambient drone (sustained, smooth).
Corpus B (source): 30-second drum loop (transient-rich, noisy).
Settings: Coarse preset (which uses Forward KL), output_duration = 60 s.
Result: The output selects drum hits whose spectral features match the drone's distribution — likely quiet, sustained drum sounds (e.g., cymbal rolls, tom sustains) rather than attacks.
Workflow: Corpus interpolation (morphing)
Corpus A: Bright, high-centroid sounds (e.g., piccolo, bells).
Corpus B: Dark, low-centroid sounds (e.g., bass clarinet, cello).
Settings: Symmetric KL, Balanced preset. To interpolate, mix the two corpora's histograms into a custom target distribution, 0.5×P_A + 0.5×P_B, where P_A and P_B are the per-dimension histograms of Corpus A and Corpus B. This requires combining the histograms manually.
Result: Output sounds like a timbral cross-fade between the two extremes.
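The manual histogram combination could look like the following Python sketch, applied once per feature dimension. The helper `mix_histograms` and the toy histograms are hypothetical; the script itself does not expose such a function.

```python
def mix_histograms(p_a, p_b, weight_a=0.5):
    """Weighted mixture of two normalised histograms over the same bins."""
    if len(p_a) != len(p_b):
        raise ValueError("histograms must have the same number of bins")
    mixed = [weight_a * a + (1.0 - weight_a) * b for a, b in zip(p_a, p_b)]
    total = sum(mixed)
    return [m / total for m in mixed]  # renormalise to guard against rounding


p_bright = [0.0, 0.2, 0.8]  # toy centroid histogram, mass in the high bins
p_dark = [0.7, 0.3, 0.0]    # toy centroid histogram, mass in the low bins
target = mix_histograms(p_bright, p_dark, weight_a=0.5)
print(target)
```

Sweeping `weight_a` from 1.0 to 0.0 across successive runs would move the target from the bright distribution to the dark one.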
Troubleshooting
- Output sounds static / repetitive: If Corpus B has few distinct grains (small folder or short files), the algorithm will reuse grains. Increase Max_files_per_corpus or use a larger Corpus B.
- KL barely decreases: Either P and Q are already similar from the start, or Corpus B cannot match P's distribution (e.g., P has mass at high centroid values but Corpus B contains no high-centroid sounds). Check the visualisation histogram.
- Output clicks at grain boundaries: Increase Crossfade (e.g., 0.02–0.05 s) and use an overlap ratio denser than 1:1 (1:2 or higher).
- Processing is slow: Reduce candidate_pool (20–30) or the number of histogram bins (nBins, 16–24), or use the Coarse preset.
- Corpus B has many files, but the script uses only a few: Check Max_files_per_corpus (0 = no limit). Also make sure files are longer than frame_length + hop_size; shorter files are skipped.
Visualisation: P vs. Q histograms
- Title bar — preset name and displayed feature dimension
- Histogram panel — light blue bars = reference distribution P (Corpus A), red bars = achieved distribution Q (output mosaic).
- Legend — KL mode, bin count, initial and final KL values.
frame_length - crossfade. If the crossfade is large (e.g., frame_length/2), grains overlap significantly, producing smooth transitions. The overlap ratio presets (1:2, 1:4, 1:8) control how many grains overlap: higher overlap gives a smoother but more blurred result.
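A minimal Python sketch of overlap-add with a linear crossfade is shown below. It assumes equal-length grains placed at a spacing of grain length minus crossfade length (in samples) and complementary linear fades; the script's actual window shape and placement rule may differ, and `overlap_add` is a hypothetical name.

```python
def overlap_add(grains, crossfade_len):
    """Join equal-length grains, crossfading `crossfade_len` samples at each seam."""
    grain_len = len(grains[0])
    if not 0 <= crossfade_len <= grain_len // 2:
        raise ValueError("crossfade_len must be between 0 and half the grain length")
    hop = grain_len - crossfade_len  # assumed spacing between grain starts
    out = [0.0] * (hop * (len(grains) - 1) + grain_len)
    last = len(grains) - 1
    for i, grain in enumerate(grains):
        for n, sample in enumerate(grain):
            gain = 1.0
            if i > 0 and n < crossfade_len:
                # Fade in over the region shared with the previous grain.
                gain = (n + 1) / (crossfade_len + 1)
            elif i < last and n >= hop:
                # Fade out over the region shared with the next grain.
                gain = 1.0 - (n - hop + 1) / (crossfade_len + 1)
            out[i * hop + n] += gain * sample
    return out


grains = [[1.0] * 4 for _ in range(3)]  # three constant grains of 4 samples
mixed = overlap_add(grains, crossfade_len=2)
print(mixed)
```

Because the fade-in and fade-out gains sum to one at every overlapping sample, a constant input stays constant across the seams, which is the property that removes clicks at grain boundaries.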