Corpus Mosaic — v1.2 User Guide
Offline corpus‑based mosaic synthesis. Reconstructs a target Sound object by splicing together tiny grains of audio from a folder of corpus sounds. Matching is performed via a Python engine using a 6‑dimensional feature vector (Loudness, Centroid, Flatness, Rolloff, ZCR, Pitch). Includes repetition penalties, continuity bonuses, and a new silence gate to skip low‑energy target grains.
What this does
Corpus Mosaic is an offline concatenative synthesizer. It takes a target sound and a folder of corpus sounds, slices both into grains of equal duration (with overlap), extracts a 6‑dimensional feature vector for each grain, and then rebuilds the target by selecting the best‑matching corpus grain for every target grain. The selection process can be influenced by repetition penalties (to avoid reusing the same grain too often), continuity bonuses (to favour consecutive grains from the same source file), and a randomness factor.
Quick start
- In Praat, select exactly one Sound object (the target).
- Run script… → CorpusMosaic.praat.
- In the directory chooser, select a folder containing corpus sounds (WAV, FLAC, AIFF). The script scans the folder recursively.
- In the form, set:
- Grain_size_ms – duration of each grain (e.g., 100 ms).
- Overlap_percent – overlap between consecutive grains (0–99 %).
- Pitch_weight, Timbre_weight, Loudness_weight – relative importance of each feature group.
- Top_k_candidates – number of best‑matching grains to consider.
- Randomness – 0 = always choose the best match; 1 = uniform random among top‑k.
- Continuity_preference – bonus for choosing the next grain from the same source file.
- Repetition_penalty – penalty for reusing a grain recently.
- Gate_threshold_dB – silence gate level (e.g., ‑40 dB). Grains below this are not rendered.
- Click OK. Python scans the corpus, extracts features, computes distances, and renders the mosaic as originalname_mosaic.
Required Python packages: numpy, scipy, soundfile, librosa. The engine uses librosa for feature extraction (YIN pitch, spectral features). For large corpora, feature extraction may take a minute or two.
Pipeline — five stages
Stage 2 – Corpus scanning – recursively scan corpus folder, load each file, slice into grains, extract features, store metadata (file path, start sample).
Stage 3 – Normalisation & weighting – robust z‑score normalisation across all corpus grains. Apply user‑supplied weights to the squared‑distance calculation.
Stage 4 – Matching & rendering – for each target grain, compute distances to all corpus grains, apply penalties/bonuses, select grain (deterministic or stochastic), place into output buffer with Hann window.
Stage 5 – Silence gate – if target grain RMS < threshold, skip rendering (inserts a placeholder in CSV but no audio). This speeds up processing and prevents noisy matches for silent regions.
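The rendering step in Stages 4 and 5 can be sketched roughly as follows. This is a minimal illustration, not the engine's code: the function names are hypothetical, a periodic Hann window is assumed (the guide only says "Hann window"), and a gated grain is represented here by None.

```python
import math

def periodic_hann(n):
    """Periodic Hann window; at 50 % overlap its shifted copies sum to 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def overlap_add(grains, grain_len, hop):
    """Window each grain and sum it into the output at index * hop.

    A grain of None stands for a gated (silent) slot and is skipped,
    so the output keeps silence at that position.
    """
    if not grains:
        return []
    window = periodic_hann(grain_len)
    out = [0.0] * (hop * (len(grains) - 1) + grain_len)
    for index, grain in enumerate(grains):
        if grain is None:  # assumption: gated grains are marked with None
            continue
        start = index * hop
        for i in range(grain_len):
            out[start + i] += grain[i] * window[i]
    return out

# Four slots of constant signal, the third one gated.
ones = [1.0] * 8
mosaic = overlap_add([ones, ones, None, ones], grain_len=8, hop=4)
```

Where two windowed grains overlap by 50 %, the windows sum to a constant gain, which is why low overlap or missing neighbours can produce audible amplitude dips.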
The 6‑dimensional feature vector
| Feature | Description | Associated weight |
|---|---|---|
| RMS | Root‑mean‑square energy (loudness) | Loudness_weight |
| Centroid | Spectral centroid (brightness) | Timbre_weight |
| Flatness | Spectral flatness (tonal vs. noisy) | Timbre_weight |
| Rolloff | Frequency below which 85 % of energy lies | Timbre_weight |
| ZCR | Zero‑crossing rate (noisiness) | Timbre_weight |
| Pitch (F0) | Fundamental frequency from YIN algorithm (0 = unvoiced) | Pitch_weight |
Features are normalised using robust z‑score (mean and standard deviation) across the entire corpus. The weighted squared Euclidean distance is then computed between each target grain and all corpus grains.
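As a rough sketch of the normalisation and distance step, the snippet below uses the mean and standard deviation named in the parenthesis above; a median-based variant would be an equally plausible reading of "robust". All function names are hypothetical, and the feature order simply follows the table.

```python
import statistics

def feature_weights(loudness_w=1.0, timbre_w=1.0, pitch_w=1.0):
    """Expand the three user weights to the six features.

    Order: RMS, centroid, flatness, rolloff, ZCR, pitch.
    """
    return [loudness_w, timbre_w, timbre_w, timbre_w, timbre_w, pitch_w]

def fit_normaliser(corpus_features):
    """Per-feature mean and standard deviation over all corpus grains."""
    columns = list(zip(*corpus_features))
    means = [statistics.fmean(col) for col in columns]
    # Guard against a zero spread (constant feature) to avoid division by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds

def normalise(vector, means, stds):
    return [(v - m) / s for v, m, s in zip(vector, means, stds)]

def weighted_sq_distance(a, b, weights):
    """Weighted squared Euclidean distance between two normalised vectors."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))

# Toy corpus of two grains (values are made up for illustration).
corpus = [[0.0] * 6, [2.0] * 6]
means, stds = fit_normaliser(corpus)
target = normalise([2.0] * 6, means, stds)
candidate = normalise([0.0] * 6, means, stds)
d_all = weighted_sq_distance(target, candidate, feature_weights())
d_loud = weighted_sq_distance(target, candidate, feature_weights(1.0, 0.0, 0.0))
```

Setting a weight to 0 removes that feature group from the comparison entirely, as the second distance shows.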
Parameters & defaults
Grain & overlap
| Parameter | Range | Default | Description |
|---|---|---|---|
| Grain_size_ms | ≥10 ms | 100 ms | Duration of each analysis/synthesis grain. |
| Overlap_percent | 0–99 % | 50 % | Overlap between consecutive grains (hop = grain × (1 − overlap/100)). |
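To make the grain and hop arithmetic concrete, here is a small sketch. The function name is hypothetical, and dropping a trailing partial grain is an assumption; the engine may pad instead.

```python
def grain_starts(n_samples, sample_rate, grain_size_ms=100.0, overlap_percent=50.0):
    """Return (grain_len, hop, starts), all in samples."""
    grain_len = max(1, int(round(sample_rate * grain_size_ms / 1000.0)))
    hop = max(1, int(round(grain_len * (1.0 - overlap_percent / 100.0))))
    if n_samples < grain_len:
        return grain_len, hop, []
    # Assumption: only complete grains are kept.
    starts = list(range(0, n_samples - grain_len + 1, hop))
    return grain_len, hop, starts

# One second of audio at 44.1 kHz with the default settings.
grain_len, hop, starts = grain_starts(n_samples=44100, sample_rate=44100)
print(grain_len, hop, len(starts))  # 4410 2205 19
```

With the defaults, each grain is 4410 samples long and a new grain starts every 2205 samples, so raising Overlap_percent increases the number of grains to match.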
Feature weights
| Parameter | Range | Default | Description |
|---|---|---|---|
| Pitch_weight | ≥0 | 1.0 | Weight for F0 feature. |
| Timbre_weight | ≥0 | 1.0 | Weight for centroid, flatness, rolloff, ZCR. |
| Loudness_weight | ≥0 | 1.0 | Weight for RMS. |
Selection control
| Parameter | Range | Default | Description |
|---|---|---|---|
| Top_k_candidates | ≥1 | 5 | Number of best‑matching grains to consider (for stochastic selection). |
| Randomness | 0–1 | 0.5 | 0 = always pick the best; 1 = uniform random among top‑k. |
| Continuity_preference | ≥0 | 0.3 | Bonus subtracted from the distance of the candidate grain that directly follows the previously chosen grain in the same source file. |
| Repetition_penalty | ≥0 | 1.5 | Penalty added to distance for grains used recently (memory ≈ 2 s). |
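The guide fixes only the two endpoints of Randomness (0 = best match, 1 = uniform among the top-k). One simple scheme that satisfies both, shown purely as an assumption about the in-between behaviour, is to pick uniformly from the top-k with probability Randomness and otherwise take the best match. The function name is hypothetical.

```python
import random

def select_grain(distances, top_k=5, randomness=0.5, rng=random):
    """Pick a corpus grain index from the top_k smallest distances.

    Assumed interpolation: with probability `randomness`, choose uniformly
    among the top_k candidates; otherwise return the single best match.
    """
    order = sorted(range(len(distances)), key=distances.__getitem__)
    candidates = order[:max(1, top_k)]
    if rng.random() < randomness:
        return rng.choice(candidates)
    return candidates[0]

distances = [0.9, 0.1, 0.5, 0.3, 0.7]
best = select_grain(distances, top_k=3, randomness=0.0)   # always index 1
rng = random.Random(0)
picks = {select_grain(distances, top_k=3, randomness=1.0, rng=rng) for _ in range(200)}
```

With Top_k_candidates = 1 the Randomness setting has no effect, since the only candidate is the best match.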
Silence gate (v1.2)
| Parameter | Range | Default | Description |
|---|---|---|---|
| Gate_threshold_dB | ≤ -10 dB | -40 dB | Grains in the target whose RMS is below this level (relative to peak) are replaced by silence and not rendered. This speeds up processing and prevents noisy matches for silent regions. |
Output
| Parameter | Default | Description |
|---|---|---|
| Normalize_output | yes | Scale output peak to 0.95 after rendering. |
| Draw_visualization | yes | Show waveforms, spectrograms, corpus usage strip, distance trace, and summary. |
| Play_result | yes | Auto‑play after processing. |
Silence gate (v1.2)
🎚️ How the gate works
The engine computes the RMS of every target grain and takes the largest value as peak_RMS.
The linear threshold is calculated as: gate_linear = peak_RMS × 10^(gate_db/20).
If the grain's RMS is below this threshold, it is not rendered – the output simply has silence at that position.
- This speeds up processing because no matching is performed for silent grains (they are skipped early).
- It also prevents the engine from trying to match very quiet (often noisy) regions with inappropriate corpus grains.
- The CSV path still contains a row for the grain, with corpus_file = "__SILENCE__", distance 0.0.
- The stats report includes a line Silenced grains (Gated): X.
Recommended setting: -40 dB is a safe starting point. For very dynamic material, you may want a higher threshold (e.g., -30 dB) to silence more of the quiet parts.
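The gate rule described above can be sketched in a few lines. The helper names are hypothetical; the threshold formula is the one given in this section.

```python
import math

def rms(samples):
    """Root-mean-square level of one grain."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def gate_mask(grains, gate_db=-40.0):
    """Return True for every grain that falls below the gate."""
    levels = [rms(g) for g in grains]
    peak_rms = max(levels, default=0.0)
    gate_linear = peak_rms * 10.0 ** (gate_db / 20.0)
    return [level < gate_linear for level in levels]

# Three toy grains: loud, very quiet, medium (values are illustrative).
grains = [[1.0] * 4, [0.001] * 4, [0.5] * 4]
mask_40 = gate_mask(grains, gate_db=-40.0)  # threshold = 0.01 of peak RMS
mask_80 = gate_mask(grains, gate_db=-80.0)  # threshold = 0.0001 of peak RMS
```

At -40 dB the threshold is 1 % of the loudest grain's RMS, so only the second grain is gated; at -80 dB none of them are.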
Visualization (Praat picture)
When Draw_visualization = 1, the script draws a multi‑panel plot:
- Target and mosaic waveforms – for comparison.
- Target and mosaic spectrograms (0–5 kHz).
- Corpus usage strip – a horizontal bar for each target grain, colour-coded by source file (up to 8 files, each with a distinct colour). Grey bars indicate silenced grains (__SILENCE__).
- Distance trace – the distance (match quality) for each grain (lower = better). Silenced grains show distance 0.
- Summary panel with:
- Target grains, silenced count, corpus pool size.
- Files analysed, unique files used, render time.
- All parameter settings (weights, randomness, continuity, penalty, gate).
FAQ / troubleshooting
Install the Python dependencies with: pip install numpy scipy soundfile librosa. On Windows, the script invokes Python through the py launcher.
Check the Info window for “Corpus grains available”. If 0, the engine could not extract features from any corpus file. Ensure files are readable and longer than one grain. Also check the Gate_threshold_dB – if it's too high, many target grains may be silenced. Reduce it (e.g., to -50 dB) to include more grains.
The overlap‑add synthesis uses a Hann window; if grains are too short or overlap too low, artefacts may appear. Increase Overlap_percent (e.g., to 75 %) for smoother transitions. Also ensure Grain_size_ms is not too small (≥50 ms recommended).
The repetition penalty is applied to grains that have been used within the last ~2 seconds. The memory length in grains is penalty_memory = max(1, SR/hop * 2), where SR is the sample rate and hop is the hop size in samples. This prevents the same grain from being reused too frequently, encouraging variety.
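A minimal sketch of this memory, assuming a fixed-length queue of recently used grain indices and a constant additive penalty. The function names are hypothetical, and the engine may scale the penalty differently.

```python
from collections import deque

def penalty_memory_len(sample_rate, hop):
    """Number of grains covering roughly the last 2 seconds."""
    return max(1, int(sample_rate / hop * 2))

def apply_repetition_penalty(distances, recent, repetition_penalty=1.5):
    """Add the penalty to every grain index still held in the memory."""
    recent_set = set(recent)
    return [d + repetition_penalty if i in recent_set else d
            for i, d in enumerate(distances)]

memory = penalty_memory_len(sample_rate=44100, hop=2205)  # 40 grains

# Tiny memory of 3 for illustration: index 0 falls out after the fourth append.
recent = deque(maxlen=3)
for used in (0, 1, 2, 3):
    recent.append(used)
penalised = apply_repetition_penalty([1.0] * 5, recent)
```

Because the queue has a fixed length, the oldest entry is forgotten automatically once the memory is full, so a grain becomes available again without penalty after about 2 seconds of output.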
If the previously chosen grain came from a given file and the next consecutive grain in that same file is a candidate, its distance is reduced by continuity_preference × max_dist_estimate. This encourages long runs from the same source file, reducing “choppiness”.
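The bonus can be illustrated as follows. The metadata layout (file path plus grain number within that file) and the function name are assumptions made for this sketch, and how the engine derives max_dist_estimate is not specified here.

```python
def apply_continuity_bonus(distances, grain_meta, prev_index,
                           continuity_preference=0.3, max_dist_estimate=1.0):
    """Lower the distance of the grain that directly follows the previous pick.

    grain_meta[i] is a hypothetical (file_path, grain_number_in_file) pair.
    """
    adjusted = list(distances)
    if prev_index is None:  # first grain: nothing to continue from
        return adjusted
    prev_file, prev_pos = grain_meta[prev_index]
    bonus = continuity_preference * max_dist_estimate
    for i, (file_path, pos) in enumerate(grain_meta):
        if file_path == prev_file and pos == prev_pos + 1:
            adjusted[i] -= bonus
    return adjusted

meta = [("a.wav", 0), ("a.wav", 1), ("b.wav", 0)]
adjusted = apply_continuity_bonus([1.0, 1.0, 1.0], meta, prev_index=0,
                                  continuity_preference=0.3, max_dist_estimate=2.0)
```

Here only the second grain of a.wav receives the bonus (0.3 × 2.0 = 0.6), so it wins over otherwise equal candidates.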
The CSV written by the engine (temp_mosaic_path.csv) contains the columns source_grain, source_time_sec, corpus_file, corpus_time_sec, distance.
Silenced grains have corpus_file = "__SILENCE__" and distance 0.
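For post-processing, the path file can be summarised with Python's csv module. This sketch assumes the file starts with a header row carrying the column names listed above; the sample rows are invented for illustration and the function name is hypothetical.

```python
import csv
import io

SILENCE = "__SILENCE__"

def summarise_path(csv_file):
    """Count grains, gated grains, and distinct corpus files in a path CSV."""
    rows = list(csv.DictReader(csv_file))
    gated = sum(1 for row in rows if row["corpus_file"] == SILENCE)
    files_used = sorted({row["corpus_file"] for row in rows
                         if row["corpus_file"] != SILENCE})
    return {"grains": len(rows), "gated": gated, "files_used": files_used}

# Made-up example content; a real run would open temp_mosaic_path.csv instead.
sample = io.StringIO(
    "source_grain,source_time_sec,corpus_file,corpus_time_sec,distance\n"
    "0,0.00,kick.wav,1.25,0.42\n"
    "1,0.05,__SILENCE__,0.0,0.0\n"
    "2,0.10,snare.wav,0.30,0.77\n"
)
summary = summarise_path(sample)
```

To read the real file, pass `open("temp_mosaic_path.csv", newline="")` in place of the in-memory sample.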