Corpus Mosaic — v1.2 User Guide

Offline corpus‑based mosaic synthesis. Reconstructs a target Sound object by splicing together tiny grains of audio from a folder of corpus sounds. Matching is performed via a Python engine using a 6‑dimensional feature vector (Loudness, Centroid, Flatness, Rolloff, ZCR, Pitch). Includes repetition penalties, continuity bonuses, and a new silence gate to skip low‑energy target grains.

Author: Shai Cohen
Affiliation: Department of Music, Bar‑Ilan University, Israel
Version: 1.2 (2026) – Silence Gate
License: MIT License
Repo: GitHub

Contents:

- What it does
- Quick start
- Pipeline (5 stages)
- The 6‑dim feature vector
- Parameters
- Silence gate (v1.2)
- Visualization
- FAQ / troubleshooting

What this does

Corpus Mosaic is an offline concatenative synthesizer. It takes a target sound and a folder of corpus sounds, slices both into grains of equal duration (with overlap), extracts a 6‑dimensional feature vector for each grain, and then rebuilds the target by selecting the best‑matching corpus grain for every target grain. The selection process can be influenced by repetition penalties (to avoid reusing the same grain too often), continuity bonuses (to favour consecutive grains from the same source file), and a randomness factor.

v1.2 adds a silence gate: target grains whose RMS energy falls below a threshold (in dB relative to the peak RMS) are neither matched nor rendered, so the output stays silent at those positions. This saves processing time and prevents noisy matches for silent regions.

Quick start

In Praat, select exactly one Sound object (the target).
Run script… → CorpusMosaic.praat.
In the directory chooser, select a folder containing corpus sounds (WAV, FLAC, AIFF). The script will scan recursively.
In the form, set:
- Grain_size_ms – duration of each grain (e.g., 100 ms).
- Overlap_percent – overlap between consecutive grains (0–99 %).
- Pitch_weight, Timbre_weight, Loudness_weight – relative importance of each feature group.
- Top_k_candidates – number of best‑matching grains to consider.
- Randomness – 0 = always choose the best match; 1 = uniform random among top‑k.
- Continuity_preference – bonus for choosing the next grain from the same source file.
- Repetition_penalty – penalty for reusing a grain recently.
- Gate_threshold_dB – silence gate level (e.g., -40 dB). Grains below this are not rendered.
Click OK. Python scans the corpus, extracts features, computes distances, and renders the mosaic as originalname_mosaic.

Tip: Start with all weights at 1.0, top‑k = 5, randomness = 0.5, continuity = 0.3, penalty = 1.5, gate = -40 dB. Adjust the weights to emphasise pitch (for melodic targets) or timbre (for textures).

Important: the Python engine depends on numpy, scipy, soundfile, and librosa. It uses librosa for feature extraction (YIN pitch, spectral features). For large corpora, feature extraction may take a minute or two.

Pipeline — five stages

Stage 1 – Source processing (Python) – load target, slice into overlapping grains, extract 6‑dim features.
Stage 2 – Corpus scanning – recursively scan corpus folder, load each file, slice into grains, extract features, store metadata (file path, start sample).
Stage 3 – Normalisation & weighting – robust z‑score normalisation across all corpus grains. Apply user‑supplied weights to the squared‑distance calculation.
Stage 4 – Matching & rendering – for each target grain, compute distances to all corpus grains, apply penalties/bonuses, select grain (deterministic or stochastic), place into output buffer with Hann window.
Stage 5 – Silence gate – if a target grain's RMS falls below the threshold, it is skipped before matching: no audio is rendered, and a placeholder row is written to the CSV. This speeds up processing and prevents noisy matches for silent regions.
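The slicing step in Stages 1 and 2 can be sketched as follows. This is a minimal illustration, not the engine's code: the function name slice_grains is hypothetical, and dropping a trailing partial grain is an assumption (the guide does not say how the last incomplete grain is handled).

```python
import numpy as np

def slice_grains(signal, sample_rate, grain_size_ms=100.0, overlap_percent=50.0):
    """Cut a mono signal into equal-length overlapping grains.

    Returns (grains, starts): a 2-D array with one grain per row and the
    start sample of each grain. A trailing partial grain is dropped
    (assumption; the engine may pad instead).
    """
    signal = np.asarray(signal, dtype=float)
    grain_len = int(round(sample_rate * grain_size_ms / 1000.0))
    # hop = grain size x (1 - overlap), with overlap given in percent
    hop = max(1, int(round(grain_len * (1.0 - overlap_percent / 100.0))))
    starts = list(range(0, len(signal) - grain_len + 1, hop))
    if not starts:
        return np.empty((0, grain_len)), []
    grains = np.stack([signal[s:s + grain_len] for s in starts])
    return grains, starts
```

With a 1 kHz sample rate, 100 ms grains, and 50 % overlap, each grain holds 100 samples and consecutive grains start 50 samples apart.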

The 6‑dimensional feature vector

Feature	Description	Associated weight
RMS	Root‑mean‑square energy (loudness)	Loudness_weight
Centroid	Spectral centroid (brightness)	Timbre_weight
Flatness	Spectral flatness (tonal vs. noisy)	Timbre_weight
Rolloff	Frequency below which 85 % of energy lies	Timbre_weight
ZCR	Zero‑crossing rate (noisiness)	Timbre_weight
Pitch (F0)	Fundamental frequency from YIN algorithm (0 = unvoiced)	Pitch_weight
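To make the first five rows of the table concrete, here is a NumPy-only sketch of how such features can be computed for a single grain. It is a stand-in for illustration, not the engine's librosa calls: the function name grain_features is hypothetical, the spectral features are computed from the power spectrum of the whole grain (an assumption), and pitch is omitted because it relies on librosa's YIN implementation.

```python
import numpy as np

def grain_features(grain, sample_rate, rolloff_fraction=0.85, eps=1e-12):
    """Return [rms, centroid_hz, flatness, rolloff_hz, zcr] for one grain."""
    grain = np.asarray(grain, dtype=float)
    rms = float(np.sqrt(np.mean(grain ** 2)))

    # Power spectrum of the whole grain (no windowing; assumption).
    power = np.abs(np.fft.rfft(grain)) ** 2
    freqs = np.fft.rfftfreq(len(grain), d=1.0 / sample_rate)

    # Centroid: power-weighted mean frequency.
    centroid = float((freqs * power).sum() / (power.sum() + eps))
    # Flatness: geometric mean over arithmetic mean of the power spectrum.
    flatness = float(np.exp(np.mean(np.log(power + eps))) / (np.mean(power) + eps))
    # Rolloff: lowest frequency at which cumulative energy reaches the fraction.
    idx = int(np.searchsorted(np.cumsum(power), rolloff_fraction * power.sum()))
    rolloff = float(freqs[min(idx, len(freqs) - 1)])

    # ZCR: fraction of adjacent sample pairs whose signs differ.
    signs = np.signbit(grain)
    zcr = float(np.mean(signs[1:] != signs[:-1]))
    return [rms, centroid, flatness, rolloff, zcr]
```

For a pure 1 kHz sine, the centroid and rolloff both sit near 1 kHz and the flatness is close to 0, which matches the "tonal vs. noisy" reading in the table.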

Features are normalised using robust z‑score (mean and standard deviation) across the entire corpus. The weighted squared Euclidean distance is then computed between each target grain and all corpus grains.
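The normalisation and distance step might look like the sketch below. It takes the mean and standard deviation form stated above at face value; a median-based variant would be a drop-in replacement for the two statistics. The function names, the column order, and the choice to multiply each squared difference by its weight are assumptions made for illustration.

```python
import numpy as np

# Assumed column order: [RMS, Centroid, Flatness, Rolloff, ZCR, Pitch]
def build_weights(pitch_weight=1.0, timbre_weight=1.0, loudness_weight=1.0):
    """Expand the three user weights to one weight per feature column."""
    return np.array([loudness_weight,
                     timbre_weight, timbre_weight, timbre_weight, timbre_weight,
                     pitch_weight], dtype=float)

def normalise(corpus, target, eps=1e-9):
    """Z-score both sets using statistics taken from the corpus only."""
    mean = corpus.mean(axis=0)
    std = corpus.std(axis=0)
    std = np.where(std < eps, 1.0, std)  # guard against constant columns
    return (corpus - mean) / std, (target - mean) / std

def weighted_sq_distances(target_vec, corpus_norm, weights):
    """Weighted squared Euclidean distance from one target grain to every corpus grain."""
    diff = corpus_norm - target_vec
    return (weights * diff ** 2).sum(axis=1)
```

Setting a weight to 0 removes that feature group from the comparison entirely, which is one way to read "relative importance" in the form.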

Parameters & defaults

Grain & overlap

Parameter	Range	Default	Description
Grain_size_ms	≥10 ms	100 ms	Duration of each analysis/synthesis grain.
Overlap_percent	0–99 %	50 %	Overlap between consecutive grains (hop = grain size × (1 − overlap/100)).

Feature weights

Parameter	Range	Default	Description
Pitch_weight	≥0	1.0	Weight for F0 feature.
Timbre_weight	≥0	1.0	Weight for centroid, flatness, rolloff, ZCR.
Loudness_weight	≥0	1.0	Weight for RMS.

Selection control

Parameter	Range	Default	Description
Top_k_candidates	≥1	5	Number of best‑matching grains to consider (for stochastic selection).
Randomness	0–1	0.5	0 = always pick the best; 1 = uniform random among top‑k.
Continuity_preference	≥0	0.3	Bonus subtracted from the distance of the corpus grain that directly follows the previously chosen grain in the same file.
Repetition_penalty	≥0	1.5	Penalty added to distance for grains used recently (memory ≈ 2 s).
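One way Top_k_candidates and Randomness could interact is sketched below. The guide only fixes the two endpoints (0 = best match, 1 = uniform among the top‑k); the behaviour in between is an assumption here: with probability equal to Randomness the pick is uniform among the top‑k, otherwise the best match is taken. The function name select_grain is hypothetical.

```python
import random

def select_grain(distances, top_k=5, randomness=0.5, rng=None):
    """Pick a corpus grain index from a list of (already adjusted) distances."""
    if rng is None:
        rng = random.Random()
    k = max(1, min(top_k, len(distances)))
    # Indices of the k smallest distances, best first.
    ranked = sorted(range(len(distances)), key=lambda i: distances[i])[:k]
    # Assumed interpolation between the two documented endpoints.
    if rng.random() < randomness:
        return rng.choice(ranked)
    return ranked[0]
```

Passing a seeded random.Random instance as rng makes a stochastic run repeatable.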

Silence gate (v1.2)

Parameter	Range	Default	Description
Gate_threshold_dB	≤ -10 dB	-40 dB	Grains in the target whose RMS is below this level (relative to peak) are replaced by silence and not rendered. This speeds up processing and prevents noisy matches for silent regions.

Output

Parameter	Default	Description
Normalize_output	yes	Scale output peak to 0.95 after rendering.
Draw_visualization	yes	Show waveforms, spectrograms, corpus usage strip, distance trace, and summary.
Play_result	yes	Auto‑play after processing.

Silence gate (v1.2)

How the gate works

The engine first computes the peak RMS across all target grains and converts the threshold to a linear value: gate_linear = peak_RMS × 10^(gate_db/20). A grain whose RMS falls below gate_linear is not rendered; the output is simply silent at that position. At -40 dB, for example, the threshold is 1 % of the peak RMS.

This speeds up processing because no matching is performed for silent grains (they are skipped early).
It also prevents the engine from trying to match very quiet (often noisy) regions with inappropriate corpus grains.
The CSV path still contains a row for the grain, with corpus_file = "__SILENCE__", distance 0.0.
The stats report includes a line Silenced grains (Gated): X.
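The gate formula above translates directly into a few lines of NumPy. This is a sketch under the stated formula only; the function name gate_mask and the handling of an empty input are assumptions.

```python
import numpy as np

def gate_mask(target_rms, gate_db=-40.0):
    """Return (mask, gate_linear); mask is True for grains the gate silences."""
    target_rms = np.asarray(target_rms, dtype=float)
    peak_rms = target_rms.max() if target_rms.size else 0.0
    # gate_linear = peak_RMS x 10^(gate_db / 20)
    gate_linear = peak_rms * 10.0 ** (gate_db / 20.0)
    return target_rms < gate_linear, gate_linear

# Illustrative RMS values (made up): peak 1.0, so -40 dB maps to 0.01.
mask, level = gate_mask([1.0, 0.5, 0.005, 0.02], gate_db=-40.0)
print(level, int(mask.sum()))
```

Raising the threshold to -30 dB moves the linear level to about 0.032 of the peak, so the grain at 0.02 in this example would be silenced as well.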

Recommended setting: -40 dB is a safe starting point. For very dynamic material, you may want a higher threshold (e.g., -30 dB) to silence more of the quiet parts.

Visualization (Praat picture)

When Draw_visualization is enabled, the script draws a multi‑panel plot:

Target and mosaic waveforms – for comparison.
Target and mosaic spectrograms (0–5 kHz).
Corpus usage strip – a horizontal bar for each target grain, colour‑coded by source file (up to 8 files, each with a distinct colour). Grey bars indicate silenced grains (__SILENCE__).
Distance trace – the distance (match quality) for each grain (lower = better). Silenced grains show distance 0.
Summary panel with:
- Target grains, silenced count, corpus pool size.
- Files analysed, unique files used, render time.
- All parameter settings (weights, randomness, continuity, penalty, gate).

Tip: The corpus usage strip gives an immediate overview of which source files contributed most. The distance trace shows how well the mosaic matches the target – spikes indicate poor matches (high distance).

FAQ / troubleshooting

“Python not found” or missing packages

Install the required packages with: pip install numpy scipy soundfile librosa. On Windows, the script invokes Python through the py launcher.
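A quick way to see which packages are missing is to ask the interpreter directly. The helper below is a hypothetical convenience, not part of the script; run it with the same Python interpreter that Praat calls, since packages installed for a different interpreter will not be found.

```python
import importlib.util

REQUIRED = ("numpy", "scipy", "soundfile", "librosa")

def missing_packages(names=REQUIRED):
    """Return the names that cannot be located by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages()
print("All dependencies found." if not missing else "Missing: " + ", ".join(missing))
```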

No corpus grains extracted / output silent

Check the Info window for “Corpus grains available”. If it shows 0, the engine could not extract features from any corpus file; ensure the files are readable and longer than one grain. Also check Gate_threshold_dB: if it is set too high (too close to 0 dB), many target grains may be silenced. Lower it (e.g., to -50 dB) to include more grains.

Mosaic sounds choppy / has artefacts

The overlap‑add synthesis uses a Hann window; if the grains are too short or the overlap is too low, artefacts may appear. Increase Overlap_percent (e.g., to 75 %) for smoother transitions. Also ensure Grain_size_ms is not too small (≥50 ms recommended).
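The effect of overlap on smoothness can be checked numerically by summing shifted Hann windows and looking at how much the resulting gain envelope fluctuates. The sketch below assumes a periodic Hann window and no extra gain compensation; the guide does not specify either, and the function name overlap_add_envelope is hypothetical.

```python
import numpy as np

def overlap_add_envelope(grain_len, overlap_percent, n_grains=16):
    """Sum shifted Hann windows and return the steady-state gain envelope."""
    hop = max(1, int(round(grain_len * (1.0 - overlap_percent / 100.0))))
    n = np.arange(grain_len)
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * n / grain_len)  # periodic Hann
    out = np.zeros(hop * (n_grains - 1) + grain_len)
    for g in range(n_grains):
        out[g * hop:g * hop + grain_len] += window
    # Drop the fade-in and fade-out regions at both ends.
    return out[grain_len:len(out) - grain_len]

for overlap in (0, 50, 75):
    env = overlap_add_envelope(1024, overlap)
    print(overlap, round(float(env.mean()), 3), round(float(np.ptp(env)), 3))
```

Under these assumptions the envelope is flat at 50 % and 75 % overlap, while at 0 % it swings between 0 and 1 once per grain, which is heard as amplitude modulation at the grain rate.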

Repetition penalty memory

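The penalty applies to grains that have been used within roughly the last 2 seconds. The memory length, counted in grains, is penalty_memory = max(1, SR/hop * 2), where SR is the sample rate and hop is the hop size in samples, so SR/hop is the number of grains per second. This prevents the same grain from being reused too frequently and encourages variety.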
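As a worked example of the memory formula: at 44.1 kHz with 100 ms grains and 50 % overlap, the hop is 2205 samples, so the memory covers 40 grains. The sketch below assumes the result is truncated to an integer and that the penalty is a flat addition to the distance; the function names are hypothetical.

```python
from collections import deque

def penalty_memory_size(sample_rate, hop_samples):
    """penalty_memory = max(1, SR/hop * 2), truncated to whole grains (assumption)."""
    return max(1, int(sample_rate / hop_samples * 2))

def apply_repetition_penalty(distances, recent, repetition_penalty=1.5):
    """Add the penalty to every grain index currently held in the memory."""
    return [d + repetition_penalty if i in recent else d
            for i, d in enumerate(distances)]

memory = deque(maxlen=penalty_memory_size(44100, 2205))  # holds 40 grain indices
memory.append(7)  # record each chosen grain; the oldest entry drops out when full
```

A deque with maxlen forgets the oldest choice automatically, so no explicit bookkeeping is needed once the memory is full.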

Continuity bonus

If the corpus grain that directly follows the previously chosen grain (same file, next consecutive position) is among the candidates, its distance is reduced by continuity_preference × max_dist_estimate. This encourages long runs from the same source file and reduces “choppiness”.
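A sketch of the bonus, using the (file path, start sample) metadata stored during corpus scanning. How the engine derives max_dist_estimate is not documented, so it is passed in as a plain argument here; the function name and the test for "next consecutive grain" (start sample plus one hop) are assumptions.

```python
def apply_continuity_bonus(distances, grain_meta, prev_index, hop_samples,
                           continuity_preference=0.3, max_dist_estimate=1.0):
    """Lower the distance of the grain that directly follows the previous pick.

    grain_meta[i] is a (file_path, start_sample) pair for corpus grain i.
    """
    adjusted = list(distances)
    if prev_index is None:  # nothing chosen yet (first grain, or after a silenced one)
        return adjusted
    prev_file, prev_start = grain_meta[prev_index]
    for i, (path, start) in enumerate(grain_meta):
        # Same file, exactly one hop later (assumed definition of "next grain").
        if path == prev_file and start == prev_start + hop_samples:
            adjusted[i] -= continuity_preference * max_dist_estimate
    return adjusted
```

Because the bonus is subtracted before selection, a slightly worse spectral match can still win if it continues the current source file.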

CSV format

The CSV written by the engine (temp_mosaic_path.csv) contains columns: source_grain, source_time_sec, corpus_file, corpus_time_sec, distance. Silenced grains have corpus_file = "__SILENCE__" and distance 0.
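The path file can be inspected outside Praat with Python's csv module. The sketch below assumes the file is comma-delimited with a header row carrying the column names listed above; the function name summarise_path and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

SILENCE = "__SILENCE__"

def summarise_path(csv_text):
    """Count grains, silenced grains, per-file usage, and mean distance."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rendered = [r for r in rows if r["corpus_file"] != SILENCE]
    distances = [float(r["distance"]) for r in rendered]
    return {
        "grains": len(rows),
        "silenced": len(rows) - len(rendered),
        "usage": Counter(r["corpus_file"] for r in rendered),
        "mean_distance": sum(distances) / len(distances) if distances else 0.0,
    }

# Illustrative rows only, not real engine output.
sample = """source_grain,source_time_sec,corpus_file,corpus_time_sec,distance
0,0.00,a.wav,1.20,0.5
1,0.05,__SILENCE__,0.0,0.0
2,0.10,a.wav,1.25,1.5
3,0.15,b.wav,0.40,1.0
"""
print(summarise_path(sample))
```

Silenced rows are excluded from the mean so that their distance of 0 does not make the match look better than it is.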