TinySOL Orchestration Retrieval — v1.3 User Guide
Orchestration retrieval engine for the TinySOL corpus (Filip, 2020). Select a target sound – the backend analyses its timbre, pitch, and harmonics, then finds the closest orchestral samples by weighted multi‑descriptor distance. Supports whole‑file and frame‑based analysis, family/instrument/MIDI constraints, and multi‑layer blending for richer textures.
What this does
TinySOL Orchestration Retrieval is a content‑based retrieval system for the TinySOL corpus (Filip, 2020), a dataset of over 1700 solo orchestral instrument samples. The engine analyses a target sound (any audio) and finds the most similar orchestral samples using a weighted combination of five descriptors:
- MFCC (20 coefficients) – timbral envelope
- Spectral envelope (24 log‑energy bands) – overall spectral shape
- Spectral moments (centroid, spread, skewness, kurtosis) – statistical shape
- Spectral peaks (16 strongest peaks) – harmonic structure
- Harmonic contribution – how well the corpus note’s harmonic series covers the target’s partials
Two analysis modes are available:
- Whole file – single descriptor vector per target, best for timbral similarity.
- Frame‑based – tracks pitch and timbre over time, matches each frame individually, and reassembles via overlap‑add. Captures evolving textures (e.g., melodic lines, speech).
Quick start
- Download the TinySOL corpus (free from Zenodo) and the pre‑computed .db descriptor files (included with AudioTools).
- In Praat, select exactly one Sound object.
- Run script… → TinySOL_Retrieval.praat.
- Set the DB_directory (where the TinySOL.*.db files are) and Corpus_root (where the TinySOL/ folder containing WAVs is).
- Choose a Preset: REF‑whole (whole‑file reference), REF‑frame (frame‑based reference), REF‑orchids (Orchidea‑style), or Speech (vocal input).
- Select Instrument_families (Brass, Strings, Winds, or combinations).
- Set Analysis_mode (Whole file or Frame‑based).
- Click OK. Python runs retrieval, blends the top matches, and imports the result as originalname_orchestrated.
Requires Python with numpy, soundfile, and scipy. The engine also requires the TinySOL corpus and the companion .db files (provided separately). The first run may take 10–20 seconds to load and index the corpus.
The 4 presets (+ Custom)
| Preset | Mode | Harmonic weight | Silence threshold | Description |
|---|---|---|---|---|
| REF‑whole | whole_file | 0.35 | 1.0 | Whole‑file reference |
| REF‑frame | frame_based | 0.35 | 1.0 | Frame‑based reference |
| REF‑orchids | whole_file | 0.35 | 1.0 | Orchidea‑style |
| Speech | whole_file | 0.00 | 2.0 | Vocal input |
Each preset sets descriptor weights, silence threshold, and (for frame‑based) frame/hop/pitch tolerance accordingly.
The TinySOL corpus
🎻 Corpus structure
TinySOL (Filip, 2020) contains solo orchestral instrument samples across three families:
- Brass – trumpet, horn, trombone, bass tuba
- Strings – violin, viola, cello, contrabass
- Winds – flute, oboe, clarinet, bassoon
Samples are organised by instrument, technique (ord, pizz, etc.), pitch (MIDI note name), and dynamic (pp, p, mf, ff). The .db files contain pre‑computed descriptor vectors for every sample, enabling fast retrieval.
You can download TinySOL from Zenodo. Place the TinySOL/ folder (containing Brass/, Strings/, Winds/) somewhere on your disk. The .db files are included with AudioTools – point DB_directory to the folder containing them.
Descriptor set (5 dimensions)
| Descriptor | Dimension | Distance metric | What it captures |
|---|---|---|---|
| MFCC | 20 | Cosine | Spectral envelope shape (timbre). |
| Specenv | 24 | Cosine (log‑energy) | Overall spectral shape. |
| Moments | 4 | Log‑scaled Euclidean | Statistical shape (centroid, spread, skewness, kurtosis). |
| Specpeaks | 16 | Cosine | Harmonic structure (16 strongest peaks). |
| Harmonic contribution | – | | How well the corpus note’s harmonic series covers the target’s partials. |
Weights are user‑adjustable and must sum to ~1.0. The harmonic weight is applied only when the target has detectable pitch (voiced frames).
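As a minimal sketch of how such a weighted multi‑descriptor distance could be combined, the example below sums per‑descriptor distances using the metrics listed in the table. The function names, the dictionary layout, and every weight except the 0.35 harmonic weight are assumptions for illustration; the guide does not specify the engine's internals, and the sign‑preserving `log1p` scaling is only one possible reading of "log‑scaled Euclidean".

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return max(0.0, 1.0 - dot / (na * nb))

def log_scaled_euclidean(a, b):
    """Euclidean distance after sign-preserving log1p scaling (assumed variant)."""
    def scale(v):
        return math.copysign(math.log1p(abs(v)), v)
    return math.sqrt(sum((scale(x) - scale(y)) ** 2 for x, y in zip(a, b)))

# Hypothetical weights summing to 1.0; only the harmonic value (0.35)
# is taken from the preset table.
WEIGHTS = {"mfcc": 0.25, "specenv": 0.20, "moments": 0.10,
           "specpeaks": 0.10, "harmonic": 0.35}

def combined_distance(target, candidate, weights=WEIGHTS, harmonic_distance=None):
    """Weighted sum of descriptor distances (lower = more similar).

    harmonic_distance is None when the target has no detectable pitch;
    in that case the harmonic term is simply left out.
    """
    total = (
        weights["mfcc"] * cosine_distance(target["mfcc"], candidate["mfcc"])
        + weights["specenv"] * cosine_distance(target["specenv"], candidate["specenv"])
        + weights["moments"] * log_scaled_euclidean(target["moments"], candidate["moments"])
        + weights["specpeaks"] * cosine_distance(target["specpeaks"], candidate["specpeaks"])
    )
    if harmonic_distance is not None:
        total += weights["harmonic"] * harmonic_distance
    return total
```

Leaving the harmonic term out for unpitched targets, rather than renormalising the remaining weights, is a design choice made here for simplicity; the actual engine may handle it differently.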
Parameters & defaults
Corpus paths
| Parameter | Default | Description |
|---|---|---|
| DB_directory | (empty) | Folder containing TinySOL.mfcc.db, TinySOL.specenv.db, etc. |
| Corpus_root | (empty) | Folder containing the TinySOL/ subfolder (Brass, Strings, Winds). |
Instrument constraints
Analysis mode
Retrieval & render
| Parameter | Options | Default | Description |
|---|---|---|---|
| Number_of_results | 1–32 | 8 | |
| Render_mode | best / blend / top2 / top3 / top4 | best | |
| Render_gain | 0–2 | 0.8 | |
Descriptor weights
Silence gate (v1.3)
If the best match score exceeds Silence_threshold, the output is silence (prevents rendering a bad match). Default 1.0 = disabled (always renders). For tuned systems, set to 0.85–0.95.
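The gate logic described above can be sketched in a few lines. The function name and the list‑of‑samples representation are hypothetical; the comparison follows the rule stated in this section (a score above the threshold yields silence).

```python
def apply_silence_gate(best_score, rendered, silence_threshold=1.0):
    """Return silence of the same length when the best match is too distant.

    best_score is a distance (lower = better match). With the default
    threshold of 1.0 the gate is described as disabled, which assumes
    scores do not exceed 1.0 in normal use.
    """
    if best_score > silence_threshold:
        return [0.0] * len(rendered)
    return rendered

# Example: a tuned threshold of 0.9 rejects a match scoring 0.97.
gated = apply_silence_gate(0.97, [0.1, -0.2, 0.3], silence_threshold=0.9)
kept = apply_silence_gate(0.40, [0.1, -0.2, 0.3], silence_threshold=0.9)
```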
Analysis modes
Whole‑file mode
The target sound is analysed once, producing a single descriptor vector for each of the five descriptor types. This vector is compared against all corpus entries (after applying hard constraints). The result is a static timbral match – best for sustained sounds, instrument identification, or texture matching.
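A minimal sketch of the whole‑file flow, assuming a simplified corpus where each entry is a dictionary with hypothetical `family`, `instrument`, `midi`, and `vector` fields: hard constraints filter the corpus first, then the remaining entries are ranked by distance. A single cosine distance stands in for the full weighted score.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return max(0.0, 1.0 - dot / (na * nb))

def retrieve(target_vec, corpus, families=None, instruments=None,
             midi_range=None, n_results=8):
    """Filter by hard constraints, then return the n_results closest entries."""
    def allowed(entry):
        if families is not None and entry["family"] not in families:
            return False
        if instruments is not None and entry["instrument"] not in instruments:
            return False
        if midi_range is not None and not (midi_range[0] <= entry["midi"] <= midi_range[1]):
            return False
        return True

    scored = [(cosine_distance(target_vec, e["vector"]), e)
              for e in corpus if allowed(e)]
    scored.sort(key=lambda pair: pair[0])   # ascending distance
    return scored[:n_results]

# Toy corpus with made-up two-dimensional descriptor vectors.
corpus = [
    {"name": "violin_A4", "family": "Strings", "instrument": "violin", "midi": 69, "vector": [1.0, 0.0]},
    {"name": "trumpet_A4", "family": "Brass", "instrument": "trumpet", "midi": 69, "vector": [0.9, 0.1]},
    {"name": "cello_C2", "family": "Strings", "instrument": "cello", "midi": 36, "vector": [0.0, 1.0]},
]
strings_only = retrieve([0.9, 0.1], corpus, families={"Strings"})
```

Applying constraints before scoring keeps excluded instruments from ever appearing in the ranked list, which matches the "after applying hard constraints" wording above.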
Frame‑based mode
The target is sliced into overlapping frames (Hann window, 75 % overlap). For each frame, the engine:
- Detects F0 (autocorrelation + octave check).
- Computes frame‑level descriptors (MFCC, specenv, moments, specpeaks).
- Retrieves the best‑matching corpus entry within pitch tolerance (± user‑defined semitones).
- Extracts a grain from the corpus sample’s sustain region, scales to target frame RMS, applies Hann window, and accumulates via overlap‑add.
This produces a resynthesis where the target’s timbre contour is followed by the corpus. Ideal for speech, melodies, or any signal with evolving timbre.
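The rendering half of the steps above (RMS scaling, Hann windowing, overlap‑add) can be sketched as follows. This is an illustration under stated assumptions, not the engine's code: F0 detection and retrieval are replaced by a caller‑supplied `pick_grain` function (hypothetical), signals are plain Python lists, and the output is normalised by the accumulated window sum.

```python
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x)) if x else 0.0

def resynthesise(target, pick_grain, frame_size=1024, overlap=0.75):
    """Replace each target frame with a corpus grain and overlap-add the result.

    pick_grain(frame) must return frame_size samples; in the real engine this
    would be the sustain-region grain of the best-matching corpus sample.
    """
    hop = max(1, int(round(frame_size * (1.0 - overlap))))
    win = hann(frame_size)
    out = [0.0] * len(target)
    norm = [0.0] * len(target)
    for start in range(0, len(target) - frame_size + 1, hop):
        frame = target[start:start + frame_size]
        grain = pick_grain(frame)
        grain_rms = rms(grain)
        # Scale the grain so its level follows the target frame.
        gain = rms(frame) / grain_rms if grain_rms > 0.0 else 0.0
        for i in range(frame_size):
            out[start + i] += grain[i] * gain * win[i]
            norm[start + i] += win[i]
    # OLA normalisation: divide by the summed window where it is non-zero.
    return [o / n if n > 1e-9 else 0.0 for o, n in zip(out, norm)]
```

Because consecutive grains are not phase‑aligned, a real renderer has to cope with partial cancellation between overlapping grains; this sketch only shows the level‑following and windowing mechanics.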
Visualization (Praat picture)
When Draw_visualization = 1, the script draws a 5‑panel figure:
- Input waveform (grey).
- Output waveform (green for mono, two coloured traces for stereo).
- Target spectrogram (0–5 kHz).
- Orchestrated spectrogram (0–5 kHz).
- Results panel – best match, family, instrument, note, dynamic, score, weights, and render settings.
FAQ / troubleshooting
When the paths stored in the .db files do not match your corpus location, samples cannot be resolved. Check that DB_directory points to the folder containing the .db files, and that Corpus_root points to the parent folder of the TinySOL/ directory. The .db files store absolute paths; if your folder structure differs, the engine falls back to filename‑stem matching, which may be incomplete. The diagnostic output shows sample paths from the corpus alongside those in the .db file; adjust the paths accordingly.
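A quick way to check the two folders before running the script, as a minimal sketch: the helper below is hypothetical and not part of AudioTools. It only verifies the layout described in this guide (files matching TinySOL.*.db in DB_directory, and a TinySOL/ folder with Brass/, Strings/, and Winds/ under Corpus_root).

```python
from pathlib import Path

def check_corpus_paths(db_directory, corpus_root):
    """Return a list of human-readable problems; an empty list means the layout looks right."""
    problems = []
    db_dir = Path(db_directory)
    if not db_dir.is_dir():
        problems.append(f"DB_directory does not exist: {db_dir}")
    elif not any(db_dir.glob("TinySOL.*.db")):
        problems.append(f"No TinySOL.*.db files found in {db_dir}")

    tinysol = Path(corpus_root) / "TinySOL"
    if not tinysol.is_dir():
        problems.append(f"No TinySOL/ folder under Corpus_root: {corpus_root}")
    else:
        for family in ("Brass", "Strings", "Winds"):
            if not (tinysol / family).is_dir():
                problems.append(f"Missing family folder: {tinysol / family}")
    return problems
```

Calling `check_corpus_paths(db_directory, corpus_root)` with the same two values you enter in the Praat form returns an empty list when both folders match the expected layout.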
If Silence_threshold is set to a value lower than the best match score, the engine renders silence. This is intentional: it prevents bad matches from being returned. Raise the threshold (e.g., to 2.0) to always render, or tune your descriptor weights to achieve scores below your desired threshold.
The overlap‑add renderer applies Hann windowing and OLA normalisation. If you hear clicks, try increasing Frame_size_ms (e.g., to 200 ms) or adjusting Hop_size_ms. The default 75 ms hop (75 % overlap) is COLA‑compliant for a Hann window.
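The COLA (constant overlap‑add) property of a Hann window at 75 % overlap can be checked numerically. The sketch below is an independent illustration, not the engine's renderer: it sums shifted periodic Hann windows and tests whether the total is constant in the region where all frames overlap. Frame and hop sizes are given in samples and are arbitrary example values.

```python
import math

def hann_periodic(n):
    """Periodic Hann window (denominator n rather than n - 1)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def ola_window_sum(frame_size, hop, n_frames=16):
    """Sum of n_frames shifted windows, trimmed to the steady-state region."""
    win = hann_periodic(frame_size)
    total = [0.0] * (hop * (n_frames - 1) + frame_size)
    for k in range(n_frames):
        for i, w in enumerate(win):
            total[k * hop + i] += w
    # Drop the ramp-up and ramp-down edges where fewer frames overlap.
    return total[frame_size:len(total) - frame_size]

def is_cola(frame_size, hop, tol=1e-9):
    steady = ola_window_sum(frame_size, hop)
    return max(steady) - min(steady) < tol

print(is_cola(1024, 256))   # hop = frame / 4, i.e. 75 % overlap
print(is_cola(1024, 768))   # hop = 3/4 frame, i.e. 25 % overlap
```

When changing Frame_size_ms or Hop_size_ms, keeping the hop at a quarter (or half) of the frame preserves this property for a Hann window.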
Harmonic contribution is an Orchidea‑inspired metric that measures how well a corpus note’s harmonic series covers the target’s detected partials. Scores are distances, so lower is better. It works without explicit F0 matching: a note an octave above the target still gets a moderate score (≈0.5), while a semitone‑off note scores near 1.0. This allows retrieval that respects harmonic content without strict pitch constraints.
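The guide does not give the exact formula, so the following is only one plausible coverage measure, chosen because it reproduces the behaviour described above. It counts the fraction of target partials that lie within a tolerance (an assumed 30 cents) of some harmonic of the corpus note and returns one minus that fraction. All names and the tolerance value are hypothetical.

```python
import math

def cents(f1, f2):
    """Absolute interval between two frequencies in cents."""
    return abs(1200.0 * math.log2(f1 / f2))

def harmonic_distance(target_partials, corpus_f0, tol_cents=30.0):
    """1 - fraction of target partials covered by the corpus note's harmonics."""
    if not target_partials:
        return 1.0
    covered = 0
    for p in target_partials:
        base = int(p // corpus_f0)
        # Check the harmonic just below and just above the partial.
        for m in (max(1, base), base + 1):
            if cents(p, m * corpus_f0) <= tol_cents:
                covered += 1
                break
    return 1.0 - covered / len(target_partials)

# Target: first 8 partials of a 220 Hz tone.
partials = [220.0 * k for k in range(1, 9)]
unison = harmonic_distance(partials, 220.0)
octave_up = harmonic_distance(partials, 440.0)
semitone_off = harmonic_distance(partials, 220.0 * 2 ** (1 / 12))
```

With these inputs the octave‑up note covers only the even‑numbered target partials, while the semitone‑off note misses every one of the first eight partials by at least 100 cents.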
When Speech mode is enabled (via the Speech preset or the Speech_mode parameter), the harmonic weight is zeroed, the MFCC and specenv weights are boosted, and the silence threshold is raised. This prevents the engine from trying to match speech with orchestral harmonic series (which would produce high scores and trigger the silence gate).