TinySOL Orchestration Retrieval — v1.3 User Guide

Orchestration retrieval engine for the TinySOL corpus (Filip, 2020). Select a target sound – the backend analyses its timbre, pitch, and harmonics, then finds the closest orchestral samples by weighted multi‑descriptor distance. Supports whole‑file and frame‑based analysis, family/instrument/MIDI constraints, and multi‑layer blending for richer textures.

Author: Shai Cohen
Affiliation: Department of Music, Bar‑Ilan University, Israel
Version: 1.3 (2025) – Domain‑first filtering, speech mode, silence gate
License: MIT License
Repo: GitHub

Contents:

What it does
Quick start
Presets (4 archetypes)
The TinySOL corpus
Descriptor set (5 dimensions)
Parameters
Analysis modes
Visualization
FAQ / troubleshooting

What this does

TinySOL Orchestration Retrieval is a content‑based retrieval system for the TinySOL corpus (Filip, 2020), a dataset of over 1700 solo orchestral instrument samples. The engine analyses a target sound (any audio) and finds the most similar orchestral samples using a weighted combination of five descriptors:

MFCC (20 coefficients) – timbral envelope
Spectral envelope (24 log‑energy bands) – overall spectral shape
Spectral moments (centroid, spread, skewness, kurtosis) – statistical shape
Spectral peaks (16 strongest peaks) – harmonic structure
Harmonic contribution – how well the corpus note’s harmonic series covers the target’s partials

Two analysis modes:

Whole file – single descriptor vector per target, best for timbral similarity.
Frame‑based – tracks pitch and timbre over time, matches each frame individually, and reassembles via overlap‑add. Captures evolving textures (e.g., melodic lines, speech).

Quick start

Download the TinySOL corpus (free from Zenodo) and the pre‑computed .db descriptor files (included with AudioTools).
In Praat, select exactly one Sound object.
Run script… → TinySOL_Retrieval.praat.
Set the DB_directory (where the TinySOL.*.db files are) and Corpus_root (where the TinySOL/ folder containing WAVs is).
Choose a Preset:
- REF‑whole (whole‑file reference), REF‑frame (frame‑based reference), REF‑orchids (Orchidea‑style), Speech (vocal input)
Select Instrument_families (Brass, Strings, Winds, or combinations).
Set Analysis_mode (Whole file or Frame‑based).
Click OK. Python runs retrieval, blends the top matches, and imports the result as originalname_orchestrated.

Tip: Start with REF‑whole and MFCC weight 0.25, harmonic 0.35 for balanced timbral + pitch matching. For speech input, use the Speech preset (harmonic weight = 0). For melodic lines, try REF‑frame.

Important: Python dependencies: numpy, soundfile, scipy. The engine requires the TinySOL corpus and the companion .db files (provided separately). First run may take 10–20 seconds to load and index the corpus.
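
A quick way to confirm the three dependencies are importable before running the script is sketched below. The helper name missing_packages is hypothetical and is not part of the engine; it only checks that the packages can be found by the Python interpreter you run it with, which may differ from the one Praat calls.

```python
import importlib.util

# Package names taken from the dependency list above.
REQUIRED = ("numpy", "soundfile", "scipy")

def missing_packages(names=REQUIRED):
    """Return the top-level package names that cannot be found for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]

missing = missing_packages()
if missing:
    print("Missing:", ", ".join(missing))
    print("Try: pip install " + " ".join(missing))
else:
    print("All dependencies found.")
```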

The 4 presets (+ Custom)

Preset	Mode	Harmonic weight	Silence threshold
REF‑whole	whole_file	0.35	1.0
REF‑frame	frame_based	0.35	1.0
REF‑orchids	whole_file	0.35	1.0
Speech	whole_file	0.00	2.0

Each preset sets descriptor weights, silence threshold, and (for frame‑based) frame/hop/pitch tolerance accordingly.

The TinySOL corpus

🎻 Corpus structure

TinySOL (Filip, 2020) contains solo orchestral instrument samples across three families:

Brass – trumpet, horn, trombone, bass tuba
Strings – violin, viola, cello, contrabass
Winds – flute, oboe, clarinet, bassoon

Samples are organised by instrument, technique (ord, pizz, etc.), pitch (MIDI note name), and dynamic (pp, p, mf, ff). The .db files contain pre‑computed descriptor vectors for every sample, enabling fast retrieval.

You can download TinySOL from Zenodo. Place the TinySOL/ folder (containing Brass/, Strings/, Winds/) somewhere on your disk. The .db files are included with AudioTools – point DB_directory to the folder containing them.

Descriptor set (5 dimensions)

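Descriptor	Dimension	Distance metric	What it captures
MFCC	20	Cosine	Spectral envelope shape (timbre).
Specenv	24	Cosine (log‑energy)	Overall spectral shape.
Moments	4	Log‑scaled Euclidean	Statistical shape (centroid, spread, skewness, kurtosis).
Specpeaks	16	Cosine	Harmonic structure (16 strongest peaks).
Harmonic contribution	–	Partial coverage (see Harmonic contribution scoring)	How well the corpus note’s harmonic series covers the target’s partials.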

Weights are user‑adjustable and must sum to ~1.0. The harmonic weight is applied only when the target has detectable pitch (voiced frames).
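
The weighting can be pictured with the minimal sketch below. It is not the engine's code: the function names, the dictionary keys, and the choice to renormalise the remaining weights when the harmonic term is dropped are all assumptions made for illustration. Only the cosine metric, the default weights, and the "harmonic only when voiced" rule come from this guide.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)

def combined_score(distances, weights, voiced):
    """Weighted mean of per-descriptor distances (lower = closer).

    distances / weights: dicts keyed by descriptor name (hypothetical keys).
    The 'harmonic' term is skipped for unvoiced targets; the remaining
    weights are renormalised so they still sum to 1 (an assumption).
    """
    active = {k: w for k, w in weights.items()
              if w > 0 and (voiced or k != "harmonic")}
    total = sum(active.values())
    if total == 0:
        raise ValueError("at least one active weight must be positive")
    return sum(w * distances[k] for k, w in active.items()) / total

# Default weights from the Descriptor weights table.
DEFAULT_WEIGHTS = {"mfcc": 0.25, "specenv": 0.20, "moments": 0.05,
                   "specpeaks": 0.15, "harmonic": 0.35}
```

With this layout, a pitched target uses all five terms, while an unpitched one is ranked on the four spectral descriptors alone.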

Parameters & defaults

Corpus paths

Parameter	Default	Description
DB_directory	(empty)	Folder containing `TinySOL.mfcc.db`, `TinySOL.specenv.db`, etc.
Corpus_root	(empty)	Folder containing the `TinySOL/` subfolder (Brass, Strings, Winds).

Instrument constraints

Parameter	Options	Default
Instrument_families	All families / Brass only / Strings only / Winds only / combinations	All families
Specific_instruments	comma‑separated list (e.g., Vn, Fl, TpC)	(empty)
Min/Max MIDI pitch	0–127	36–96
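
As a rough illustration of how such hard constraints could narrow the corpus before any distance is computed, consider the sketch below. The Entry record and the apply_constraints function are hypothetical; the actual .db layout and filtering code may differ.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    """Hypothetical corpus record; the real .db layout may differ."""
    family: str       # "Brass", "Strings" or "Winds"
    instrument: str   # short code such as "Vn"
    midi: int         # MIDI note number of the sample

def apply_constraints(entries, families=None, instruments=None,
                      min_midi=36, max_midi=96):
    """Keep entries that satisfy every active constraint.

    families / instruments: iterables of allowed names; None or empty
    means "no restriction". The MIDI range is inclusive at both ends.
    """
    fams = set(families or [])
    insts = set(instruments or [])
    return [e for e in entries
            if (not fams or e.family in fams)
            and (not insts or e.instrument in insts)
            and min_midi <= e.midi <= max_midi]

corpus = [Entry("Strings", "Vn", 69), Entry("Winds", "Fl", 72),
          Entry("Brass", "TpC", 30)]
print(apply_constraints(corpus, families=["Strings", "Winds"]))
```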

Analysis mode

Parameter	Options	Default
Analysis_mode	Whole file / Frame‑based	Whole file
Frame_size_ms / Hop_size_ms	50–500 ms / 10–frame_size	150 ms / 75 ms
Pitch_tolerance_semitones	0–12	2
Pitch_pan_in_stereo	yes/no	0

Retrieval & render

Parameter	Options	Default
Number_of_results	1–32	8
Render_mode	best / blend / top2 / top3 / top4	best
Render_gain	0–2	0.8

Descriptor weights

Parameter	Range	Default
MFCC_weight	≥0	0.25
Specenv_weight	≥0	0.20
Moments_weight	≥0	0.05
Specpeaks_weight	≥0	0.15
Harmonic_weight	≥0	0.35

Silence gate (v1.3)

If the best match score exceeds Silence_threshold, the output is silence (prevents rendering a bad match). Default 1.0 = disabled (always renders). For tuned systems, set to 0.85–0.95.
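
The gate amounts to a single comparison, sketched below under the assumption that scores are distances (lower is better) and that audio is a plain list of samples. The function name render_with_gate is hypothetical.

```python
def render_with_gate(rendered, best_score, silence_threshold=1.0):
    """Return the rendered samples, or equal-length silence if the gate closes.

    The gate closes only when best_score strictly exceeds the threshold.
    """
    if best_score > silence_threshold:
        return [0.0] * len(rendered)
    return list(rendered)

audio = [0.1, -0.2, 0.3]
print(render_with_gate(audio, best_score=0.97, silence_threshold=0.90))  # gate closes
print(render_with_gate(audio, best_score=0.97))                          # default threshold renders
```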

Analysis modes

Whole‑file mode

The target sound is analysed once, producing a single descriptor vector for each of the five descriptor types. This vector is compared against all corpus entries (after applying hard constraints). The result is a static timbral match – best for sustained sounds, instrument identification, or texture matching.

Frame‑based mode

The target is sliced into overlapping Hann‑windowed frames; at the default 150 ms frame and 75 ms hop, adjacent frames overlap by 50 %. For each frame, the engine:

Detects F0 (autocorrelation + octave check).
Computes frame‑level descriptors (MFCC, specenv, moments, specpeaks).
Retrieves the best‑matching corpus entry within pitch tolerance (± user‑defined semitones).
Extracts a grain from the corpus sample’s sustain region, scales to target frame RMS, applies Hann window, and accumulates via overlap‑add.
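
The last step (RMS scaling, Hann windowing, overlap‑add) can be sketched as follows. This is an illustration in plain Python rather than the engine's renderer: it assumes the grains have already been chosen, that each grain is at least one frame long, and that the accumulated window is used for normalisation. Frame and hop are given in samples, and all function names are hypothetical.

```python
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def rms(x):
    """Root-mean-square level of a sample list (0.0 for an empty list)."""
    return math.sqrt(sum(v * v for v in x) / len(x)) if x else 0.0

def overlap_add(target, grains, frame, hop):
    """Resynthesise `target` from one corpus grain per frame.

    grains[i] is the grain chosen for the frame starting at i * hop.
    Each grain is scaled to the RMS of its target frame, Hann-windowed,
    summed into the output, and the sum is divided by the accumulated window.
    """
    win = hann(frame)
    out = [0.0] * len(target)
    norm = [0.0] * len(target)
    for i, grain in enumerate(grains):
        start = i * hop
        seg = target[start:start + frame]
        g_rms = rms(grain)
        gain = rms(seg) / g_rms if g_rms > 0 else 0.0
        for j in range(len(seg)):
            out[start + j] += gain * grain[j] * win[j]
            norm[start + j] += win[j]
    # Samples where the window sum is (near) zero are left silent.
    return [o / n if n > 1e-9 else 0.0 for o, n in zip(out, norm)]
```

Dividing by the accumulated window keeps the output level steady even at the edges of the signal, where fewer frames overlap.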

This produces a resynthesis where the target’s timbre contour is followed by the corpus. Ideal for speech, melodies, or any signal with evolving timbre.

Visualization (Praat picture)

When Draw_visualization = 1, the script draws a 5‑panel figure:

Input waveform (grey).
Output waveform (green for mono, two coloured traces for stereo).
Target spectrogram (0–5 kHz).
Orchestrated spectrogram (0–5 kHz).
Results panel – best match, family, instrument, note, dynamic, score, weights, and render settings.

Tip: The output spectrogram often shows the orchestral instrument’s harmonic structure overlaid on the target’s temporal envelope. For frame‑based mode, the results panel shows frame counts and matched frames.

FAQ / troubleshooting

“Descriptor coverage near 0%”

This means the .db file paths do not match your corpus location. Check that DB_directory points to the folder containing the .db files, and Corpus_root points to the parent folder of the TinySOL/ directory. The .db files store absolute paths – if your folder structure differs, the engine falls back to filename‑stem matching, which may be incomplete. The diagnostic output shows sample paths from the corpus vs the .db file; adjust paths accordingly.
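
The filename‑stem fallback can be pictured with the sketch below. It is an assumption about how such matching might work, not the engine's code, and the file names in the example are made up.

```python
from pathlib import PurePosixPath, PureWindowsPath

def stem_of(path_str):
    """File name without extension, accepting forward or back slashes."""
    name = PureWindowsPath(path_str).name   # splits on both separator styles
    return PurePosixPath(name).stem

def match_by_stem(db_paths, corpus_paths):
    """Map each .db path to a local corpus path with the same stem (or None)."""
    local = {stem_of(p): p for p in corpus_paths}
    return {p: local.get(stem_of(p)) for p in db_paths}

def coverage(mapping):
    """Fraction of .db entries that found a local file."""
    if not mapping:
        return 0.0
    return sum(v is not None for v in mapping.values()) / len(mapping)

# Hypothetical paths: the .db was built on another machine.
db = ["C:\\old\\TinySOL\\Strings\\Vn\\sample_a.wav",
      "/old/TinySOL/Brass/Hn/sample_b.wav"]
corpus = ["/data/TinySOL/Strings/Vn/sample_a.wav"]
print(coverage(match_by_stem(db, corpus)))
```

A coverage well below 1.0 in a check like this would point to missing or renamed WAV files rather than a wrong DB_directory.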

Output is silent / silence_rendered=1

If the best match score is higher than Silence_threshold, the engine renders silence. This is intentional – it prevents bad matches from being returned. Raise the threshold (e.g., to 2.0) to always render, or tune your descriptor weights to achieve scores below your desired threshold.

Frame‑based output has clicks / amplitude bumps

The overlap‑add renderer applies Hann windowing and OLA normalisation. If you hear clicks, try increasing Frame_size_ms (e.g., to 200 ms) or adjusting Hop_size_ms. The default 75 ms hop with the default 150 ms frame (50 % overlap) is COLA‑compliant for a Hann window.
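
To see whether a given frame/hop pair keeps the summed window constant, a small check like the one below can help. It assumes a periodic Hann window and measures lengths in samples (at a 1 kHz sample rate the numbers equal milliseconds); the function name cola_ripple is hypothetical.

```python
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def cola_ripple(frame, hop, window=hann):
    """Relative ripple of the overlap-added window sum in steady state.

    0.0 means the shifted windows add up to a constant (COLA holds).
    """
    win = window(frame)
    # Steady-state window sum at each offset within one hop period.
    sums = [sum(win[k] for k in range(r, frame, hop)) for r in range(hop)]
    mean = sum(sums) / hop
    return (max(sums) - min(sums)) / mean

print(cola_ripple(150, 75))    # default frame/hop: essentially zero
print(cola_ripple(150, 100))   # hop too large: clearly non-zero
```

A hop that does not divide the frame evenly in this sense leaves a periodic amplitude ripple, which is one plausible source of the bumps described above.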

Harmonic contribution scoring

This is an Orchidea‑inspired metric that measures how well a corpus note’s harmonic series covers the target’s detected partials. It works without explicit F0 matching. As with the other descriptor scores, lower is better: a note an octave above the target still gets a moderate score (≈0.5), while a semitone‑off note scores near 1.0 (a poor match). This allows retrieval that respects harmonic content without strict pitch constraints.
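
One way to reproduce these figures is the coverage sketch below. It is an assumption, not the engine's formula: target partials are idealised as exact multiples of the target F0 (the engine works from detected peaks), and the number of partials and the 30‑cent tolerance are arbitrary choices for illustration.

```python
import math

def harmonic_distance(target_f0, corpus_f0, n_partials=10, tol_cents=30.0):
    """1 - fraction of target partials lying near a corpus harmonic.

    Returns 0.0 when every partial is covered and 1.0 when none is.
    """
    covered = 0
    for k in range(1, n_partials + 1):
        partial = k * target_f0
        m = max(1, round(partial / corpus_f0))   # nearest corpus harmonic number
        cents = 1200.0 * math.log2(partial / (m * corpus_f0))
        if abs(cents) <= tol_cents:
            covered += 1
    return 1.0 - covered / n_partials

print(harmonic_distance(220.0, 220.0))                  # unison
print(harmonic_distance(220.0, 440.0))                  # corpus note an octave above
print(harmonic_distance(220.0, 220.0 * 2 ** (1 / 12)))  # corpus note a semitone above
```

In this sketch the octave‑above note covers only the even‑numbered target partials, which is where the moderate score comes from.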

Speech mode

When enabled (via preset or Speech_mode parameter), the harmonic weight is zeroed, MFCC and specenv weights are boosted, and the silence threshold is raised. This prevents the engine from trying to match speech with orchestral harmonic series (which would produce high scores and trigger the silence gate).