TinySOL Orchestration Retrieval — v1.3 User Guide

Orchestration retrieval engine for the TinySOL corpus (Filip, 2020). Select a target sound – the backend analyses its timbre, pitch, and harmonics, then finds the closest orchestral samples by weighted multi‑descriptor distance. Supports whole‑file and frame‑based analysis, family/instrument/MIDI constraints, and multi‑layer blending for richer textures.

Author: Shai Cohen
Affiliation: Department of Music, Bar‑Ilan University, Israel
Version: 1.3 (2025) – Domain‑first filtering, speech mode, silence gate
License: MIT License
Repo: GitHub

What this does

TinySOL Orchestration Retrieval is a content‑based retrieval system for the TinySOL corpus (Filip, 2020), a dataset of over 1700 solo orchestral instrument samples. The engine analyses a target sound (any audio) and finds the most similar orchestral samples using a weighted combination of five descriptors: MFCC, Specenv, Moments, Specpeaks, and Harmonic contribution.

Two analysis modes:
  • Whole file – single descriptor vector per target, best for timbral similarity.
  • Frame‑based – tracks pitch and timbre over time, matches each frame individually, and reassembles via overlap‑add. Captures evolving textures (e.g., melodic lines, speech).

Quick start

  1. Download the TinySOL corpus (free from Zenodo) and the pre‑computed .db descriptor files (included with AudioTools).
  2. In Praat, select exactly one Sound object.
  3. Run script… TinySOL_Retrieval.praat.
  4. Set DB_directory (the folder containing the TinySOL.*.db files) and Corpus_root (the folder that contains the TinySOL/ folder of WAV files).
  5. Choose a Preset:
    • REF‑whole (whole‑file reference), REF‑frame (frame‑based reference), REF‑orchids (Orchidea‑style), Speech (vocal input)
  6. Select Instrument_families (Brass, Strings, Winds, or combinations).
  7. Set Analysis_mode (Whole file or Frame‑based).
  8. Click OK. Python runs the retrieval, renders the top match(es) according to Render_mode, and imports the result into Praat as originalname_orchestrated.
Tip: Start with REF‑whole and MFCC weight 0.25, harmonic 0.35 for balanced timbral + pitch matching. For speech input, use the Speech preset (harmonic weight = 0). For melodic lines, try REF‑frame.
Important: the Python backend requires numpy, soundfile, and scipy. The engine also needs the TinySOL corpus and the companion .db files (provided separately). The first run may take 10–20 seconds to load and index the corpus.

The 4 presets (+ Custom)

Preset       Mode          Harmonic weight  Silence threshold  Description
REF‑whole    whole_file    0.35             1.0                Whole‑file reference
REF‑frame    frame_based   0.35             1.0                Frame‑based reference
REF‑orchids  whole_file    0.35             1.0                Orchidea‑style
Speech       whole_file    0.00             2.0                Vocal input

Each preset sets descriptor weights, silence threshold, and (for frame‑based) frame/hop/pitch tolerance accordingly.

The TinySOL corpus

🎻 Corpus structure

TinySOL (Filip, 2020) contains solo orchestral instrument samples across three families:

  • Brass – trumpet, horn, trombone, bass tuba
  • Strings – violin, viola, cello, contrabass
  • Winds – flute, oboe, clarinet, bassoon

Samples are organised by instrument, technique (ord, pizz, etc.), pitch (MIDI note name), and dynamic (pp, p, mf, ff). The .db files contain pre‑computed descriptor vectors for every sample, enabling fast retrieval.
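Sample pitches are stored as note names, while the Min/Max MIDI pitch constraints are numeric. A minimal sketch of the conversion follows, assuming scientific pitch notation with C4 = 60 (A4 = 69). The helper note_name_to_midi is hypothetical, and the octave convention used in the TinySOL file names should be checked before relying on it.

```python
import re

# Semitone offset of each natural note letter above C.
_LETTER_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

# Letter, optional accidentals ('#' sharp, 'b' flat), then a signed octave number.
_NOTE_RE = re.compile(r"^([A-Ga-g])([#b]*)(-?\d+)$")


def note_name_to_midi(name: str) -> int:
    """Convert a note name such as 'A4', 'C#3' or 'Bb1' to a MIDI note number.

    Assumes scientific pitch notation, where C4 = 60 and A4 = 69.
    """
    match = _NOTE_RE.match(name.strip())
    if match is None:
        raise ValueError(f"not a note name: {name!r}")
    letter, accidentals, octave = match.groups()
    semitone = _LETTER_OFFSETS[letter.upper()]
    semitone += accidentals.count("#") - accidentals.count("b")
    midi = 12 * (int(octave) + 1) + semitone
    if not 0 <= midi <= 127:
        raise ValueError(f"{name!r} is outside the MIDI range 0-127")
    return midi


for note in ("A4", "C#3", "Bb1", "C2", "C7"):
    print(note, note_name_to_midi(note))
```

Under this convention the default MIDI range 36–96 spans C2 to C7.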

You can download TinySOL from Zenodo. Place the TinySOL/ folder (containing Brass/, Strings/, Winds/) somewhere on your disk. The .db files are included with AudioTools – point DB_directory to the folder containing them.

Descriptor set (5 descriptors)

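Descriptor             Dimension  Distance metric       What it captures
MFCC                   20         Cosine                Spectral envelope shape (timbre)
Specenv                24         Cosine (log‑energy)
Moments                4          Log‑scaled Euclidean
Specpeaks              16         Cosine
Harmonic contribution                                   Coverage of the target’s partials by the corpus note’s harmonic series (see Harmonic contribution scoring)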

Weights are user‑adjustable and must sum to ~1.0. The harmonic weight is applied only when the target has detectable pitch (voiced frames).
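The weighted combination can be sketched as below. Only the descriptor names, dimensions, metric names, and default weights come from this guide. The exact log scaling for Specenv and Moments, the handling of zero vectors, and renormalising the remaining weights when the target is unpitched are assumptions, and all function names are hypothetical.

```python
import numpy as np


def cosine_distance(a, b) -> float:
    """1 - cosine similarity; 0.0 when both vectors point the same way."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 1.0  # an all-zero vector carries no shape information
    return float(1.0 - np.dot(a, b) / denom)


def log_energy_cosine_distance(a, b, eps=1e-10) -> float:
    """Cosine distance between log energies (assumed reading of 'log-energy')."""
    return cosine_distance(np.log(np.asarray(a, dtype=float) + eps),
                           np.log(np.asarray(b, dtype=float) + eps))


def log_euclidean_distance(a, b) -> float:
    """Euclidean distance after signed log1p compression (assumed scaling)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    la = np.sign(a) * np.log1p(np.abs(a))
    lb = np.sign(b) * np.log1p(np.abs(b))
    return float(np.linalg.norm(la - lb))


def combined_distance(target, entry, weights, harmonic_distance=None) -> float:
    """Weighted average of per-descriptor distances (lower = closer match).

    Pass harmonic_distance=None for an unpitched target: the harmonic term is
    then left out and the remaining weights are renormalised (an assumption).
    """
    parts = {
        "mfcc": cosine_distance(target["mfcc"], entry["mfcc"]),
        "specenv": log_energy_cosine_distance(target["specenv"], entry["specenv"]),
        "moments": log_euclidean_distance(target["moments"], entry["moments"]),
        "specpeaks": cosine_distance(target["specpeaks"], entry["specpeaks"]),
    }
    if harmonic_distance is not None:
        parts["harmonic"] = float(harmonic_distance)
    total_weight = sum(weights[name] for name in parts)
    if total_weight <= 0:
        raise ValueError("at least one active descriptor weight must be positive")
    return sum(weights[name] * d for name, d in parts.items()) / total_weight


# Documented default weights and descriptor dimensions.
WEIGHTS = {"mfcc": 0.25, "specenv": 0.20, "moments": 0.05,
           "specpeaks": 0.15, "harmonic": 0.35}
DIMS = {"mfcc": 20, "specenv": 24, "moments": 4, "specpeaks": 16}

rng = np.random.default_rng(0)  # random stand-ins for real descriptor vectors
target = {name: rng.random(n) for name, n in DIMS.items()}
entry = {name: rng.random(n) for name, n in DIMS.items()}
print(combined_distance(target, entry, WEIGHTS, harmonic_distance=0.5))  # voiced
print(combined_distance(target, entry, WEIGHTS))  # unvoiced: no harmonic term
```

Because the defaults already sum to 1.0, the renormalisation only changes the result when the harmonic term is dropped.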

Parameters & defaults

Corpus paths

Parameter      Default   Description
DB_directory   (empty)   Folder containing TinySOL.mfcc.db, TinySOL.specenv.db, etc.
Corpus_root    (empty)   Folder containing the TinySOL/ subfolder (Brass, Strings, Winds).

Instrument constraints

Parameter             Options                                                               Default       Description
Instrument_families   All families / Brass only / Strings only / Winds only / combinations  All families
Specific_instruments  Comma‑separated list                                                  (empty)       e.g., Vn, Fl, TpC
Min/Max MIDI pitch    0–127                                                                 36–96
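Hard constraints are applied before any descriptors are compared. A minimal sketch of such a filter, using the documented defaults (all families, no instrument list, MIDI 36–96), is shown below. The CorpusEntry record and apply_constraints function are hypothetical, as is treating the MIDI bounds as inclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorpusEntry:
    path: str
    family: str      # "Brass", "Strings" or "Winds"
    instrument: str  # instrument abbreviation, e.g. "Vn"
    midi: int        # MIDI note number of the sample


def apply_constraints(entries, families=None, specific_instruments="",
                      min_midi=36, max_midi=96):
    """Return the entries that pass every hard constraint.

    families=None means "All families"; an empty specific_instruments string
    means no instrument restriction.
    """
    wanted = {s.strip() for s in specific_instruments.split(",") if s.strip()}
    allowed_families = set(families) if families else None
    return [
        e for e in entries
        if (allowed_families is None or e.family in allowed_families)
        and (not wanted or e.instrument in wanted)
        and min_midi <= e.midi <= max_midi
    ]


corpus = [  # toy entries; paths and the "BTb" abbreviation are made up
    CorpusEntry("Strings/violin_a4.wav", "Strings", "Vn", 69),
    CorpusEntry("Winds/flute_c6.wav", "Winds", "Fl", 84),
    CorpusEntry("Brass/tuba_f#1.wav", "Brass", "BTb", 30),
    CorpusEntry("Brass/trumpet_c5.wav", "Brass", "TpC", 72),
]
print([e.instrument for e in apply_constraints(corpus, families=["Brass"])])
print([e.instrument for e in apply_constraints(corpus, specific_instruments="Vn, Fl")])
```

With "Brass only", the tuba entry is still excluded because MIDI 30 lies below the default lower bound of 36.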

Analysis mode

Parameter                    Options                          Default
Analysis_mode                Whole file / Frame‑based         Whole file
Frame_size_ms / Hop_size_ms  50–500 ms / 10 ms–Frame_size_ms  150 ms / 75 ms
Pitch_tolerance_semitones    0–12                             2
Pitch_pan_in_stereo          yes / no                         0 (no)

Retrieval & render

Parameter          Options                              Default
Number_of_results  1–32                                 8
Render_mode        best / blend / top2 / top3 / top4    best
Render_gain        0–2                                  0.8

Descriptor weights

Parameter         Range   Default
MFCC_weight       ≥ 0     0.25
Specenv_weight    ≥ 0     0.20
Moments_weight    ≥ 0     0.05
Specpeaks_weight  ≥ 0     0.15
Harmonic_weight   ≥ 0     0.35

Silence gate (v1.3)

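Scores are distances, so a lower score means a closer match. If the best match’s score exceeds Silence_threshold, the output is silence, which prevents a bad match from being rendered. The default of 1.0 effectively disables the gate (the output is always rendered). For tuned systems, set it to 0.85–0.95.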
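The gate reduces to a single comparison. A minimal sketch, assuming the score is the combined distance of the best match; the function name is hypothetical, while the silence_rendered flag mirrors the diagnostic named in the FAQ.

```python
import numpy as np


def apply_silence_gate(rendered, best_score, silence_threshold=1.0):
    """Return (audio, silence_rendered) for the gate described above.

    best_score is the distance of the best match (lower = closer). When it
    exceeds silence_threshold, silence of the same length is returned instead.
    """
    rendered = np.asarray(rendered, dtype=float)
    if best_score > silence_threshold:
        return np.zeros_like(rendered), 1
    return rendered, 0


audio = np.full(4, 0.3)
print(apply_silence_gate(audio, best_score=0.9))                          # default 1.0: rendered
print(apply_silence_gate(audio, best_score=0.9, silence_threshold=0.85))  # gated to silence
```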

Analysis modes

Whole‑file mode

The target sound is analysed once, producing a single descriptor vector for each of the five descriptor types. This vector is compared against all corpus entries (after applying hard constraints). The result is a static timbral match – best for sustained sounds, instrument identification, or texture matching.
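Once every constrained entry has one combined score, whole‑file retrieval amounts to a sort. A minimal sketch, assuming lower scores are better and that Number_of_results simply truncates the ranked list; rank_matches and the toy data are hypothetical.

```python
import numpy as np


def rank_matches(names, scores, number_of_results=8):
    """Return (name, score) pairs for the lowest-distance entries, best first."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind="stable")[:number_of_results]
    return [(names[i], float(scores[i])) for i in order]


names = ["Vn A4", "Fl A4", "Ob A4", "Vc A3"]  # toy labels
scores = [0.42, 0.18, 0.35, 0.61]             # made-up combined distances
print(rank_matches(names, scores, number_of_results=2))
# [('Fl A4', 0.18), ('Ob A4', 0.35)]
```

A stable sort keeps ties in corpus order, which makes repeated runs reproducible.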

Frame‑based mode

The target is sliced into overlapping Hann‑windowed frames (with the default 150 ms frame and 75 ms hop, consecutive frames overlap by 50 %). For each frame, the engine:

  1. Detects F0 (autocorrelation + octave check).
  2. Computes frame‑level descriptors (MFCC, specenv, moments, specpeaks).
  3. Retrieves the best‑matching corpus entry within pitch tolerance (± user‑defined semitones).
  4. Extracts a grain from the corpus sample’s sustain region, scales it to the target frame’s RMS, applies a Hann window, and accumulates it via overlap‑add.

This produces a resynthesis in which the corpus follows the target’s pitch and timbre contour over time. It is well suited to speech, melodies, or any signal with evolving timbre.
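Steps 1 and 3 can be illustrated for a single frame as follows. This is a minimal sketch, not the engine’s detector: the search range, the voicing threshold, and the particular octave check (prefer half the lag when it is nearly as periodic) are assumptions, and both function names are hypothetical. The pitch test converts F0 to a fractional MIDI number with A4 = 440 Hz = 69.

```python
import numpy as np


def detect_f0(frame, sr, fmin=50.0, fmax=2000.0, voicing_threshold=0.3,
              octave_ratio=0.9):
    """Autocorrelation F0 estimate with a simple octave check; None if unvoiced."""
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[x.size - 1:]  # lags 0..N-1
    if ac[0] <= 0.0:
        return None  # silent frame
    ac = ac / ac[0]
    lag_min = max(1, int(sr / fmax))
    lag_max = min(x.size - 1, int(sr / fmin))
    if lag_max <= lag_min:
        return None
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    if ac[lag] < voicing_threshold:
        return None  # no clear periodicity
    # Octave check: if half the lag is nearly as periodic, take the higher octave.
    half = lag // 2
    if half >= lag_min and ac[half] >= octave_ratio * ac[lag]:
        lag = half
    return sr / lag


def within_pitch_tolerance(f0_hz, entry_midi, tolerance_semitones=2.0):
    """True if a corpus entry lies within +/- tolerance of the frame's pitch."""
    frame_midi = 69.0 + 12.0 * np.log2(f0_hz / 440.0)
    return abs(frame_midi - entry_midi) <= tolerance_semitones


sr = 16000  # example sample rate only
t = np.arange(int(0.150 * sr)) / sr  # one 150 ms frame
f0 = detect_f0(np.sin(2 * np.pi * 220.0 * t), sr)
print(f0, within_pitch_tolerance(f0, 57), within_pitch_tolerance(f0, 60))
```

Integer lags limit precision (here roughly 219 Hz instead of 220 Hz), which is negligible against a tolerance of 2 semitones.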

Visualization (Praat picture)

When Draw_visualization = 1, the script draws a 5‑panel figure in the Praat Picture window.

Tip: The output spectrogram often shows the orchestral instrument’s harmonic structure overlaid on the target’s temporal envelope. For frame‑based mode, the results panel shows frame counts and matched frames.

FAQ / troubleshooting

“Descriptor coverage near 0%”

This means the .db file paths do not match your corpus location. Check that DB_directory points to the folder containing the .db files, and that Corpus_root points to the parent folder of the TinySOL/ directory. The .db files store absolute paths; if your folder structure differs, the engine falls back to filename‑stem matching, which may be incomplete. The diagnostic output prints sample paths from the corpus alongside those stored in the .db files; compare them and adjust your paths accordingly.
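The following minimal sketch shows filename‑stem fallback matching and a coverage figure, which illustrates how coverage drops when paths differ. The guide confirms only that a stem‑based fallback exists; the function names, the coverage definition (fraction of corpus files that received descriptors), and the example paths are assumptions.

```python
from pathlib import PurePosixPath


def file_stem(path: str) -> str:
    """File name without folders or extension; accepts Windows or POSIX paths."""
    return PurePosixPath(path.replace("\\", "/")).stem


def match_by_stem(db_paths, corpus_paths):
    """Pair stored .db paths with corpus files that share a file stem.

    Returns (mapping, coverage), where coverage is the fraction of corpus files
    that received a descriptor entry. Assumes stems are unique in the corpus.
    """
    corpus_by_stem = {file_stem(p): p for p in corpus_paths}
    mapping = {}
    for db_path in db_paths:
        match = corpus_by_stem.get(file_stem(db_path))
        if match is not None:
            mapping[db_path] = match
    coverage = len(set(mapping.values())) / len(corpus_paths) if corpus_paths else 0.0
    return mapping, coverage


db_paths = [  # absolute paths as stored on another machine (made up)
    r"C:\old\TinySOL\Strings\sample_a.wav",
    "/old/TinySOL/Winds/sample_b.wav",
    "/old/TinySOL/Brass/sample_c.wav",
]
corpus_paths = [
    "/data/TinySOL/Strings/sample_a.wav",
    "/data/TinySOL/Winds/sample_b.wav",
    "/data/TinySOL/Brass/sample_d.wav",
    "/data/TinySOL/Brass/sample_e.wav",
]
mapping, coverage = match_by_stem(db_paths, corpus_paths)
print(f"descriptor coverage: {coverage:.0%}")  # descriptor coverage: 50%
```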

Output is silent / silence_rendered=1

If Silence_threshold is lower than the best match score, the engine renders silence. This is intentional: it prevents bad matches from being returned. Raise the threshold (e.g., to 2.0) to always render, or tune your descriptor weights to achieve scores below your desired threshold.

Frame‑based output has clicks / amplitude bumps

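The overlap‑add renderer applies Hann windowing and OLA normalisation. If you hear clicks, try increasing Frame_size_ms (e.g., to 200 ms) or adjusting Hop_size_ms. A Hann window stays COLA‑compliant when the hop is an integer fraction of the frame (for example one half or one quarter), so change both together. The default 75 ms hop (50 % overlap of the 150 ms frame) is COLA‑compliant for a Hann window.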
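Below is a minimal sketch of this kind of overlap‑add renderer, together with a numerical check that a Hann window at the default 50 % overlap sums to a constant. It assumes a periodic Hann window, equal‑length grains, and normalisation by the summed window. The function names are hypothetical, the 16 kHz rate is for the example only, and the engine’s renderer may normalise differently.

```python
import numpy as np


def periodic_hann(n: int) -> np.ndarray:
    """Periodic Hann window; overlapping copies sum to a constant at hop n/2 or n/4."""
    return 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(n) / n)


def overlap_add(grains, target_rms, hop):
    """Scale each grain to its target-frame RMS, window it, and overlap-add.

    The sum is divided by the accumulated window (OLA normalisation) wherever
    that window sum is non-negligible.
    """
    frame_len = len(grains[0])
    window = periodic_hann(frame_len)
    out = np.zeros(hop * (len(grains) - 1) + frame_len)
    window_sum = np.zeros_like(out)
    for i, (grain, rms) in enumerate(zip(grains, target_rms)):
        grain = np.asarray(grain, dtype=float)
        grain_rms = np.sqrt(np.mean(grain ** 2))
        gain = rms / grain_rms if grain_rms > 0 else 0.0
        start = i * hop
        out[start:start + frame_len] += gain * grain * window
        window_sum[start:start + frame_len] += window
    nonzero = window_sum > 1e-8
    out[nonzero] /= window_sum[nonzero]
    return out


# COLA check at 16 kHz: 150 ms frame (2400 samples), 75 ms hop (1200 samples).
frame_len, hop = 2400, 1200
w = periodic_hann(frame_len)
total = np.zeros(hop * 9 + frame_len)
for i in range(10):
    total[i * hop:i * hop + frame_len] += w
steady = total[frame_len:-frame_len]  # skip the partially covered edges
print(steady.min(), steady.max())      # both 1.0 up to rounding
```

Dividing by the window sum also keeps the level correct at hops that are not COLA‑compliant, at the cost of a time‑varying gain that can itself be audible.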

Harmonic contribution scoring

This is an Orchidea‑inspired metric that measures how well a corpus note’s harmonic series covers the target’s detected partials. Like the other descriptors, it is a distance (lower is better), and it works without explicit F0 matching. A note an octave above the target still gets a moderate score (≈0.5) because its harmonics coincide with the target’s even‑numbered partials, while a note a semitone off scores near 1.0 because almost none of its harmonics align. This allows retrieval that respects harmonic content without strict pitch constraints.
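The coverage idea can be sketched as follows. This is not Orchidea’s or the engine’s formula. The nearest‑harmonic test, the 30‑cent tolerance, and returning 1 minus the covered fraction are assumptions chosen to reproduce the octave and semitone cases described above, and the function name is hypothetical.

```python
import numpy as np


def harmonic_contribution_distance(target_partials_hz, corpus_f0_hz,
                                   tolerance_cents=30.0):
    """1 - (fraction of target partials covered by the corpus note's harmonics).

    A partial counts as covered when its nearest harmonic k * corpus_f0 (by
    frequency, k >= 1) lies within tolerance_cents. 0.0 = all covered, 1.0 = none.
    """
    partials = np.asarray(target_partials_hz, dtype=float)
    if partials.size == 0:
        return 1.0
    k = np.maximum(1.0, np.round(partials / corpus_f0_hz))
    nearest = k * corpus_f0_hz
    cents_off = 1200.0 * np.abs(np.log2(partials / nearest))
    return float(1.0 - np.mean(cents_off <= tolerance_cents))


partials = 220.0 * np.arange(1, 11)  # ten harmonics of A3
for f0 in (220.0, 440.0, 220.0 * 2 ** (1 / 12)):
    print(round(f0, 2), harmonic_contribution_distance(partials, f0))
```

With ten target partials on 220 Hz, this sketch gives 0.0 for a 220 Hz note, 0.5 for 440 Hz, and 1.0 for a note a semitone higher. A note an octave below also scores 0.0 here, since its harmonic series contains every target partial; the guide does not say how the engine treats that case.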

Speech mode

When enabled (via the Speech preset or the Speech_mode parameter), speech mode zeroes the harmonic weight, boosts the MFCC and specenv weights, and raises the silence threshold. Without these changes, the engine would try to match speech against orchestral harmonic series, producing high (poor) scores that trigger the silence gate.
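The guide gives the direction of each change but not the amounts. The sketch below assumes the freed harmonic weight is redistributed to MFCC and specenv in proportion to their defaults. The function name and redistribution rule are hypothetical; only the zeroed harmonic weight and the 2.0 threshold come from the preset table.

```python
DEFAULT_WEIGHTS = {"mfcc": 0.25, "specenv": 0.20, "moments": 0.05,
                   "specpeaks": 0.15, "harmonic": 0.35}


def speech_mode_settings(weights, silence_threshold=2.0):
    """Return (weights, silence_threshold) adjusted for speech input.

    The harmonic weight is set to zero and its value is shared between MFCC and
    specenv in proportion to their current weights (a hypothetical rule), so
    the total weight is unchanged. 2.0 is the Speech preset's threshold.
    """
    adjusted = dict(weights)  # leave the caller's dictionary untouched
    freed = adjusted["harmonic"]
    adjusted["harmonic"] = 0.0
    base = adjusted["mfcc"] + adjusted["specenv"]
    mfcc_share = adjusted["mfcc"] / base if base > 0 else 0.5
    adjusted["mfcc"] += freed * mfcc_share
    adjusted["specenv"] += freed * (1.0 - mfcc_share)
    return adjusted, silence_threshold


weights, threshold = speech_mode_settings(DEFAULT_WEIGHTS)
print({name: round(w, 3) for name, w in weights.items()}, threshold)
```

Keeping the total at 1.0 means speech scores stay on the same scale as the other presets, so only the threshold, not the weighting, explains the different gate behaviour.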