Formant Swarm Granulator — v1.1 User Guide

Resonance‑organised granular cloud engine. Segments a sound into grains, extracts formant profiles, and generates a swarm where grain placement is guided by formant similarity, temporal repulsion, and local density. Four swarm modes shape the migration of grains through the resonance space.

Author: Shai Cohen
Affiliation: Department of Music, Bar‑Ilan University, Israel
Version: 1.1 (2026)
License: MIT License
Repo: GitHub

Contents:

- What it does
- Quick start
- Swarm modes
- Pipeline (5 stages)
- Parameters
- Visualization
- FAQ / troubleshooting

What it does

Formant Swarm Granulator treats each short grain (30–250 ms) as a particle carrying a resonance fingerprint (F1, F2, F3, bandwidths, pitch, intensity). A Python engine projects these particles into a 2‑D similarity field and walks a path through the swarm, selecting grains according to attraction (formant similarity) and two types of repulsion (temporal – avoid grains from the same source region; density – avoid overusing any grain).

Why “swarm”? Grains are not randomly reordered. They move through a resonance space where similar vowels / timbres cluster together. The walk is governed by a potential field: you can control how strongly the swarm stays in one formant neighbourhood (attraction), how much it tries to move to distant source times (temporal repulsion), and how much it avoids grains already used many times (density repulsion). The result is an evolving cloud organised by hidden vowel anatomy, not raw randomness.

Quick start

In Praat, select exactly one Sound object (mono or stereo).
Run script… → FormantSwarmGranulator.praat.
Choose a Swarm_mode:
- vowel_cloud – balanced, forms clusters by vowel quality.
- resonance_turbulence – chaotic, high stochasticity.
- migration – drifts between cluster centers.
- counterpoint – alternates voiced/unvoiced grains.
Set Grain_length_ms (e.g. 55 ms), Grain_jitter_ms (random variation), and Grain_overlap_percent (0% = sequential, 50% = heavy overlap).
Adjust Density_grains_per_sec (overall event density).
Tune Attraction, Temporal_repulsion, Density_repulsion – the three swarm forces.
Optionally set Pan_spread (stereo width) and Pitch_drift_semitones (random pitch variation per grain).
Click OK. Praat analyses each grain (pitch, intensity, formants, bandwidths), exports CSV, and calls the Python engine. The result is imported as originalname_formantSwarm.

Quick tip: Start with vowel_cloud, Attraction = 1.0, Temporal_repulsion = 0.9, and Density_repulsion = 0.7. For a slowly evolving texture, increase Temporal_repulsion, which forces the swarm to explore new source regions. For a static wash, set Temporal_repulsion low and Attraction high.

Important: The Python engine requires numpy and soundfile. Grain analysis uses Praat’s built‑in pitch, intensity, spectrum, and formant objects. Very short grains (<30 ms) may produce unreliable formants, so the script enforces a minimum grain length. The output is stereo, with per‑grain panning derived from the 2‑D swarm projection.

The four swarm modes

Mode	Score formula (conceptual)	Behaviour
vowel_cloud	`score = attraction × similarity + 0.22×cluster_bonus + 0.10×crowd_bonus`	Forms tight clusters around similar formant centres; balanced, the default.
resonance_turbulence	`score = 0.75×attraction×similarity + 0.45×random + 0.12×crowd`	High stochasticity, chaotic migrations, unpredictable grain sequences.
migration	`score = 0.55×attraction×similarity + 0.30×exp(-cluster_distance) + 0.10×crowd`	Drifts slowly between cluster centers; smooth timbral evolution.
counterpoint	`score = 0.70×attraction×similarity + 0.20×voiced_match + 0.10×crowd`	Alternates voiced/unvoiced grains; creates rhythmic separation.

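Forces:
- similarity = exponential of negative distance in formant space.
- cluster_bonus = bonus for grains from the same resonance family.
- crowd_bonus = 1/(1+usage) – shrinks as a grain is reused, which discourages overusing any grain (abbreviated as crowd in some formulas above).
- temporal_penalty = exponential of negative source‑time difference – largest for grains close to the current source position, so penalising it favours grains from far‑away source moments.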
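The conceptual score formulas in the table can be written out as a small Python function. This is only a sketch of the table, not the engine's code: the names mode_score, similarity and crowd_bonus and all argument names are hypothetical, and the table does not say how the temporal and density repulsion terms are combined with these scores, so they are left out here.

```python
import math

def similarity(distance):
    """Similarity as the exponential of negative formant-space distance."""
    return math.exp(-distance)

def crowd_bonus(usage):
    """1 / (1 + usage): shrinks as a grain is reused."""
    return 1.0 / (1.0 + usage)

def mode_score(mode, attraction, sim, crowd, cluster_bonus=0.0,
               rand=0.0, cluster_distance=0.0, voiced_match=0.0):
    """Mode-specific score following the conceptual table (hypothetical helper)."""
    if mode == "vowel_cloud":
        return attraction * sim + 0.22 * cluster_bonus + 0.10 * crowd
    if mode == "resonance_turbulence":
        return 0.75 * attraction * sim + 0.45 * rand + 0.12 * crowd
    if mode == "migration":
        return (0.55 * attraction * sim
                + 0.30 * math.exp(-cluster_distance) + 0.10 * crowd)
    if mode == "counterpoint":
        return 0.70 * attraction * sim + 0.20 * voiced_match + 0.10 * crowd
    raise ValueError(f"unknown mode: {mode}")
```

Comparing the weights shows why the modes behave differently: resonance_turbulence gives the random term almost half the weight, while vowel_cloud leaves attraction unscaled.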

Pipeline — five stages

Stage 1 – Grain extraction (Praat) – sliding window with overlap, jitter; extract pitch, intensity, spectral centroid, formants (F1‑F3) and bandwidths. Write CSV.
Stage 2 – Build resonance feature space (Python) – robust‑normalised feature matrix (f1,f2,f3,bandwidth composite, intensity, centroid, voiced flag).
Stage 3 – Clustering & projection – k‑means (k=6) into resonance families; principal component projection to 2‑D (x,y for each grain).
Stage 4 – Swarm walk – iterative selection: choose next grain based on score combining attraction, repulsion, and mode‑specific terms. Produces schedule of events.
Stage 5 – Render & balance – extract grains, resample (pitch drift), fade, pan according to grain.x, overlay, per‑channel RMS balance, final gain match to source RMS.
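Stage 4 can be pictured as a greedy loop over candidate grains. The sketch below is a toy version under stated assumptions, not the engine's implementation: it uses Euclidean distance between the supplied feature vectors, subtracts the two repulsion terms from the attraction term, models the density penalty as 1 − 1/(1+usage), and always picks the highest-scoring grain. The function name swarm_walk and its arguments are hypothetical.

```python
import math
import random

def swarm_walk(features, times, n_events, attraction=1.0,
               temporal_repulsion=0.9, density_repulsion=0.7, seed=0):
    """Toy swarm walk: repeatedly pick the best-scoring next grain.

    features: one feature vector per grain (needs at least two grains)
    times:    source time of each grain, in seconds
    Returns the list of chosen grain indices (the schedule).
    """
    rng = random.Random(seed)
    usage = [0] * len(features)
    current = rng.randrange(len(features))
    schedule = [current]
    usage[current] += 1
    for _ in range(n_events - 1):
        best, best_score = None, -math.inf
        for j in range(len(features)):
            if j == current:
                continue  # never repeat the same grain back to back
            sim = math.exp(-math.dist(features[current], features[j]))
            # Assumed penalty shapes (not taken from the engine):
            t_pen = math.exp(-abs(times[j] - times[current]))  # high when close in source time
            d_pen = 1.0 - 1.0 / (1.0 + usage[j])                # grows with reuse
            score = (attraction * sim
                     - temporal_repulsion * t_pen
                     - density_repulsion * d_pen
                     + 1e-3 * rng.random())                     # tiny tie-breaker
            if score > best_score:
                best, best_score = j, score
        current = best
        schedule.append(current)
        usage[current] += 1
    return schedule
```

Raising temporal_repulsion in this toy makes the walk jump to grains that are far away in source time; raising attraction keeps it inside one neighbourhood of the feature space.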

Parameters & defaults

Grain generation

Parameter	Default	Description
Grain_length_ms	55	Target grain duration (ms). Jitter adds random variation.
Grain_jitter_ms	15	Random ± variation applied to each grain’s length.
Grain_overlap_percent	0	Overlap between consecutive grains (0–90%). Hop = length × (1‑overlap).
Density_grains_per_sec	18	Number of scheduled output events per second (not input grains).
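The hop relation in the table, Hop = length × (1‑overlap), determines roughly how many analysis grains a sound yields. The following sketch works this out while ignoring jitter and counting only windows that fit entirely inside the sound; both simplifications are assumptions, and the helper names are hypothetical.

```python
def hop_ms(grain_length_ms, overlap_percent):
    """Hop = length x (1 - overlap), with overlap given in percent."""
    return grain_length_ms * (1.0 - overlap_percent / 100.0)

def grain_count(duration_ms, grain_length_ms, overlap_percent):
    """Number of full grains that fit in the sound, ignoring jitter."""
    if duration_ms < grain_length_ms:
        return 0
    hop = hop_ms(grain_length_ms, overlap_percent)
    return int((duration_ms - grain_length_ms) // hop) + 1
```

For a 1 s sound with the default 55 ms grains, this gives 18 grains at 0% overlap (hop 55 ms) and 35 grains at 50% overlap (hop 27.5 ms).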

Swarm forces

Parameter	Default	Description
Attraction	1.0	Weight for formant similarity (0 = ignore, 2 = strongly attracted).
Temporal_repulsion	0.9	Penalty for selecting grains close in source time; higher = explore different phrase regions.
Density_repulsion	0.7	Penalty for over‑used grains; higher = more even usage.

Spatial / pitch

Parameter	Default	Description
Pan_spread	1.0	Stereo width multiplier (0 = centre, 1 = full, >1 = hyper‑wide).
Pitch_drift_semitones	0.5	Random pitch variation per grain (±, via varispeed resampling).

Formant analysis

Parameter	Default	Description
Max_formant_hz	5500	Maximum formant frequency (Praat setting).
Number_of_formants	5	Number of formants to track (only F1‑F3 are used).

Visualization (Praat picture)

When Draw_visualization = 1, the script draws a multi‑panel plot:

Original and output waveforms – for comparison.
Original and output spectrograms (0–5 kHz).
Formant bar panel – horizontal bars showing mean F1 (red), F2 (blue), F3 (teal) across all analysed grains. Bar length = formant frequency in Hz.
Mode colour strip – coloured bar with the selected mode name.
Summary panel with:
- Grains analysed / scheduled events / number of resonance clusters.
- Voiced ratio, mean F1/F2/F3.
- All parameter settings (attraction, repulsions, pan, pitch drift, seed).
- RMS in / RMS out – note that output RMS is channel‑balanced and matched to source.

Tip: The formant bar panel gives a quick profile of the vowel space in the original sound. The output will re‑organise grains according to this space, so the formant bars are the “colour palette”.

FAQ / troubleshooting

“Python not found” or missing packages

Install: pip install numpy soundfile. On Windows, the script tries py, python, python3.
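To check the same things by hand, a short Python snippet can report which launcher name is on the PATH and which packages are missing. This is a sketch for diagnosis only, not the Praat script's own logic; the helper names find_python and missing_packages are hypothetical, and the candidate order simply follows the list above.

```python
import importlib.util
import shutil

def find_python(candidates=("py", "python", "python3")):
    """Return the first launcher name found on PATH, or None."""
    for name in candidates:
        if shutil.which(name):
            return name
    return None

def missing_packages(packages=("numpy", "soundfile")):
    """Return the packages the current interpreter cannot import."""
    return [p for p in packages if importlib.util.find_spec(p) is None]

if __name__ == "__main__":
    print("launcher:", find_python())
    print("missing packages:", missing_packages())
```

Note that missing_packages checks the interpreter running the snippet, which may differ from the one the launcher name resolves to.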

Grains analysed = 0 or very few

The sound may be too short (need at least one full grain). Reduce Grain_length_ms or increase overlap to generate more grains.

Output is quiet / RMS mismatch

The engine first balances RMS per channel (so that neither channel ends up louder than the other), then scales the combined output to match the source RMS. The match uses the whole‑file RMS of the source, which is the safe choice. If the source contains very low‑level passages, its whole‑file RMS is lower than the RMS of its active regions, so the output can sound quieter than the loud parts of the source. In that case, increase the gain manually after import.
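The two‑step level correction can be sketched in plain Python. The details are assumptions made for illustration: here each channel is scaled to the mean of the two channel RMS values, and the combined RMS is taken over the samples of both channels together. The names rms and balance_and_match are hypothetical.

```python
import math

def rms(samples):
    """Root-mean-square of a sequence of samples (0.0 if empty)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def balance_and_match(left, right, source_rms, eps=1e-12):
    """Equalise the channel RMS values, then scale both to the source RMS."""
    rl, rr = rms(left), rms(right)
    target = 0.5 * (rl + rr)                      # assumed balance target
    left = [s * target / max(rl, eps) for s in left]
    right = [s * target / max(rr, eps) for s in right]
    combined = rms(left + right)                  # RMS over both channels
    gain = source_rms / max(combined, eps)        # final gain match
    return [s * gain for s in left], [s * gain for s in right]
```

Because the final gain is computed from a whole‑file source RMS, long quiet passages in the source pull the target level down for the entire output.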

Pitch_drift_semitones has no audible effect

Pitch drift is implemented as varispeed resampling (duration changes). Because grains are very short, a 0.5 st shift is subtle. Increase to 2–3 st for noticeable variation. Very large shifts may cause unnatural transients – use with care.
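The size of the effect follows from the standard equal‑temperament relation ratio = 2^(semitones/12): with varispeed, the playback rate is multiplied by this ratio and the grain duration is divided by it. A minimal sketch (the function names are hypothetical):

```python
def varispeed_ratio(semitones):
    """Playback-rate ratio for a pitch shift in semitones (equal temperament)."""
    return 2.0 ** (semitones / 12.0)

def shifted_duration_ms(grain_ms, semitones):
    """Duration after varispeed: shifting up shortens, shifting down lengthens."""
    return grain_ms / varispeed_ratio(semitones)
```

A 0.5 st shift changes the rate by only about 2.9%, so a 55 ms grain becomes roughly 53.4 ms. At 3 st the rate changes by about 19% and the same grain lasts roughly 46.2 ms, which is much easier to hear.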

Pan_spread > 1.0 produces weird stereo

Pan is derived from the grain’s x‑coordinate in the 2‑D swarm projection: pan = tanh(0.9*x + noise), which always lies within ±1. Multiplying by pan_spread can push some grains beyond ±1, which is still legal – it just makes them louder in one channel. Use values 0–1 for normal width, >1 for exaggerated separation.
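The pan formula can be tried out directly. In the sketch below, grain_pan follows the formula given above, while linear_gains is an assumed linear pan law chosen only to illustrate what happens beyond ±1; the guide does not state which pan law the engine uses, and both function names are hypothetical.

```python
import math

def grain_pan(x, pan_spread=1.0, noise=0.0):
    """pan = tanh(0.9*x + noise), then multiplied by pan_spread."""
    return pan_spread * math.tanh(0.9 * x + noise)

def linear_gains(pan):
    """Assumed linear pan law: -1 = hard left, 0 = centre, +1 = hard right.

    Returns (left_gain, right_gain).
    """
    return 0.5 * (1.0 - pan), 0.5 * (1.0 + pan)
```

Under this assumed law, a pan value of 1.2 gives a right gain of 1.1 and a slightly negative (phase‑inverted) left gain, which is one way a grain can end up louder in one channel.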

Cluster count (k=6)

The k‑means clustering uses a fixed k=6 (maximum distinct resonance families). This is a design choice – six formant‑based classes are enough to capture most vowel diversity without over‑fragmenting.

Command‑line usage of formant_swarm_granulator.py

The Python engine can be run independently (batch processing):

python formant_swarm_granulator.py --grains grains.csv --input in.wav --output out.wav --stats stats.txt [options]

Options: --mode, --density, --attraction, --temporal_repulsion, --density_repulsion, --pan_spread, --pitch_drift, --seed.
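For batch processing, the command line can be assembled from Python. This wrapper is a sketch under assumptions: it uses only the flags listed above, passes every option value as a plain string (the accepted value formats are not documented here), and expects formant_swarm_granulator.py in the current directory. The helper names build_command and run_engine are hypothetical.

```python
import subprocess
import sys

# Optional flags named in this guide; anything else is rejected.
ALLOWED = {"mode", "density", "attraction", "temporal_repulsion",
           "density_repulsion", "pan_spread", "pitch_drift", "seed"}

def build_command(grains_csv, in_wav, out_wav, stats_txt, **options):
    """Assemble the engine's command line from the documented flags."""
    cmd = [sys.executable, "formant_swarm_granulator.py",
           "--grains", grains_csv, "--input", in_wav,
           "--output", out_wav, "--stats", stats_txt]
    for name, value in options.items():
        if name not in ALLOWED:
            raise ValueError(f"undocumented option: {name}")
        cmd += [f"--{name}", str(value)]
    return cmd

def run_engine(*args, **options):
    """Run the engine once and raise if it exits with an error."""
    subprocess.run(build_command(*args, **options), check=True)
```

A call such as run_engine("grains.csv", "in.wav", "out.wav", "stats.txt", mode="migration", seed=7) then runs one render; looping over several seeds or modes gives a simple batch.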