Formant Swarm Granulator — v1.1 User Guide
Resonance‑organised granular cloud engine. Segments a sound into grains, extracts formant profiles, and generates a swarm where grain placement is guided by formant similarity, temporal repulsion, and local density. Four swarm modes shape the migration of grains through the resonance space.
What this does
Formant Swarm Granulator treats each short grain (30–250 ms) as a particle carrying a resonance fingerprint (F1, F2, F3, bandwidths, pitch, intensity). A Python engine projects these particles into a 2‑D similarity field and walks a path through the swarm, selecting grains according to attraction (formant similarity) and two types of repulsion (temporal – avoid grains from the same source region; density – avoid overusing any grain).
Quick start
- In Praat, select exactly one Sound object (mono or stereo).
- Run script… → FormantSwarmGranulator.praat.
- Choose a Swarm_mode:
  - vowel_cloud – balanced, forms clusters by vowel quality.
  - resonance_turbulence – chaotic, high stochasticity.
  - migration – drifts between cluster centers.
  - counterpoint – alternates voiced/unvoiced grains.
- Set Grain_length_ms (e.g. 55 ms), Grain_jitter_ms (random variation), and Grain_overlap_percent (0% = sequential, 50% = heavy overlap).
- Adjust Density_grains_per_sec (overall event density).
- Tune Attraction, Temporal_repulsion, Density_repulsion – the three swarm forces.
- Optionally set Pan_spread (stereo width) and Pitch_drift_semitones (random pitch variation per grain).
- Click OK. Praat analyses each grain (pitch, intensity, formants, bandwidths), exports a CSV file, and calls the Python engine. The result is imported as originalname_formantSwarm.
The Python engine requires the numpy and soundfile packages.
Grain analysis uses Praat’s built‑in pitch, intensity, spectrum, and formant objects.
Very short grains (<30 ms) may produce unreliable formants – the script enforces a minimum grain length.
The output is stereo, with per‑grain panning derived from the 2‑D swarm projection.
The four swarm modes
| Mode | Score formula (conceptual) | Behaviour |
|---|---|---|
| vowel_cloud | score = attraction × similarity + 0.22×cluster_bonus + 0.10×crowd_bonus | Forms tight clusters around similar formant centres; balanced, the default. |
| resonance_turbulence | score = 0.75×attraction×similarity + 0.45×random + 0.12×crowd | High stochasticity, chaotic migrations, unpredictable grain sequences. |
| migration | score = 0.55×attraction×similarity + 0.30×exp(-cluster_distance) + 0.10×crowd | Drifts slowly between cluster centers; smooth timbral evolution. |
| counterpoint | score = 0.70×attraction×similarity + 0.20×voiced_match + 0.10×crowd | Alternates voiced/unvoiced grains; creates rhythmic separation. |
Forces:
- similarity = exponential of negative distance in formant space.
- cluster_bonus = bonus for grains from the same resonance family.
- crowd_bonus = 1/(1+usage) – discourages overusing any grain.
- temporal_penalty = exponential that decays with the source‑time difference – penalises grains close in source time, and so favours grains from far‑away source moments.
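The conceptual formulas in the table can be written out as a short Python sketch. This is an illustration of the table only, not the engine's code: the function names and argument names are hypothetical, and the way the engine subtracts the temporal and density repulsion terms is not documented here, so those terms are left out.

```python
import math

def similarity(distance):
    """Similarity as the exponential of negative formant-space distance."""
    return math.exp(-distance)

def crowd_bonus(usage):
    """1/(1+usage): shrinks as a grain is used more often."""
    return 1.0 / (1.0 + usage)

def mode_score(mode, attraction, sim, crowd,
               cluster_bonus=0.0, rand=0.0,
               cluster_distance=0.0, voiced_match=0.0):
    """Mode-specific part of the score, following the table's weights.

    Repulsion terms are intentionally omitted (their exact form is an
    open assumption).
    """
    if mode == "vowel_cloud":
        return attraction * sim + 0.22 * cluster_bonus + 0.10 * crowd
    if mode == "resonance_turbulence":
        return 0.75 * attraction * sim + 0.45 * rand + 0.12 * crowd
    if mode == "migration":
        return (0.55 * attraction * sim
                + 0.30 * math.exp(-cluster_distance) + 0.10 * crowd)
    if mode == "counterpoint":
        return 0.70 * attraction * sim + 0.20 * voiced_match + 0.10 * crowd
    raise ValueError(f"unknown mode: {mode}")
```

For an unused grain at zero distance in the same family, `mode_score("vowel_cloud", 1.0, similarity(0.0), crowd_bonus(0), cluster_bonus=1.0)` adds up to 1.0 + 0.22 + 0.10.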
Pipeline — five stages
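Stage 1 – Grain analysis (Praat) – segment the sound into grains, measure pitch, intensity, formants, and bandwidths for each grain, and export the results as CSV.
Stage 2 – Build resonance feature space (Python) – robust‑normalised feature matrix (f1, f2, f3, bandwidth composite, intensity, centroid, voiced flag).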
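The guide does not say which robust normalisation the engine uses. A common choice, assumed here purely for illustration, is to centre each feature column on its median and scale it by its interquartile range, which keeps a few extreme formant estimates from dominating the distances. The function name `robust_normalise` is hypothetical.

```python
import numpy as np

def robust_normalise(features):
    """Median/IQR scaling per column (an assumed form of 'robust' scaling).

    features: array of shape (n_grains, n_features).
    """
    x = np.asarray(features, dtype=float)
    median = np.median(x, axis=0)
    q75, q25 = np.percentile(x, [75, 25], axis=0)
    iqr = q75 - q25
    iqr[iqr == 0] = 1.0  # constant columns: avoid division by zero
    return (x - median) / iqr
```

A constant column (for example a voiced flag that is 1 for every grain) comes out as all zeros and therefore contributes nothing to the distances.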
Stage 3 – Clustering & projection – k‑means (k=6) into resonance families; principal component projection to 2‑D (x,y for each grain).
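Stage 3 can be sketched with plain numpy. The version below is a minimal illustration and not the engine's implementation: the helper names, the random initialisation, the iteration count, and the handling of empty clusters are all assumptions. Only k = 6 and the 2‑D principal component projection come from the description above.

```python
import numpy as np

def pca_2d(x):
    """Project rows of x onto their first two principal components."""
    xc = x - x.mean(axis=0)
    # Rows of vt are the principal directions, ordered by variance.
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:2].T

def kmeans(x, k=6, iters=50, seed=0):
    """Plain Lloyd's k-means; returns (labels, centres)."""
    rng = np.random.default_rng(seed)
    k = min(k, len(x))  # cannot have more clusters than grains
    centres = x[rng.choice(len(x), size=k, replace=False)].copy()
    labels = np.full(len(x), -1)
    for _ in range(iters):
        dists = np.linalg.norm(x[:, None, :] - centres[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stable: converged
        labels = new_labels
        for j in range(k):
            members = x[labels == j]
            if len(members):  # an empty cluster keeps its old centre
                centres[j] = members.mean(axis=0)
    return labels, centres
```

Each grain then carries a cluster label (its resonance family) and an (x, y) position from `pca_2d`; the x value is what the panning stage uses later.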
Stage 4 – Swarm walk – iterative selection: choose next grain based on score combining attraction, repulsion, and mode‑specific terms. Produces schedule of events.
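The iterative selection can be pictured as a greedy loop. The sketch below assumes a specific way of combining the forces (attraction added, both repulsions subtracted), a time constant `tau`, and a deterministic choice of the best‑scoring grain. None of these details are confirmed by this guide, and the mode‑specific terms are omitted to keep it short.

```python
import numpy as np

def swarm_walk(xy, times, n_events, attraction=1.0,
               temporal_repulsion=0.9, density_repulsion=0.7,
               tau=0.5, start=0):
    """Greedy walk over grains; returns a list of grain indices.

    xy: (n, 2) grain positions; times: (n,) source times in seconds.
    tau (seconds) is a hypothetical decay constant for temporal repulsion.
    """
    xy = np.asarray(xy, dtype=float)
    times = np.asarray(times, dtype=float)
    usage = np.zeros(len(xy))
    current = start
    schedule = [current]
    usage[current] += 1
    for _ in range(n_events - 1):
        sim = np.exp(-np.linalg.norm(xy - xy[current], axis=1))
        nearness = np.exp(-np.abs(times - times[current]) / tau)
        overuse = usage / (1.0 + usage)  # 0 for unused grains, tends to 1
        score = (attraction * sim
                 - temporal_repulsion * nearness
                 - density_repulsion * overuse)
        score[current] = -np.inf  # never repeat the same grain immediately
        current = int(np.argmax(score))
        schedule.append(current)
        usage[current] += 1
    return schedule
```

Raising `temporal_repulsion` in this sketch pushes the walk toward grains far from the current source position, and raising `density_repulsion` spreads usage more evenly, which matches the parameter descriptions in this guide.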
Stage 5 – Render & balance – extract grains, resample (pitch drift), fade, pan according to grain.x, overlay, per‑channel RMS balance, final gain match to source RMS.
Parameters & defaults
Grain generation
| Parameter | Default | Description |
|---|---|---|
| Grain_length_ms | 55 | Target grain duration (ms). Jitter adds random variation. |
| Grain_jitter_ms | 15 | Random ± variation applied to each grain’s length. |
| Grain_overlap_percent | 0 | Overlap between consecutive grains (0–90%). Hop = length × (1 − overlap/100). |
| Density_grains_per_sec | 18 | Number of scheduled output events per second (not input grains). |
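
As a rough illustration of how the grain parameters interact, the sketch below lays out grain boundaries for a sound of a given duration. It assumes that jitter is drawn uniformly, that the hop is computed from each grain's jittered length, and that the 30 ms minimum is applied by clamping. The function name and its `seed` argument are hypothetical, and the actual segmentation is done by the Praat script.

```python
import random

def grain_boundaries(duration_s, length_ms=55.0, jitter_ms=15.0,
                     overlap_percent=0.0, min_length_ms=30.0, seed=0):
    """Return a list of (start, end) times in seconds for input grains."""
    rng = random.Random(seed)
    grains = []
    start = 0.0
    while True:
        length_ms_j = length_ms + rng.uniform(-jitter_ms, jitter_ms)
        length = max(length_ms_j, min_length_ms) / 1000.0
        if start + length > duration_s:
            break  # only keep grains that fit entirely inside the sound
        grains.append((start, start + length))
        # Hop = length x (1 - overlap/100)
        start += length * (1.0 - overlap_percent / 100.0)
    return grains
```

With no jitter and 50 ms grains, moving from 0% to 50% overlap halves the hop and so roughly doubles the number of analysed grains.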
Swarm forces
| Parameter | Default | Description |
|---|---|---|
| Attraction | 1.0 | Weight for formant similarity (0 = ignore, 2 = strongly attracted). |
| Temporal_repulsion | 0.9 | Penalty for selecting grains close in source time; higher = explore different phrase regions. |
| Density_repulsion | 0.7 | Penalty for over‑used grains; higher = more even usage. |
Spatial / pitch
| Parameter | Default | Description |
|---|---|---|
| Pan_spread | 1.0 | Stereo width multiplier (0 = centre, 1 = full, >1 = hyper‑wide). |
| Pitch_drift_semitones | 0.5 | Random pitch variation per grain (±, via varispeed resampling). |
Formant analysis
| Parameter | Default | Description |
|---|---|---|
| Max_formant_hz | 5500 | Maximum formant frequency (Praat setting). |
| Number_of_formants | 5 | Number of formants to track (only F1‑F3 are used). |
Visualization (Praat picture)
When Draw_visualization = 1, the script draws a multi‑panel plot:
- Original and output waveforms – for comparison.
- Original and output spectrograms (0–5 kHz).
- Formant bar panel – horizontal bars showing mean F1 (red), F2 (blue), F3 (teal) across all analysed grains. Bar length = formant frequency in Hz.
- Mode colour strip – coloured bar with the selected mode name.
- Summary panel with:
  - Grains analysed / scheduled events / number of resonance clusters.
  - Voiced ratio, mean F1/F2/F3.
  - All parameter settings (attraction, repulsions, pan, pitch drift, seed).
  - RMS in / RMS out – note that output RMS is channel‑balanced and matched to source.
FAQ / troubleshooting
Install the Python dependencies with pip install numpy soundfile. On Windows, the script tries the py, python, and python3 commands.
If too few grains are found, the sound may be too short (it must contain at least one full grain). Reduce Grain_length_ms or increase Grain_overlap_percent to generate more grains.
The engine applies per‑channel RMS balance (to avoid one channel being louder) then scales the combined output to match the source RMS. If the source has very low‑level passages, the active‑region RMS may be higher – but the script uses the whole‑file RMS, which is safe. If the output sounds too quiet, you can increase gain manually after import.
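The two‑step level handling described here can be sketched as follows. The balancing target (the mean of the two channel RMS values) is an assumption, as are the function names. The guide only states that channels are balanced and the combined output is then matched to the whole‑file source RMS.

```python
import numpy as np

def rms(x):
    """Root-mean-square level over all samples of x."""
    return float(np.sqrt(np.mean(np.square(x))))

def balance_and_match(stereo, source_rms, eps=1e-12):
    """Equalise channel RMS, then scale the whole signal to source_rms.

    stereo: array of shape (n_samples, 2). Returns a new array.
    """
    out = np.array(stereo, dtype=float)
    left, right = rms(out[:, 0]), rms(out[:, 1])
    target = 0.5 * (left + right)  # assumed balancing target
    if left > eps:
        out[:, 0] *= target / left
    if right > eps:
        out[:, 1] *= target / right
    total = rms(out)
    if total > eps:  # leave silent output untouched
        out *= source_rms / total
    return out
```

Because the match uses one RMS value for the whole file, a source with long quiet passages gives a low target, which is one reason the output can sound quieter than expected.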
Pitch drift is implemented as varispeed resampling (duration changes). Because grains are very short, a 0.5 st shift is subtle. Increase to 2–3 st for noticeable variation. Very large shifts may cause unnatural transients – use with care.
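Varispeed resampling ties pitch and duration together: a shift of s semitones plays the grain at a rate of 2^(s/12), so the grain becomes shorter when shifted up and longer when shifted down. The sketch below uses linear interpolation via `np.interp` as a stand‑in. The engine's actual interpolation method is not stated in this guide, and the function name is hypothetical.

```python
import numpy as np

def varispeed(grain, semitones):
    """Resample a mono grain so its pitch shifts by `semitones`.

    Duration changes by the inverse of the rate ratio.
    """
    grain = np.asarray(grain, dtype=float)
    ratio = 2.0 ** (semitones / 12.0)  # playback-rate ratio
    n_out = max(1, int(round(len(grain) / ratio)))
    # Positions in the source grain read at the new rate.
    src_pos = np.arange(n_out) * ratio
    return np.interp(src_pos, np.arange(len(grain)), grain)
```

At 0.5 semitones the ratio is about 1.029, so a grain changes length by only about 3%, which is consistent with the shift being subtle on very short grains.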
Pan is derived from the grain’s x‑coordinate in the 2‑D swarm projection. pan = tanh(0.9*x + noise). Multiplying by pan_spread can push some grains beyond ±1, which is still legal – it just makes them louder in one channel. Use values 0–1 for normal width, >1 for exaggerated separation.
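A small sketch of the pan computation follows. The tanh expression and the pan_spread multiplication are taken from the description above. The mapping from pan value to channel gains is not documented here, so the linear law below, with the quieter channel clamped at zero, is an assumption chosen for illustration, and both function names are hypothetical.

```python
import math

def grain_pan(x, pan_spread=1.0, noise=0.0):
    """pan = pan_spread * tanh(0.9*x + noise); x is the grain's projected x."""
    return pan_spread * math.tanh(0.9 * x + noise)

def pan_gains(pan):
    """Assumed linear pan law; returns (left_gain, right_gain).

    For |pan| > 1 the louder channel exceeds unity gain and the
    quieter channel is clamped at zero.
    """
    left = max(0.0, 0.5 * (1.0 - pan))
    right = max(0.0, 0.5 * (1.0 + pan))
    return left, right
```

With pan_spread = 1 the tanh keeps every pan value strictly inside ±1. Only values above 1 can push a grain past that range, which under this assumed law raises the gain of the favoured channel above its hard‑panned level.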
The k‑means clustering uses a fixed k=6 (maximum distinct resonance families). This is a design choice – six formant‑based classes are enough to capture most vowel diversity without over‑fragmenting.
Command‑line usage of formant_swarm_granulator.py
The Python engine can be run independently (batch processing):
Options: --mode, --density, --attraction, --temporal_repulsion, --density_repulsion, --pan_spread, --pitch_drift, --seed.