Matter Gesture Bridge — Stochastic Timbral Plastic

Structural cross-synthesis audio effect. Animates a long "Matter" audio file using the intensity, pitch, brightness, and formant trajectory of a short "Gesture" sound through a stochastic, diffusion-style prior: a spectral terrain that flows like plastic.

Author: Shai Cohen
Affiliation: Department of Music, Bar-Ilan University, Israel
Version: 1.1 (2026)
License: MIT License
Repo: https://github.com/ShaiCohen-ops/Praat-plugin_AudioTools

Contents:

What this does
Quick start
7 Presets
Core Technique
Parameters
Applications

What this does

This script implements a structural cross-synthesis audio effect: it takes a short selected Sound (the "Gesture") and a long external audio file (the "Matter"), then animates the Matter's timbral substance using the Gesture's motion. The Gesture's intensity, pitch contour, brightness, and formant trajectories drive a stochastic patch-selection process over the Matter's spectral frames. The result is a new Sound where the Matter's texture flows like plastic, sculpted by the Gesture's shape.

What is "timbral plastic"? The Matter sound is sliced into spectral patches (STFT frames). For each output frame, the best-matching Matter patch is selected based on gesture-driven target descriptors (intensity → RMS, pitch/brightness → spectral centroid). Stochastic modulation (intensity roughness, pitch noise, formant injection, liminal freeze) then reshapes the spectral terrain. The result is a sound that "melts" from one timbral state to another, guided by the Gesture's energy. It's neither purely granular nor purely filtering — it's a diffusion-style prior over the Matter's spectral landscape.

Key Features:

7 Presets — Crystalline Trace, Liminal Cloud, Ghost Matter, Volatile Gesture, Deep Freeze, Spectral Breath, Custom
Gesture-driven frame selection — RMS (intensity) and centroid (brightness/pitch) guide which Matter spectral frames to use
Stochastic modulation passes — intensity roughness (multiplicative noise), pitch noise (spectral bin shifts), formant injection (Gaussian resonances), liminal freeze (blend toward mean spectrum + chaos)
Vectorised Python engine (v1.2) — batched STFT, vectorised distance, multithreaded FFT (scipy), Griffin-Lim with phasor projection
Amplitude envelope gate — Gesture's RMS envelope applied with hard gate (silent where Gesture silent)
Cache system — Matter library saved to disk for fast reuse across multiple runs
Visualisation — 8×8 Praat picture: Gesture/Result waveforms, spectrograms, gesture controls (intensity, pitch, formant F1), summary panel

Technical pipeline (Python v1.2):
(1) Gesture controls are extracted in Praat (intensity, pitch, formants, brightness).
(2) Python loads the Matter sound and builds the STFT patch library (real magnitudes, N_FFT=2048, hop=512).
(3) For each output frame (aligned to the Gesture duration), the engine computes a target RMS and centroid and finds the nearest Matter patch (L1 distance + jitter).
(4) Modulation is applied: intensity_roughness (log-normal noise), pitch_noise (circular bin shifts), formant_injection (Gaussian boosts), liminal_freeze (blend toward mean spectrum).
(5) Griffin-Lim ISTFT with phasor projection (magnitude-preserving phase update).
(6) The Gesture amplitude envelope and hard gate are applied.
(7) The WAV is returned to Praat.

Quick start

In Praat, select exactly one Sound object (the Gesture — short sound, 1–30 seconds).
Run script… → MatterGestureBridge.praat.
Select a Matter Sound file (long audio, 5–10 minutes recommended — the timbral source).
Choose a preset (Crystalline Trace, Liminal Cloud, Ghost Matter, Volatile Gesture, Deep Freeze, Spectral Breath, or Custom).
Adjust parameters: Freeze_time, Gesture_amount, Intensity_roughness, Pitch_noise, Formant_injection, Chaos, etc.
Click OK — script exports Gesture WAV, writes config JSON, launches Python engine (may take 1–5 minutes).
Result automatically imported as GestureName_MGB Sound object.

Quick tip: For crystalline, frozen textures, use Deep Freeze (freeze=0.05, chaos=0.05). For volatile, unstable textures, use Volatile Gesture (gesture_amount=0.95, roughness=0.9, chaos=0.75). For ghostly ambience, use Ghost Matter (freeze=0.9, chaos=0.85). Enable Reuse_cache to speed up subsequent runs with the same Matter file.

Important: Python dependencies required. Install with: pip install numpy soundfile. Optional but recommended for speed: pip install librosa scipy (librosa for high-quality resampling, scipy for multithreaded FFT). The Matter file can be any common audio format (WAV, FLAC, MP3, AIFF). Processing time depends on Matter length and Diffusion_steps (the presets use 48–96 Griffin-Lim iterations). Large Matter files (10+ minutes) may take 5–10 minutes. Given the same random_seed, the script is fully deterministic.

7 Presets

Preset	Freeze	Gesture Amt	Roughness	Pitch Noise	Formant Inj	Chaos	GL Iters	Character
Crystalline Trace	0.10	0.80	0.30	0.25	0.60	0.15	80	Clear, frozen, glassy: Matter crystallised by the Gesture's contour
Liminal Cloud	0.55	0.65	0.65	0.50	0.45	0.50	64	Balanced, hazy, cloud-like threshold state
Ghost Matter	0.90	0.40	0.85	0.70	0.20	0.85	48	Ethereal, barely structured — ghostly residue
Volatile Gesture	0.35	0.95	0.90	0.80	0.35	0.75	64	Unstable, reactive — Gesture dominates aggressively
Deep Freeze	0.05	0.50	0.15	0.10	0.80	0.05	96	Extreme crystallisation — spectral peaks preserved
Spectral Breath	0.60	0.70	0.50	0.60	0.55	0.40	64	Breathy, spectral diffusion — organic
Custom — full manual control

Freeze time continuum: 0.0 = crystallised (spectrum blends toward Matter's mean). 0.55 = liminal cloud (partial blend + noise). 0.95 = ghost matter (high noise, barely structured). Low values pull every frame toward the Matter's mean spectrum (the "frozen" end); high values keep per-frame detail and add more noise (the "melted" end).

Core Technique — Stochastic Timbral Plastic

Spectral patch library

Matter sound → STFT (N_FFT=2048, hop=512) → magnitude spectrogram M(f, t_matter).
For each Matter frame: RMSₘ = √(mean_f M(f,t)²), centroidₘ = ∑_f f·M(f,t) / ∑_f M(f,t), with mean_f and ∑_f taken over frequency bins.
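
The two per-frame descriptors can be sketched with NumPy as follows. This is a minimal illustration, not the engine's code: the function name patch_descriptors is hypothetical, RMS is taken as the root of the mean squared magnitude across bins (an assumption about the exact scaling), and bin frequencies are assumed to run linearly from 0 Hz to Nyquist.

```python
import numpy as np

def patch_descriptors(mag, sample_rate):
    """Per-frame RMS and spectral centroid for a magnitude spectrogram.

    mag: array of shape (n_freq, n_frames) with non-negative magnitudes.
    Returns (rms, centroid_hz), each of shape (n_frames,).
    """
    n_freq = mag.shape[0]
    # Bin centre frequencies of a one-sided spectrum (0 Hz .. Nyquist).
    freqs = np.linspace(0.0, sample_rate / 2.0, n_freq)
    rms = np.sqrt(np.mean(mag ** 2, axis=0))
    total = np.sum(mag, axis=0)
    # Guard against silent frames, where the centroid is undefined.
    centroid = np.sum(freqs[:, None] * mag, axis=0) / np.maximum(total, 1e-12)
    return rms, centroid

# Toy library: 1025 bins (N_FFT=2048), three frames with one active bin each.
mag = np.zeros((1025, 3))
mag[100, 0] = 1.0
mag[200, 1] = 2.0
mag[400, 2] = 0.5
rms, centroid = patch_descriptors(mag, sample_rate=44100)
```

With a single active bin per frame, each centroid lands exactly on that bin's frequency, which makes the toy example easy to check by hand.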

Gesture conditioning

Gesture → intensity I(t), pitch P(t), formants F₁–F₄(t). I_norm and P_norm below denote intensity and pitch rescaled to the range 0–1.
Target RMS for output frame: target_rms = RMS_min + I_norm × (RMS_max - RMS_min)
Target centroid: target_cen = centroid_min + P_norm × (centroid_max - centroid_min) (voiced) or mean centroid (unvoiced).
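
As a rough sketch of this mapping, the snippet below rescales Gesture intensity and pitch to 0–1 and projects them onto the Matter library's RMS and centroid ranges. The helper names (normalise, gesture_targets), the min-max normalisation, and the convention that unvoiced frames carry a pitch of 0 are all assumptions made for illustration.

```python
import numpy as np

def normalise(x):
    """Min-max scale to [0, 1]; a constant input maps to zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

def gesture_targets(intensity, pitch_hz, lib_rms, lib_centroid):
    """Map Gesture intensity and pitch onto the library's descriptor ranges.

    pitch_hz uses 0 for unvoiced frames (assumed convention).
    """
    intensity = np.asarray(intensity, dtype=float)
    pitch_hz = np.asarray(pitch_hz, dtype=float)
    voiced = pitch_hz > 0
    i_norm = normalise(intensity)
    p_norm = np.zeros_like(pitch_hz)
    if voiced.any():
        p_norm[voiced] = normalise(pitch_hz[voiced])
    target_rms = lib_rms.min() + i_norm * (lib_rms.max() - lib_rms.min())
    target_cen = lib_centroid.min() + p_norm * (lib_centroid.max() - lib_centroid.min())
    # Unvoiced frames fall back to the library's mean centroid.
    target_cen = np.where(voiced, target_cen, lib_centroid.mean())
    return target_rms, target_cen

lib_rms = np.array([0.1, 0.3, 0.5])
lib_centroid = np.array([500.0, 1500.0, 4000.0])
target_rms, target_cen = gesture_targets(
    [40.0, 60.0, 80.0],      # intensity in dB
    [0.0, 110.0, 220.0],     # pitch in Hz, first frame unvoiced
    lib_rms, lib_centroid,
)
```

In this toy case the unvoiced first frame receives the library's mean centroid (2000 Hz), while the two voiced frames span the full centroid range.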

Frame selection

Cost(m, t) = w_rms·|norm(RMSₘ) - norm(target_rms)| + w_cen·|norm(centroidₘ) - norm(target_cen)| + jitter
Selected frame = argminₘ Cost(m, t) → output magnitude spectrum.
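
The cost and argmin above can be evaluated for all output frames at once by broadcasting. The sketch below assumes min-max normalisation against the library's own range and uniform random jitter; the function name select_frames, the default weights, and the way jitter is scaled are hypothetical, and the link between Gesture_amount and these weights is not modelled here.

```python
import numpy as np

def select_frames(lib_rms, lib_cen, target_rms, target_cen,
                  w_rms=1.0, w_cen=1.0, jitter=0.0, seed=0):
    """For each target frame, return the index of the cheapest library frame."""
    lib_rms, lib_cen = np.asarray(lib_rms, float), np.asarray(lib_cen, float)
    target_rms, target_cen = np.asarray(target_rms, float), np.asarray(target_cen, float)
    rng = np.random.default_rng(seed)

    def norm(x, ref):
        # Scale by the library's min/max so both cost terms are comparable.
        span = ref.max() - ref.min()
        return (x - ref.min()) / span if span > 0 else np.zeros_like(x)

    # (n_targets, n_library) cost matrix built without a per-frame loop.
    d_rms = np.abs(norm(target_rms, lib_rms)[:, None] - norm(lib_rms, lib_rms)[None, :])
    d_cen = np.abs(norm(target_cen, lib_cen)[:, None] - norm(lib_cen, lib_cen)[None, :])
    cost = w_rms * d_rms + w_cen * d_cen
    if jitter > 0:
        cost = cost + jitter * rng.random(cost.shape)   # stochastic tie-breaking
    return np.argmin(cost, axis=1)

lib_rms = np.array([0.1, 0.3, 0.5])
lib_cen = np.array([500.0, 1500.0, 4000.0])
idx = select_frames(lib_rms, lib_cen,
                    target_rms=[0.5, 0.1, 0.3],
                    target_cen=[4000.0, 500.0, 1500.0])
```

With jitter set to 0 the selection is a plain nearest-neighbour lookup; raising it lets nearby patches win occasionally, which is one plausible source of the stochastic texture.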

Modulation passes (linear magnitude space)

Intensity roughness: M ← M · exp(σ·𝒩(0,1)), σ = roughness·intensity_norm·0.4
Pitch noise: Per-frame circular bin shift, shift amount ∝ |dP/dt|·pitch_noise·n_freq·0.04
Formant injection: M[:,t] ← M[:,t] + formant_injection·boost(f)·mean(M[:,t]), boost(f) = exp(-0.5·((f - Fₖ)/bw)²)
Liminal freeze: M ← (1 - freeze)·M_mean + freeze·M·noise, noise = exp(𝒩(0, freeze·chaos·0.8))
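
The four passes might look like this in vectorised NumPy. It is a sketch under stated assumptions rather than the engine's implementation: the function names are hypothetical, the bandwidth bw_hz=100 is an arbitrary placeholder, the second argument of 𝒩 in the freeze noise is treated as a standard deviation, integer bin shifts are supplied by the caller instead of being derived from |dP/dt|, and the mean spectrum is passed in explicitly.

```python
import numpy as np

def intensity_roughness(mag, intensity_norm, roughness, rng):
    """M <- M * exp(sigma * N(0,1)), sigma = roughness * intensity_norm * 0.4."""
    sigma = roughness * intensity_norm * 0.4             # shape (n_frames,)
    return mag * np.exp(sigma[None, :] * rng.standard_normal(mag.shape))

def pitch_noise_shift(mag, shifts):
    """Circularly shift each frame's bins by its own integer amount."""
    n_freq, n_frames = mag.shape
    src = (np.arange(n_freq)[:, None] - shifts[None, :]) % n_freq
    return mag[src, np.arange(n_frames)[None, :]]

def formant_injection(mag, freqs, formant_tracks, amount, bw_hz=100.0):
    """Add Gaussian boosts at each formant, scaled by the frame's mean level."""
    out = mag.copy()
    frame_mean = mag.mean(axis=0)                        # shape (n_frames,)
    for fk in formant_tracks:                            # fk: (n_frames,) in Hz
        boost = np.exp(-0.5 * ((freqs[:, None] - fk[None, :]) / bw_hz) ** 2)
        out += amount * boost * frame_mean[None, :]
    return out

def liminal_freeze(mag, mean_spec, freeze, chaos, rng):
    """M <- (1 - freeze) * M_mean + freeze * M * noise."""
    noise = np.exp(rng.normal(0.0, freeze * chaos * 0.8, size=mag.shape))
    return (1.0 - freeze) * mean_spec[:, None] + freeze * mag * noise

rng = np.random.default_rng(0)
mag = rng.random((64, 5)) + 0.1                          # toy spectrogram
freqs = np.linspace(0.0, 8000.0, 64)
intensity_norm = np.linspace(0.0, 1.0, 5)
f1 = np.full(5, freqs[10])                               # constant F1 track

out = intensity_roughness(mag, intensity_norm, roughness=0.3, rng=rng)
out = pitch_noise_shift(out, shifts=np.array([0, 1, -2, 3, 0]))
out = formant_injection(out, freqs, [f1], amount=0.6)
out = liminal_freeze(out, mean_spec=mag.mean(axis=1), freeze=0.55, chaos=0.5, rng=rng)
```

Passing one shared rng through every pass keeps the random stream reproducible for a given seed. Note that with freeze=0 the last pass returns the mean spectrum for every frame, which matches the "crystallised" end of the continuum.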

Griffin-Lim ISTFT (phasor projection):
At each iteration: S = mag · (S_prev / |S_prev|). This keeps phase consistent while projecting onto the desired magnitude spectrum — far faster than recomputing angle() each iteration.
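
The projection step on its own can be sketched as below. The function name project_to_magnitude and the eps guard against division by zero are additions for illustration; in a full Griffin-Lim loop, S_prev would be the STFT of the previous iteration's resynthesised signal.

```python
import numpy as np

def project_to_magnitude(S_prev, mag, eps=1e-12):
    """Keep the phase of S_prev but impose the target magnitude.

    Gives the same result as mag * exp(1j * angle(S_prev)) without the
    angle/exp round trip.
    """
    return mag * (S_prev / np.maximum(np.abs(S_prev), eps))

rng = np.random.default_rng(0)
S_prev = rng.standard_normal((5, 4)) + 1j * rng.standard_normal((5, 4))
mag = rng.random((5, 4))
S = project_to_magnitude(S_prev, mag)
```

The division normalises each complex bin to unit length, so only a magnitude computation and a multiply are needed per bin.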

Performance optimisations in v1.2: The Python engine was completely vectorised — per-frame loops replaced with broadcast array operations, batched STFT (single FFT call via sliding_window_view), and multithreaded FFT via scipy.fft (workers=-1). The result is 10–50× faster than earlier versions, while preserving exact RNG stream consumption for reproducibility. The Griffin-Lim phasor projection replaces angle()→exp(1j·phase) with magnitude projection, saving two transcendental passes per iteration per frame.
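
A batched STFT of the kind described can be sketched with sliding_window_view and a single rfft call. The Hann window, the absence of edge padding, and the function name batched_stft_mag are assumptions; NumPy's FFT is used here to keep the example dependency-free, whereas the text describes scipy.fft with workers=-1 for multithreading.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def batched_stft_mag(x, n_fft=2048, hop=512):
    """Magnitude STFT computed with one rfft call over all frames.

    Returns an array of shape (n_fft // 2 + 1, n_frames).
    """
    window = np.hanning(n_fft)
    # Every length-n_fft window as a strided view; keep each hop-th one.
    frames = sliding_window_view(x, n_fft)[::hop]        # (n_frames, n_fft)
    spec = np.fft.rfft(frames * window, axis=1)          # single batched FFT
    return np.abs(spec).T

# Test tone centred on bin 64 so the spectral peak is easy to verify.
n_fft, hop = 2048, 512
n = np.arange(n_fft + 3 * hop)
x = np.sin(2 * np.pi * 64 * n / n_fft)
M = batched_stft_mag(x, n_fft, hop)
```

The strided view costs no extra memory until it is multiplied by the window, at which point one (n_frames, n_fft) array is materialised and transformed in a single call.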

Parameters

Rendering & analysis

Parameter	Range	Default	Description
Target_sample_rate	22050–96000	44100	Target sample rate for processing (higher = better quality, slower)
Training_excerpt_limit_sec	10–600	420	How much of the Matter file to use (seconds from start)
Patch_length_sec	0.5–5.0	1.5	FFT window length — larger = better frequency resolution
Model_epochs (not used)	—	8	Reserved for future use
Diffusion_steps	16–128	64	Griffin-Lim iterations — more = better phase reconstruction

Modulation parameters

Parameter	Range	Description
Freeze_time	0.0–0.95	0.0 = crystallised (toward mean spectrum), 0.95 = ghost matter (high noise, no structure)
Gesture_amount	0.0–1.0	Strength of gesture conditioning on frame selection. Lower = more random Matter patches.
Intensity_roughness	0.0–1.0	Multiplicative spectral noise. High = unstable / noisy texture.
Pitch_noise	0.0–1.0	Spectral bin shift amount — high values cause timbral fracture.
Formant_injection	0.0–1.0	Boost energy at F1–F4 frequencies. High = emphasised formants.
Chaos	0.0–1.0	Balances freeze vs. random noise. 0 = deterministic freeze, 1 = chaotic.

Amplitude gate: The final amplitude envelope is derived from the Gesture's RMS, with a hard gate threshold (0.02× peak) so that truly silent passages in the Gesture produce silence in the output. This is what makes the output feel "animated by" the Gesture rather than just cross-synthesised.
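
A minimal sketch of such a gate follows. Only the 0.02× peak ratio comes from the description above; the frame and hop sizes, the choice to measure the peak on the RMS envelope rather than on the waveform, the linear interpolation to sample rate, and both function names are assumptions.

```python
import numpy as np

def gated_envelope(gesture, frame=1024, hop=512, gate_ratio=0.02):
    """Frame-wise RMS envelope, zeroed where it falls below gate_ratio * peak."""
    n_frames = 1 + max(0, (len(gesture) - frame) // hop)
    env = np.array([
        np.sqrt(np.mean(gesture[i * hop:i * hop + frame] ** 2))
        for i in range(n_frames)
    ])
    env[env < gate_ratio * env.max()] = 0.0              # hard gate
    return env

def apply_envelope(signal, env, hop=512):
    """Interpolate the frame envelope to sample rate and multiply."""
    t_frames = np.arange(len(env)) * hop
    gain = np.interp(np.arange(len(signal)), t_frames, env)
    return signal * gain

# Toy Gesture: a loud half followed by a near-silent half.
gesture = np.concatenate([np.full(4096, 0.5), np.full(4096, 0.001)])
env = gated_envelope(gesture)
shaped = apply_envelope(np.ones(8192), env)
```

The near-silent half sits below 0.02 × 0.5 = 0.01, so its envelope is forced to exactly zero and the shaped output is silent there.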

Applications

Sound design / texture generation

Use case: Create evolving, organic textures from a long drone or field recording, using a short rhythmic or melodic Gesture as a "mould".

Settings: Matter = 10 min wind recording, Gesture = 5 second melodic phrase. Liminal Cloud preset → output has the Matter's texture but breathes with the Gesture's contour.

Voice → instrumental morph

Use case: Sing a melody (Gesture) and animate a piano or synthesiser recording (Matter) to follow your phrasing.

Settings: Crystalline Trace preset (freeze=0.1, chaos=0.15). The output preserves the instrumental timbre but applies your voice's amplitude and pitch contour.

Ghostly ambience / installation

Use case: Long-term evolving sound installation where a sparse Gesture (e.g., single breath, whisper) animates a massive Matter file.

Settings: Ghost Matter preset (freeze=0.9, chaos=0.85). Output barely retains structure — spectral ashes.

Film / video game procedural audio

Use case: Animate a "creature" sound (Matter) using the player's input (Gesture), but offline for cutscenes.

Settings: Volatile Gesture preset (gesture_amount=0.95, roughness=0.9). Gesture's sharp transients become spectral stutters in the output.

Workflow: Voice → Crystalline Pad

Gesture: Sung vowel (5 seconds).
Matter: 5-minute synth pad recording.
Settings: Deep Freeze preset (freeze=0.05, formant_injection=0.8).
Result: The pad takes on the vowel's formants while maintaining its own texture — a crystal-clear vocal pad with no vocoder artifacts.

Workflow: Drum loop → Textured beat

Gesture: Single drum loop (4 seconds).
Matter: Long field recording of rain (10 minutes).
Settings: Liminal Cloud preset (freeze=0.55, chaos=0.5).
Result: The rain's spectral texture is stamped with the drum loop's rhythm — a rhythmic, textural beat made of rain sounds.

Workflow: Speech → Ghostly whisper

Gesture: Spoken sentence (3 seconds).
Matter: Wind + distant rumble (10 minutes).
Settings: Ghost Matter preset (freeze=0.9, chaos=0.85, gesture_amount=0.4).
Result: The speech's amplitude envelope is preserved, but timbre is entirely replaced by ghostly, unstructured noise — like a whisper from a haunted landscape.

Troubleshooting:
• Processing is slow: Reduce Training_excerpt_limit_sec (use only the first 2–3 minutes of Matter). Reduce Diffusion_steps (32–48). Use Balanced or Fast speed mode (not exposed in this UI — edit script or resample Matter beforehand).
• Output is silent or very quiet: Check that Gesture has non-zero RMS. Increase Gesture_amount to 0.9–1.0. Disable gate by lowering gate_threshold in Python (advanced).
• Output sounds like random noise: Freeze_time too high (ghost matter). Reduce to 0.3–0.5. Also check that Matter file has rich spectral content (broadband, not pure sine).
• Output has clicks / glitches: Griffin-Lim may need more iterations (increase Diffusion_steps to 96–128). Also increase boundary fade length (script already applies Hann fade).
• Cache not working: Ensure Reuse_cache=1. Cache is stored in the same directory as the log file (plugin temp). If you change Matter file or training limit, cache is recomputed.

Visualisation (Suite 8×8)

When Draw_visualization is enabled, the script generates a Praat picture with:

Title bar — script name, Gesture name, preset, freeze/chaos/gesture_amount
Gesture waveform (grey) — original selected Sound
Result waveform (purple) — MGB output
Gesture spectrogram — shows Gesture's spectral content
Result spectrogram — shows how Matter's spectrum was reshaped
Gesture controls panel — intensity (grey), pitch (purple), formant F1 (green) over time
Summary panel — preset, frames, GL iters, cache hit, RMS, peak, Gesture stats

The controls panel is particularly useful for debugging: you can see exactly how the Gesture's intensity, pitch, and formants drive the selection process.