Matter Gesture Bridge — Stochastic Timbral Plastic
Structural cross-synthesis audio effect. Animate a long "Matter" audio file using the intensity, pitch, brightness, and formant trajectory of a short "Gesture" sound through a stochastic, diffusion-style prior: a spectral terrain that flows like plastic.
What this does
This script implements a structural cross-synthesis audio effect. It takes a short selected Sound (the "Gesture") and a long external audio file (the "Matter"), then animates the Matter's timbral substance using the Gesture's motion. The Gesture's intensity, pitch contour, brightness, and formant trajectories drive a stochastic patch-selection process over the Matter's spectral frames. The result is a new Sound in which the Matter's texture flows like plastic, sculpted by the Gesture's shape.
Key Features:
- 7 Presets — Crystalline Trace, Liminal Cloud, Ghost Matter, Volatile Gesture, Deep Freeze, Spectral Breath, Custom
- Gesture-driven frame selection — RMS (intensity) and centroid (brightness/pitch) guide which Matter spectral frames to use
- Stochastic modulation passes — intensity roughness (multiplicative noise), pitch noise (spectral bin shifts), formant injection (Gaussian resonances), liminal freeze (blend toward mean spectrum + chaos)
- Vectorised Python engine (v1.2) — batched STFT, vectorised distance, multithreaded FFT (scipy), Griffin-Lim with phasor projection
- Amplitude envelope gate — Gesture's RMS envelope applied with hard gate (silent where Gesture silent)
- Cache system — Matter library saved to disk for fast reuse across multiple runs
- Visualisation — 8×8 Praat picture: Gesture/Result waveforms, spectrograms, gesture controls (intensity, pitch, formant F1), summary panel
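The amplitude envelope gate listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the engine's code: the function names, the frame and hop sizes, and the threshold value are hypothetical, and only the name gate_threshold comes from the troubleshooting notes.

```python
import numpy as np

def rms_envelope(x, frame=1024, hop=256):
    """Frame-wise RMS of x, interpolated back to one value per sample."""
    padded = np.pad(x, (0, max(0, frame - len(x))))
    n_frames = 1 + (len(padded) - frame) // hop
    idx = np.arange(frame)[None, :] + hop * np.arange(n_frames)[:, None]
    rms = np.sqrt(np.mean(padded[idx] ** 2, axis=1))
    centres = hop * np.arange(n_frames) + frame / 2
    return np.interp(np.arange(len(x)), centres, rms)

def apply_envelope_gate(result, gesture, gate_threshold=0.02):
    """Shape result with the Gesture's RMS envelope and a hard gate."""
    n = min(len(result), len(gesture))
    env = rms_envelope(gesture[:n])
    env = env / max(env.max(), 1e-12)      # normalise to a peak of 1
    env[env < gate_threshold] = 0.0        # hard gate: silent where Gesture is silent
    return result[:n] * env
```

A soft gate (a smooth ramp instead of the hard cut) would be a small change to the thresholding line; the hard version is shown because that is what the feature list describes.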
Quick start
- In Praat, select exactly one Sound object (the Gesture — short sound, 1–30 seconds).
- Run script… → MatterGestureBridge.praat.
- Select a Matter Sound file (long audio, 5–10 minutes recommended; this is the timbral source).
- Choose a preset (Crystalline Trace, Liminal Cloud, Ghost Matter, Volatile Gesture, Deep Freeze, Spectral Breath, or Custom).
- Adjust parameters: Freeze_time, Gesture_amount, Intensity_roughness, Pitch_noise, Formant_injection, Chaos, etc.
- Click OK — script exports Gesture WAV, writes config JSON, launches Python engine (may take 1–5 minutes).
- The result is imported automatically as a Sound object named GestureName_MGB.
Requirements: pip install numpy soundfile. Optional but recommended for speed: pip install librosa scipy (librosa for high-quality resampling, scipy for multithreaded FFT). The Matter file can be any common audio format (WAV, FLAC, MP3, AIFF). Processing time depends on the Matter length and on diffusion_steps (48–96 iterations in the presets); a large Matter file (10+ minutes) may take 5–10 minutes. Given the same random_seed, the output is fully deterministic.
7 Presets
| Preset | Freeze | Gesture Amt | Roughness | Pitch Noise | Formant Inj | Chaos | GL Iters | Character |
|---|---|---|---|---|---|---|---|---|
| Crystalline Trace | 0.10 | 0.80 | 0.30 | 0.25 | 0.60 | 0.15 | 80 | Clear, frozen, glassy: Matter crystallised by the Gesture's contour |
| Liminal Cloud | 0.55 | 0.65 | 0.65 | 0.50 | 0.45 | 0.50 | 64 | Balanced, hazy, cloud-like threshold state |
| Ghost Matter | 0.90 | 0.40 | 0.85 | 0.70 | 0.20 | 0.85 | 48 | Ethereal, barely structured — ghostly residue |
| Volatile Gesture | 0.35 | 0.95 | 0.90 | 0.80 | 0.35 | 0.75 | 64 | Unstable, reactive — Gesture dominates aggressively |
| Deep Freeze | 0.05 | 0.50 | 0.15 | 0.10 | 0.80 | 0.05 | 96 | Extreme crystallisation — spectral peaks preserved |
| Spectral Breath | 0.60 | 0.70 | 0.50 | 0.60 | 0.55 | 0.40 | 64 | Breathy, spectral diffusion — organic |
| Custom | | | | | | | | Full manual control |
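As a rough illustration of how the preset table might map onto the config JSON that the script writes, the values above can be held in a dictionary and merged with manual overrides. The key names and the build_config helper are assumptions made for this sketch; the actual JSON layout is not documented here.

```python
import json

# Values copied from the preset table; key names are hypothetical.
_KEYS = ("freeze_time", "gesture_amount", "intensity_roughness",
         "pitch_noise", "formant_injection", "chaos", "diffusion_steps")

PRESETS = {
    name: dict(zip(_KEYS, values))
    for name, values in {
        "Crystalline Trace": (0.10, 0.80, 0.30, 0.25, 0.60, 0.15, 80),
        "Liminal Cloud":     (0.55, 0.65, 0.65, 0.50, 0.45, 0.50, 64),
        "Ghost Matter":      (0.90, 0.40, 0.85, 0.70, 0.20, 0.85, 48),
        "Volatile Gesture":  (0.35, 0.95, 0.90, 0.80, 0.35, 0.75, 64),
        "Deep Freeze":       (0.05, 0.50, 0.15, 0.10, 0.80, 0.05, 96),
        "Spectral Breath":   (0.60, 0.70, 0.50, 0.60, 0.55, 0.40, 64),
    }.items()
}

def build_config(preset_name, random_seed=0, **overrides):
    """Merge a preset with manual overrides and return a JSON string."""
    params = dict(PRESETS[preset_name])
    unknown = set(overrides) - set(params)
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    params.update(overrides)
    params["preset"] = preset_name
    params["random_seed"] = random_seed
    return json.dumps(params, indent=2)
```

For example, build_config("Deep Freeze", random_seed=7, chaos=0.1) starts from the Deep Freeze row and overrides only the chaos value.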
Core Technique — Stochastic Timbral Plastic
Spectral patch library
Matter sound → STFT (N_FFT=2048, hop=512) → magnitude spectrogram M(f, t_matter).
For each Matter frame: RMSₘ = √(∑ M(f,t)²), centroidₘ = ∑ f·M(f,t) / ∑ M(f,t).
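The two steps above can be sketched in NumPy as follows. This is a minimal sketch rather than the engine's code: it assumes a Hann window and no edge padding, and the function names are hypothetical. The RMS follows the formula as written above (square root of the summed squared magnitudes).

```python
import numpy as np

N_FFT = 2048
HOP = 512

def magnitude_stft(signal, n_fft=N_FFT, hop=HOP):
    """Return a magnitude spectrogram of shape (n_fft // 2 + 1, n_frames)."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < n_fft:
        signal = np.pad(signal, (0, n_fft - len(signal)))
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    # Gather every frame in one indexing operation (batched STFT).
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * window
    return np.abs(np.fft.rfft(frames, axis=1)).T

def frame_features(mag, sample_rate):
    """Per-frame RMS and spectral centroid (Hz) of a magnitude spectrogram."""
    n_fft = 2 * (mag.shape[0] - 1)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    rms = np.sqrt(np.sum(mag ** 2, axis=0))
    total = np.maximum(mag.sum(axis=0), 1e-12)   # avoid division by zero
    centroid = (freqs[:, None] * mag).sum(axis=0) / total
    return rms, centroid
```

For a pure 1 kHz sine the centroid of every frame lands close to 1000 Hz, which is a quick sanity check on the frequency axis.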
Gesture conditioning
Gesture → intensity I(t), pitch P(t), formants F₁–F₄(t).
Target RMS for output frame: target_rms = RMS_min + I_norm × (RMS_max - RMS_min)
Target centroid: target_cen = centroid_min + P_norm × (centroid_max - centroid_min) (voiced) or mean centroid (unvoiced).
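The target mapping above can be written out as a short sketch. It assumes min-max normalisation for I_norm and P_norm and marks unvoiced frames with a pitch of 0 or NaN; both are assumptions, as are the function names.

```python
import numpy as np

def normalise(x):
    """Min-max scale to 0..1; a constant contour maps to all zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

def gesture_targets(intensity, pitch_hz, matter_rms, matter_centroid):
    """Map Gesture contours onto the Matter's RMS and centroid ranges."""
    intensity = np.asarray(intensity, dtype=float)
    pitch_hz = np.nan_to_num(np.asarray(pitch_hz, dtype=float), nan=0.0)
    voiced = pitch_hz > 0          # assumption: 0 or NaN means unvoiced

    i_norm = normalise(intensity)
    target_rms = matter_rms.min() + i_norm * (matter_rms.max() - matter_rms.min())

    # Unvoiced frames fall back to the Matter's mean centroid.
    target_cen = np.full(len(pitch_hz), matter_centroid.mean())
    if voiced.any():
        p_norm = normalise(pitch_hz[voiced])
        span = matter_centroid.max() - matter_centroid.min()
        target_cen[voiced] = matter_centroid.min() + p_norm * span
    return target_rms, target_cen
```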
Frame selection
Cost(m, t) = w_rms·|norm(RMSₘ) - norm(target_rms)| + w_cen·|norm(centroidₘ) - norm(target_cen)| + jitter
Selected frame = argminₘ Cost(m, t) → output magnitude spectrum.
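A vectorised version of this cost and argmin might look like the sketch below. The weights, the uniform jitter, and the function name are assumptions; how Gesture_amount feeds into the weights or the jitter is not specified in this section, so the jitter scale is left as a plain parameter.

```python
import numpy as np

def select_frames(matter_rms, matter_cen, target_rms, target_cen,
                  w_rms=1.0, w_cen=1.0, jitter=0.0, seed=0):
    """Return, for each output frame, the index of the cheapest Matter frame."""
    rng = np.random.default_rng(seed)

    def norm(x, ref):
        # Normalise against the Matter's range so both sides share one scale.
        x = np.asarray(x, dtype=float)
        ref = np.asarray(ref, dtype=float)
        span = ref.max() - ref.min()
        return (x - ref.min()) / span if span > 0 else np.zeros_like(x)

    m_rms, m_cen = norm(matter_rms, matter_rms), norm(matter_cen, matter_cen)
    t_rms, t_cen = norm(target_rms, matter_rms), norm(target_cen, matter_cen)

    # cost[m, t]: rows are Matter frames, columns are output frames.
    cost = (w_rms * np.abs(m_rms[:, None] - t_rms[None, :])
            + w_cen * np.abs(m_cen[:, None] - t_cen[None, :]))
    if jitter > 0:
        cost = cost + jitter * rng.random(cost.shape)
    return np.argmin(cost, axis=0)
```

The full cost matrix has one entry per (Matter frame, output frame) pair, so for a long Matter file it may be worth computing it in column chunks to bound memory.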
Modulation passes (linear magnitude space)
Intensity roughness: M ← M · exp(σ·𝒩(0,1)), σ = roughness·intensity_norm·0.4
Pitch noise: Per-frame circular bin shift, shift amount ∝ |dP/dt|·pitch_noise·n_freq·0.04
Formant injection: M[:,t] ← M[:,t] + formant_injection·boost(f)·mean(M[:,t]), boost(f) = exp(-0.5·((f - Fₖ)/bw)²)
Liminal freeze: M ← (1 - freeze)·M_mean + freeze·M·noise, noise = exp(𝒩(0, freeze·chaos·0.8))
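The four passes above can be sketched as small NumPy functions operating on a magnitude spectrogram of shape (n_freq, n_frames). Several details are assumptions made for the sketch: the pitch contour is taken as normalised to 0..1, the shift direction is drawn at random, the formant bandwidth bw is fixed at 120 Hz, and the second argument of 𝒩 is read as a standard deviation.

```python
import numpy as np

def intensity_roughness(mag, intensity_norm, roughness, rng):
    """Multiplicative log-normal noise, stronger where the Gesture is loud."""
    sigma = roughness * np.asarray(intensity_norm) * 0.4   # one sigma per frame
    return mag * np.exp(sigma[None, :] * rng.standard_normal(mag.shape))

def pitch_noise_shift(mag, pitch_norm, pitch_noise, rng):
    """Circular bin shift per frame, proportional to the pitch slope."""
    n_freq, n_frames = mag.shape
    slope = np.abs(np.gradient(np.asarray(pitch_norm, dtype=float)))
    sign = rng.choice([-1, 1], size=n_frames)      # assumed: random direction
    shifts = np.rint(sign * slope * pitch_noise * n_freq * 0.04).astype(int)
    out = np.empty_like(mag)
    for t in range(n_frames):
        out[:, t] = np.roll(mag[:, t], shifts[t])
    return out

def formant_injection(mag, formants_hz, freqs, amount, bw=120.0):
    """Add Gaussian resonances at each formant track, scaled by the frame mean."""
    out = mag.copy()
    frame_mean = mag.mean(axis=0)
    for track in np.atleast_2d(formants_hz):       # shape (n_formants, n_frames)
        boost = np.exp(-0.5 * ((freqs[:, None] - track[None, :]) / bw) ** 2)
        out += amount * boost * frame_mean[None, :]
    return out

def liminal_freeze(mag, freeze, chaos, rng):
    """Blend the mean spectrum with a noise-modulated copy of the input."""
    mean_spec = mag.mean(axis=1, keepdims=True)
    noise = np.exp(rng.normal(0.0, freeze * chaos * 0.8, size=mag.shape))
    return (1.0 - freeze) * mean_spec + freeze * mag * noise
```

Passing one np.random.default_rng(random_seed) generator through all four passes keeps a run reproducible for a given seed.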
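Phase reconstruction (Griffin-Lim with phasor projection)
At each iteration: S = mag · (S_prev / |S_prev|), where S_prev is the complex STFT of the previous estimate. This keeps the phase of S_prev while projecting onto the desired magnitude spectrum, and it avoids a separate angle() and exp() call in every iteration.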
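The projection step can be placed inside a plain Griffin-Lim loop as sketched below. This is a NumPy-only illustration with assumed helper names (stft, istft, project, griffin_lim), a Hann window, and random initial phases; the engine's batched, multithreaded version will differ.

```python
import numpy as np

def stft(x, n_fft, hop, window):
    """Complex STFT, shape (n_fft // 2 + 1, n_frames), no edge padding."""
    n_frames = 1 + (len(x) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    return np.fft.rfft(x[idx] * window, axis=1).T

def istft(spec, n_fft, hop, window):
    """Least-squares overlap-add inverse of stft()."""
    frames = np.fft.irfft(spec.T, n=n_fft, axis=1) * window
    n_frames = frames.shape[0]
    out = np.zeros(n_fft + hop * (n_frames - 1))
    norm = np.zeros_like(out)
    for t in range(n_frames):
        out[t * hop:t * hop + n_fft] += frames[t]
        norm[t * hop:t * hop + n_fft] += window ** 2
    # Samples covered only by a window edge are poorly conditioned;
    # real code usually pads or fades the boundaries.
    return out / np.maximum(norm, 1e-8)

def project(mag, prev):
    """Keep the phase of prev, impose the target magnitude."""
    return mag * (prev / np.maximum(np.abs(prev), 1e-12))

def griffin_lim(mag, n_fft=2048, hop=512, n_iter=64, seed=0):
    """Estimate a signal whose STFT magnitude approximates mag."""
    rng = np.random.default_rng(seed)
    window = np.hanning(n_fft)
    spec = mag * np.exp(2j * np.pi * rng.random(mag.shape))   # random phases
    for _ in range(n_iter):
        signal = istft(spec, n_fft, hop, window)
        spec = project(mag, stft(signal, n_fft, hop, window))
    return istft(spec, n_fft, hop, window)
```

The division in project() yields a unit-magnitude phasor directly from the complex values, which is the point of the formula above: the phase is reused without ever being extracted as an angle.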
Parameters
Rendering & analysis
| Parameter | Range | Default | Description |
|---|---|---|---|
| Target_sample_rate | 22050–96000 | 44100 | Target sample rate for processing (higher = better quality, slower) |
| Training_excerpt_limit_sec | 10–600 | 420 | How much of the Matter file to use (seconds from start) |
| Patch_length_sec | 0.5–5.0 | 1.5 | FFT window length — larger = better frequency resolution |
| Model_epochs (not used) | — | 8 | Reserved for future use |
| Diffusion_steps | 16–128 | 64 | Griffin-Lim iterations — more = better phase reconstruction |
Modulation parameters
| Parameter | Range | Description |
|---|---|---|
| Freeze_time | 0.0–0.95 | 0.0 = crystallised (toward mean spectrum), 0.95 = ghost matter (high noise, no structure) |
| Gesture_amount | 0.0–1.0 | Strength of gesture conditioning on frame selection. Lower = more random Matter patches. |
| Intensity_roughness | 0.0–1.0 | Multiplicative spectral noise. High = unstable / noisy texture. |
| Pitch_noise | 0.0–1.0 | Spectral bin shift amount — high values cause timbral fracture. |
| Formant_injection | 0.0–1.0 | Boost energy at F1–F4 frequencies. High = emphasised formants. |
| Chaos | 0.0–1.0 | Balances freeze vs. random noise. 0 = deterministic freeze, 1 = chaotic. |
Applications
Sound design / texture generation
Use case: Create evolving, organic textures from a long drone or field recording, using a short rhythmic or melodic Gesture as a "mould".
Settings: Matter = 10 min wind recording, Gesture = 5 second melodic phrase. Liminal Cloud preset → output has the Matter's texture but breathes with the Gesture's contour.
Voice → instrumental morph
Use case: Sing a melody (Gesture) and animate a piano or synthesiser recording (Matter) to follow your phrasing.
Settings: Crystalline Trace preset (freeze=0.1, chaos=0.15). The output preserves the instrumental timbre but applies your voice's amplitude and pitch contour.
Ghostly ambience / installation
Use case: Long-term evolving sound installation where a sparse Gesture (e.g., single breath, whisper) animates a massive Matter file.
Settings: Ghost Matter preset (freeze=0.9, chaos=0.85). Output barely retains structure — spectral ashes.
Film / video game procedural audio
Use case: Animate a "creature" sound (Matter) using a recording of the player's input (Gesture). Processing is offline, so this suits cutscenes rather than real-time playback.
Settings: Volatile Gesture preset (gesture_amount=0.95, roughness=0.9). Gesture's sharp transients become spectral stutters in the output.
Workflow: Voice → Crystalline Pad
Gesture: Sung vowel (5 seconds).
Matter: 5-minute synth pad recording.
Settings: Deep Freeze preset (freeze=0.05, formant_injection=0.8).
Result: The pad takes on the vowel's formants while maintaining its own texture — a crystal-clear vocal pad with no vocoder artifacts.
Workflow: Drum loop → Textured beat
Gesture: Single drum loop (4 seconds).
Matter: Long field recording of rain (10 minutes).
Settings: Liminal Cloud preset (freeze=0.55, chaos=0.5).
Result: The rain's spectral texture is stamped with the drum loop's rhythm — a rhythmic, textural beat made of rain sounds.
Workflow: Speech → Ghostly whisper
Gesture: Spoken sentence (3 seconds).
Matter: Wind + distant rumble (10 minutes).
Settings: Ghost Matter preset (freeze=0.9, chaos=0.85, gesture_amount=0.4).
Result: The speech's amplitude envelope is preserved, but timbre is entirely replaced by ghostly, unstructured noise — like a whisper from a haunted landscape.
Troubleshooting
• Processing is slow: Reduce Training_excerpt_limit_sec (use only the first 2–3 minutes of Matter). Reduce Diffusion_steps to 32–48. The Balanced and Fast speed modes are not exposed in this UI; to use them, edit the script, or resample the Matter file beforehand.
• Output is silent or very quiet: Check that the Gesture has non-zero RMS. Increase Gesture_amount to 0.9–1.0. If the gate is cutting too much, lower gate_threshold in the Python engine (advanced).
• Output sounds like random noise: Freeze_time too high (ghost matter). Reduce to 0.3–0.5. Also check that Matter file has rich spectral content (broadband, not pure sine).
• Output has clicks / glitches: Griffin-Lim may need more iterations (increase Diffusion_steps to 96–128). If the clicks sit at the start or end of the result, lengthen the boundary fade (the script already applies a Hann fade).
• Cache not working: Ensure Reuse_cache=1. Cache is stored in the same directory as the log file (plugin temp). If you change Matter file or training limit, cache is recomputed.
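One way a cache could decide when to recompute is to derive its file name from everything that affects the Matter library. The sketch below is purely illustrative: the cache_key function and the choice of inputs (path, size, modification time, excerpt limit, sample rate, STFT settings) are assumptions, not the script's actual scheme.

```python
import hashlib
import os

def cache_key(matter_path, excerpt_limit_sec, sample_rate, n_fft=2048, hop=512):
    """Hash the inputs that should invalidate a cached Matter library."""
    stat = os.stat(matter_path)
    parts = (os.path.abspath(matter_path), stat.st_size, int(stat.st_mtime),
             excerpt_limit_sec, sample_rate, n_fft, hop)
    return hashlib.sha1(repr(parts).encode("utf-8")).hexdigest()[:16]
```

With a key like this, changing the Matter file or Training_excerpt_limit_sec yields a different file name, so a stale cache is never reused.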
Visualisation (Suite 8×8)
- Title bar — script name, Gesture name, preset, freeze/chaos/gesture_amount
- Gesture waveform (grey) — original selected Sound
- Result waveform (purple) — MGB output
- Gesture spectrogram — shows Gesture's spectral content
- Result spectrogram — shows how Matter's spectrum was reshaped
- Gesture controls panel — intensity (grey), pitch (purple), formant F1 (green) over time
- Summary panel — preset, frames, GL iters, cache hit, RMS, peak, Gesture stats