Matter Gesture Bridge — Stochastic Timbral Plastic

Structural cross-synthesis audio effect. Animates a long "Matter" audio file with the intensity, pitch, brightness, and formant trajectories of a short "Gesture" sound through a stochastic, diffusion-style prior: a spectral terrain that flows like plastic.

Author: Shai Cohen
Affiliation: Department of Music, Bar-Ilan University, Israel
Version: 1.1 (2026)
License: MIT License
Repo: https://github.com/ShaiCohen-ops/Praat-plugin_AudioTools

What this does

This script implements a structural cross-synthesis audio effect. It takes a short selected Sound (the "Gesture") and a long external audio file (the "Matter"), then animates the Matter's timbral substance using the Gesture's motion. The Gesture's intensity, pitch contour, brightness, and formant trajectories drive a stochastic patch-selection process over the Matter's spectral frames. The result is a new Sound in which the Matter's texture flows like plastic, sculpted by the Gesture's shape.

What is "timbral plastic"? The Matter sound is sliced into spectral patches (STFT frames). For each output frame, the best-matching Matter patch is selected using Gesture-driven target descriptors (intensity → RMS, pitch/brightness → spectral centroid). Stochastic modulation (intensity roughness, pitch noise, formant injection, liminal freeze) then reshapes the spectral terrain. The result "melts" from one timbral state to another, guided by the Gesture's energy. It is neither purely granular nor purely filtering; it is a diffusion-style prior over the Matter's spectral landscape.

Key Features:

Technical pipeline (Python engine v1.2):
  1. Gesture controls (intensity, pitch, formants, brightness) are extracted in Praat.
  2. Python loads the Matter sound and builds an STFT patch library (real magnitudes, N_FFT=2048, hop=512).
  3. For each output frame (aligned to the Gesture's duration), compute the target RMS and centroid, then find the nearest Matter patch (L1 distance + jitter).
  4. Apply modulation: intensity_roughness (log-normal noise), pitch_noise (circular bin shifts), formant_injection (Gaussian boosts), liminal_freeze (blend toward the mean spectrum).
  5. Resynthesise with Griffin-Lim ISTFT using phasor projection (a magnitude-preserving phase update).
  6. Apply the Gesture's amplitude envelope and hard gate.
  7. Return the WAV to Praat.

Quick start

  1. In Praat, select exactly one Sound object (the Gesture — short sound, 1–30 seconds).
  2. Run the script MatterGestureBridge.praat.
  3. Select a Matter Sound file (long audio, 5–10 minutes recommended — the timbral source).
  4. Choose a preset (Crystalline Trace, Liminal Cloud, Ghost Matter, Volatile Gesture, Deep Freeze, Spectral Breath, or Custom).
  5. Adjust parameters: Freeze_time, Gesture_amount, Intensity_roughness, Pitch_noise, Formant_injection, Chaos, etc.
  6. Click OK. The script exports the Gesture as a WAV, writes a config JSON, and launches the Python engine (this may take 1–5 minutes).
  7. The result is imported automatically as a Sound object named GestureName_MGB.
Quick tip: For crystalline, frozen textures, use Deep Freeze (freeze=0.05, chaos=0.05). For volatile, unstable textures, use Volatile Gesture (gesture_amount=0.95, roughness=0.9, chaos=0.75). For ghostly ambience, use Ghost Matter (freeze=0.9, chaos=0.85). Enable Reuse_cache to speed up subsequent runs with the same Matter file.
Important: Python dependencies are required. Install them with: pip install numpy soundfile. Optional but recommended for speed: pip install librosa scipy (librosa for high-quality resampling, scipy for multithreaded FFT). The Matter file can be in any common audio format (WAV, FLAC, MP3, AIFF). Processing time depends on the Matter length and on Diffusion_steps (48–96 Griffin-Lim iterations in the presets). Large Matter files (10+ minutes) may take 5–10 minutes. Output is fully deterministic for a given random_seed.

7 Presets

Preset             Freeze  Gesture Amt  Roughness  Pitch Noise  Formant Inj  Chaos  GL Iters  Character
Crystalline Trace  0.10    0.80         0.30       0.25         0.60         0.15   80        Clear, frozen, glassy: Matter crystallised by the Gesture's contour
Liminal Cloud      0.55    0.65         0.65       0.50         0.45         0.50   64        Balanced, hazy, cloud-like threshold state
Ghost Matter       0.90    0.40         0.85       0.70         0.20         0.85   48        Ethereal, barely structured: ghostly residue
Volatile Gesture   0.35    0.95         0.90       0.80         0.35         0.75   64        Unstable, reactive: Gesture dominates aggressively
Deep Freeze        0.05    0.50         0.15       0.10         0.80         0.05   96        Extreme crystallisation: spectral peaks preserved
Spectral Breath    0.60    0.70         0.50       0.60         0.55         0.40   64        Breathy spectral diffusion, organic
Custom             Full manual control
Freeze time continuum: 0.0 = crystallised (the spectrum blends toward the Matter's mean spectrum); 0.55 = liminal cloud (partial blend plus noise); 0.95 = ghost matter (high noise, barely structured). The freeze parameter controls how far the diffused spectrum is "melted" versus "frozen".

Core Technique — Stochastic Timbral Plastic

Spectral patch library

Matter sound → STFT (N_FFT=2048, hop=512) → magnitude spectrogram M(f, m), where m indexes Matter frames.
For each Matter frame m: RMSₘ = √(∑_f M(f,m)²), centroidₘ = ∑_f f·M(f,m) / ∑_f M(f,m).
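The patch-library descriptors above can be sketched in NumPy as follows. This is a minimal illustration, not the engine's code. The Hann window and the zero-padding of very short inputs are assumptions. RMS is computed exactly as the formula reads: a root-sum-of-squares over bins with no 1/N factor. Omitting 1/N does not change which frames rank closest once the cost function normalises the descriptor.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

N_FFT = 2048
HOP = 512


def magnitude_stft(x, n_fft=N_FFT, hop=HOP):
    """|STFT| of a mono signal, shape (n_freq, n_frames); Hann window assumed."""
    x = np.asarray(x, dtype=float)
    if len(x) < n_fft:  # pad very short inputs so at least one frame exists
        x = np.pad(x, (0, n_fft - len(x)))
    frames = sliding_window_view(x, n_fft)[::hop]          # (n_frames, n_fft) view
    return np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=-1)).T


def frame_descriptors(M, sr, n_fft=N_FFT):
    """Per-frame RMS (root-sum-of-squares over bins) and spectral centroid in Hz."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)               # bin centre frequencies
    rms = np.sqrt(np.sum(M ** 2, axis=0))
    centroid = (freqs[:, None] * M).sum(axis=0) / np.maximum(M.sum(axis=0), 1e-12)
    return rms, centroid


# Example: a 440 Hz tone should give a centroid close to 440 Hz in every frame.
sr = 22050
t = np.arange(sr) / sr
M = magnitude_stft(np.sin(2 * np.pi * 440 * t))
rms, cen = frame_descriptors(M, sr)
```

With N_FFT=2048 there are 1025 frequency bins per frame. One second at 22050 Hz yields 40 frames at hop 512.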

Gesture conditioning

Gesture → intensity I(t), pitch P(t), formants F₁–F₄(t).
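Target RMS for output frame t: target_rms = RMS_min + I_norm × (RMS_max - RMS_min)
Target centroid: target_cen = centroid_min + P_norm × (centroid_max - centroid_min) for voiced frames, or the mean centroid for unvoiced frames. I_norm and P_norm are the Gesture's intensity and pitch normalised to 0–1, so each target falls between the corresponding min and max.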
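A minimal sketch of this conditioning step follows. It rests on three assumptions that are not stated in the text. First, the Gesture controls have already been resampled to one value per output frame. Second, unvoiced pitch frames arrive as NaN. Third, the min, max and mean bounds come from the Matter library's descriptors. The names `minmax` and `gesture_targets` are hypothetical.

```python
import numpy as np


def minmax(v):
    """Scale to [0, 1]; a constant track maps to zeros (avoids division by zero)."""
    v = np.asarray(v, dtype=float)
    span = v.max() - v.min()
    return (v - v.min()) / span if span > 0 else np.zeros_like(v)


def gesture_targets(intensity, pitch, matter_rms, matter_cen):
    """Per-output-frame target RMS and centroid from Gesture controls.

    Assumes one control value per output frame and NaN for unvoiced pitch.
    """
    i_norm = minmax(intensity)
    pitch = np.asarray(pitch, dtype=float)
    voiced = np.isfinite(pitch)
    p_norm = np.zeros_like(pitch)
    if voiced.any():
        p_norm[voiced] = minmax(pitch[voiced])   # normalise over voiced frames only
    rms_lo, rms_hi = np.min(matter_rms), np.max(matter_rms)
    cen_lo, cen_hi = np.min(matter_cen), np.max(matter_cen)
    target_rms = rms_lo + i_norm * (rms_hi - rms_lo)
    # Voiced frames follow the pitch; unvoiced frames fall back to the mean centroid.
    target_cen = np.where(voiced, cen_lo + p_norm * (cen_hi - cen_lo), np.mean(matter_cen))
    return target_rms, target_cen


target_rms, target_cen = gesture_targets(
    intensity=[50.0, 60.0, 70.0],          # dB, one value per output frame
    pitch=[100.0, np.nan, 200.0],          # Hz; NaN marks an unvoiced frame
    matter_rms=np.array([1.0, 2.0, 3.0]),
    matter_cen=np.array([1000.0, 2000.0, 3000.0]),
)
# target_rms -> [1. 2. 3.], target_cen -> [1000. 2000. 3000.]
```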

Frame selection

Cost(m, t) = w_rms·|norm(RMSₘ) - norm(target_rms)| + w_cen·|norm(centroidₘ) - norm(target_cen)| + jitter
Selected frame = argminₘ Cost(m, t) → output magnitude spectrum.
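The selection rule can be written as a broadcast cost matrix followed by `argmin`. This is a sketch under assumptions. `norm()` is taken to be min-max scaling against the Matter library's range. The jitter is uniform noise. How Gesture_amount enters the cost is not specified above, so scaling the jitter by (1 - gesture_amount) is a hypothetical choice. It reproduces the documented behaviour that lower values give "more random Matter patches".

```python
import numpy as np


def _scaler(ref):
    """Return a function mapping values onto [0, 1] using ref's min/max."""
    lo, hi = float(np.min(ref)), float(np.max(ref))
    span = hi - lo if hi > lo else 1.0
    return lambda v: (np.asarray(v, dtype=float) - lo) / span


def select_frames(matter_rms, matter_cen, target_rms, target_cen,
                  w_rms=1.0, w_cen=1.0, gesture_amount=0.8, seed=0):
    """Index of the lowest-cost Matter frame for every output frame."""
    rng = np.random.default_rng(seed)
    n_rms, n_cen = _scaler(matter_rms), _scaler(matter_cen)
    # Broadcast to a (n_matter, n_output) cost matrix: L1 distance in descriptor space.
    cost = (w_rms * np.abs(n_rms(matter_rms)[:, None] - n_rms(target_rms)[None, :])
            + w_cen * np.abs(n_cen(matter_cen)[:, None] - n_cen(target_cen)[None, :]))
    # Hypothetical jitter: uniform noise that grows as gesture_amount falls.
    cost += (1.0 - gesture_amount) * (w_rms + w_cen) * rng.random(cost.shape)
    return np.argmin(cost, axis=0)


idx = select_frames(matter_rms=[0.1, 0.5, 0.9], matter_cen=[500.0, 1500.0, 3000.0],
                    target_rms=[0.9, 0.1], target_cen=[3000.0, 500.0],
                    gesture_amount=1.0)      # no jitter: purely descriptor-driven
# idx -> array([2, 0])
```

The full cost matrix has n_matter × n_output entries. For a 7-minute Matter excerpt, one would likely evaluate it in chunks of output frames rather than all at once.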

Modulation passes (linear magnitude space)

Intensity roughness: M ← M · exp(σ·𝒩(0,1)), σ = roughness·intensity_norm·0.4
Pitch noise: Per-frame circular bin shift, shift amount ∝ |dP/dt|·pitch_noise·n_freq·0.04
Formant injection: M[:,t] ← M[:,t] + formant_injection·boost(f)·mean(M[:,t]), boost(f) = exp(-0.5·((f - Fₖ)/bw)²)
Liminal freeze: M ← (1 - freeze)·M_mean + freeze·M·noise, noise = exp(𝒩(0, freeze·chaos·0.8))
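The four passes translate almost directly into array operations. Below is a minimal sketch that applies them in the listed order to a magnitude spectrogram of shape (n_freq, n_frames). It is not the engine's code, and it makes several assumptions:

- intensity_norm and |dP/dt| are pre-normalised to 0–1.
- Each bin shift gets a random direction.
- The formant Gaussian width `bw_hz` is a hypothetical 120 Hz.
- The boosts from F1–F4 are summed.
- The "mean spectrum" can be supplied as `matter_mean` and otherwise defaults to the time-average of the frames being processed.

```python
import numpy as np


def modulate(M, sr, n_fft, intensity_norm, dpitch_norm, formants,
             roughness, pitch_noise, formant_injection, freeze, chaos,
             matter_mean=None, bw_hz=120.0, seed=0):
    """Apply the four modulation passes to a magnitude spectrogram M (n_freq, n_frames).

    intensity_norm, dpitch_norm: per-frame values in [0, 1] (assumed pre-normalised).
    formants: array (n_formants, n_frames) in Hz, NaN where undefined.
    bw_hz and the random shift direction are hypothetical choices.
    """
    rng = np.random.default_rng(seed)
    M = np.array(M, dtype=float)                     # work on a copy
    n_freq, n_frames = M.shape
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)

    # 1. Intensity roughness: M <- M * exp(sigma * N(0, 1)), sigma per frame.
    sigma = roughness * np.asarray(intensity_norm, dtype=float) * 0.4
    M *= np.exp(sigma[None, :] * rng.standard_normal(M.shape))

    # 2. Pitch noise: circular bin shift per frame, size proportional to |dP/dt|.
    shift = np.rint(np.abs(dpitch_norm) * pitch_noise * n_freq * 0.04).astype(int)
    shift *= rng.choice([-1, 1], size=n_frames)
    rows = (np.arange(n_freq)[:, None] - shift[None, :]) % n_freq
    M = np.take_along_axis(M, rows, axis=0)          # same as np.roll on each column

    # 3. Formant injection: Gaussian boosts at F1..F4, scaled by each frame's mean.
    F = np.asarray(formants, dtype=float)
    bumps = np.exp(-0.5 * ((freqs[:, None, None] - F[None, :, :]) / bw_hz) ** 2)
    boost = np.nansum(bumps, axis=1)                 # undefined formants add nothing
    M += formant_injection * boost * M.mean(axis=0, keepdims=True)

    # 4. Liminal freeze: blend toward the mean spectrum with chaos-scaled log-normal noise.
    if matter_mean is None:
        matter_mean = M.mean(axis=1)
    noise = np.exp(rng.normal(0.0, freeze * chaos * 0.8, size=M.shape))
    return (1.0 - freeze) * np.asarray(matter_mean)[:, None] + freeze * M * noise


# Example: with every amount at 0 and freeze=1, the passes leave M unchanged.
M0 = np.random.default_rng(1).random((1025, 8)) + 0.1
zeros = np.zeros(8)
no_formants = np.full((4, 8), np.nan)
out = modulate(M0, 22050, 2048, zeros, zeros, no_formants, roughness=0.0,
               pitch_noise=0.0, formant_injection=0.0, freeze=1.0, chaos=0.0)
```

At freeze=0 every frame collapses to the mean spectrum, matching the "crystallised" end of the continuum.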

Griffin-Lim ISTFT (phasor projection):
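At each iteration the current estimate is inverted and re-analysed, X = STFT(ISTFT(S_prev)), then projected onto the target magnitude: S = mag · X / |X|. This keeps X's phase while imposing the desired magnitude spectrum, and it is much faster than recomputing angle() and exp() at every iteration.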
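Here is a compact Griffin-Lim loop using the phasor projection described above. It is a self-contained sketch with several assumptions:

- Hann analysis and synthesis windows.
- A least-squares overlap-add ISTFT.
- Random initial phase.
- Edge samples the window barely covers are zeroed.

It is not the engine's implementation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def stft(x, n_fft, hop, win):
    frames = sliding_window_view(x, n_fft)[::hop]
    return np.fft.rfft(frames * win, axis=-1).T                # (n_freq, n_frames)


def istft(S, n_fft, hop, win, length):
    """Weighted overlap-add (least-squares ISTFT)."""
    frames = np.fft.irfft(S.T, n=n_fft, axis=-1) * win
    out, norm = np.zeros(length), np.zeros(length)
    for t, frame in enumerate(frames):
        out[t * hop:t * hop + n_fft] += frame
        norm[t * hop:t * hop + n_fft] += win ** 2
    covered = norm > 1e-3 * norm.max()    # skip edge samples the window barely covers
    out[covered] /= norm[covered]
    out[~covered] = 0.0
    return out


def griffin_lim(mag, n_fft=2048, hop=512, n_iter=64, seed=0):
    """Recover a signal whose STFT magnitude approximates mag (n_freq, n_frames)."""
    rng = np.random.default_rng(seed)
    win = np.hanning(n_fft)
    length = (mag.shape[1] - 1) * hop + n_fft
    S = mag * np.exp(2j * np.pi * rng.random(mag.shape))      # random initial phase
    for _ in range(n_iter):
        X = stft(istft(S, n_fft, hop, win, length), n_fft, hop, win)
        S = mag * X / np.maximum(np.abs(X), 1e-12)             # phasor projection
    return istft(S, n_fft, hop, win, length)


# Example: rebuild a 440 Hz tone from its magnitude alone.
sr, n_fft, hop = 22050, 512, 128
x = np.sin(2 * np.pi * 440 * np.arange(sr // 2) / sr)
win = np.hanning(n_fft)
mag = np.abs(stft(x, n_fft, hop, win))
y = griffin_lim(mag, n_fft, hop, n_iter=50)


def spectral_error(sig):
    """Relative distance between sig's STFT magnitude and the target."""
    return np.linalg.norm(np.abs(stft(sig, n_fft, hop, win)) - mag) / np.linalg.norm(mag)
```

The projection line replaces the usual `np.exp(1j * np.angle(X))`. Dividing X by its modulus gives the same unit phasor without evaluating the angle and the complex exponential.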
Performance optimisations in v1.2: The Python engine was completely vectorised — per-frame loops replaced with broadcast array operations, batched STFT (single FFT call via sliding_window_view), and multithreaded FFT via scipy.fft (workers=-1). The result is 10–50× faster than earlier versions, while preserving exact RNG stream consumption for reproducibility. The Griffin-Lim phasor projection replaces angle()→exp(1j·phase) with magnitude projection, saving two transcendental passes per iteration per frame.
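Two of these optimisations can be checked in isolation. First, a batched STFT built from `sliding_window_view` gives the same frames as a per-frame loop, here using `scipy.fft.rfft` with `workers=-1` when SciPy is available. Second, drawing noise in a single call, in frame-major order, reproduces exactly the values a per-frame loop would have taken from the same generator. The sketch below illustrates the principle with NumPy's Generator API. It is not the engine's code, and the function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

try:                                   # multithreaded FFT when SciPy is installed
    from scipy.fft import rfft as fast_rfft
    FFT_KW = {"workers": -1}
except ImportError:                    # otherwise NumPy's FFT (no workers argument)
    from numpy.fft import rfft as fast_rfft
    FFT_KW = {}


def stft_loop(x, n_fft, hop, win):
    """Per-frame loop: one FFT call per frame (the pre-vectorised style)."""
    n_frames = 1 + (len(x) - n_fft) // hop
    out = np.empty((n_fft // 2 + 1, n_frames), dtype=complex)
    for t in range(n_frames):
        out[:, t] = np.fft.rfft(x[t * hop:t * hop + n_fft] * win)
    return out


def stft_batched(x, n_fft, hop, win):
    """All frames as one strided view, transformed by a single FFT call."""
    frames = sliding_window_view(x, n_fft)[::hop]
    return fast_rfft(frames * win, axis=-1, **FFT_KW).T


def noise_loop(rng, n_freq, n_frames):
    """One standard-normal draw per frame, frame by frame."""
    return np.stack([rng.standard_normal(n_freq) for _ in range(n_frames)], axis=1)


def noise_batched(rng, n_freq, n_frames):
    """Single draw in frame-major order, transposed to (n_freq, n_frames)."""
    return rng.standard_normal((n_frames, n_freq)).T


x = np.random.default_rng(0).standard_normal(22050)
win = np.hanning(2048)
A, B = stft_loop(x, 2048, 512, win), stft_batched(x, 2048, 512, win)
same_stft = np.allclose(A, B)
same_noise = np.array_equal(noise_loop(np.random.default_rng(7), 1025, 40),
                            noise_batched(np.random.default_rng(7), 1025, 40))
```

The draw order matters. Generating the batch as (n_freq, n_frames) directly would consume the same stream but assign the values to different bins, so results would no longer match the loop.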

Parameters

Rendering & analysis

Parameter                   Range        Default  Description
Target_sample_rate          22050–96000  44100    Target sample rate for processing (higher = better quality, slower)
Training_excerpt_limit_sec  10–600       420      How much of the Matter file to use, in seconds from the start
Patch_length_sec            0.5–5.0      1.5      FFT window length (larger = better frequency resolution)
Model_epochs                (not used)   8        Reserved for future use
Diffusion_steps             16–128       64       Griffin-Lim iterations (more = better phase reconstruction)

Modulation parameters

Parameter            Range     Description
Freeze_time          0.0–0.95  0.0 = crystallised (toward the mean spectrum), 0.95 = ghost matter (high noise, no structure)
Gesture_amount       0.0–1.0   Strength of Gesture conditioning on frame selection; lower = more random Matter patches
Intensity_roughness  0.0–1.0   Multiplicative spectral noise; high = unstable, noisy texture
Pitch_noise          0.0–1.0   Spectral bin-shift amount; high values cause timbral fracture
Formant_injection    0.0–1.0   Boosts energy at the F1–F4 frequencies; high = emphasised formants
Chaos                0.0–1.0   Balances freeze against random noise; 0 = deterministic freeze, 1 = chaotic
Amplitude gate: The final amplitude envelope is derived from the Gesture's RMS, with a hard gate threshold (0.02× peak) so that truly silent passages in the Gesture produce silence in the output. This is what makes the output feel "animated by" the Gesture rather than just cross-synthesised.
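A gated envelope of this kind can be sketched as follows. It uses frame-wise RMS of the Gesture, normalised to its peak, with frames below 0.02 × peak set to zero, stretched to the output length by linear interpolation. The frame and hop sizes, and the choice of the RMS envelope's own peak as the reference, are assumptions. `gesture_envelope` is a hypothetical helper, not the script's function.

```python
import numpy as np


def gesture_envelope(gesture, n_out, frame=1024, hop=256, gate=0.02):
    """Frame-RMS envelope of the Gesture, hard-gated and stretched to n_out samples.

    Frame/hop sizes are hypothetical; 'gate' is the 0.02 x peak threshold.
    """
    g = np.asarray(gesture, dtype=float)
    n_frames = max(1, 1 + (len(g) - frame) // hop)
    rms = np.array([np.sqrt(np.mean(g[i * hop:i * hop + frame] ** 2))
                    for i in range(n_frames)])
    peak = rms.max()
    if peak == 0.0:
        return np.zeros(n_out)              # silent Gesture -> silent output
    env = rms / peak
    env[env < gate] = 0.0                   # hard gate: silent passages stay silent
    centres = np.arange(n_frames) * hop + frame / 2
    positions = np.linspace(0.0, len(g), n_out, endpoint=False)
    return np.interp(positions, centres, env)


sr = 8000
tone = np.sin(2 * np.pi * 250 * np.arange(sr // 2) / sr)
gesture = np.concatenate([tone, np.zeros(sr // 2), tone])    # tone, silence, tone
env = gesture_envelope(gesture, n_out=len(gesture))
# Applied as: output = resynthesised[:len(env)] * env
```

Because gated frames are exactly zero, any stretch of output covered only by gated frames is exactly silent. Interpolation only ramps across the boundary frames.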

Applications

Sound design / texture generation

Use case: Create evolving, organic textures from a long drone or field recording, using a short rhythmic or melodic Gesture as a "mould".

Settings: Matter = 10 min wind recording, Gesture = 5 second melodic phrase. Liminal Cloud preset → output has the Matter's texture but breathes with the Gesture's contour.

Voice → instrumental morph

Use case: Sing a melody (Gesture) and animate a piano or synthesiser recording (Matter) to follow your phrasing.

Settings: Crystalline Trace preset (freeze=0.1, chaos=0.15). The output preserves the instrumental timbre but applies your voice's amplitude and pitch contour.

Ghostly ambience / installation

Use case: Long-term evolving sound installation where a sparse Gesture (e.g., single breath, whisper) animates a massive Matter file.

Settings: Ghost Matter preset (freeze=0.9, chaos=0.85). Output barely retains structure — spectral ashes.

Film / video game procedural audio

Use case: Animate a "creature" sound (Matter) using the player's input (Gesture), but offline for cutscenes.

Settings: Volatile Gesture preset (gesture_amount=0.95, roughness=0.9). Gesture's sharp transients become spectral stutters in the output.

Workflow: Voice → Crystalline Pad

Gesture: Sung vowel (5 seconds).
Matter: 5-minute synth pad recording.
Settings: Deep Freeze preset (freeze=0.05, formant_injection=0.8).
Result: The pad takes on the vowel's formants while maintaining its own texture — a crystal-clear vocal pad with no vocoder artifacts.

Workflow: Drum loop → Textured beat

Gesture: Single drum loop (4 seconds).
Matter: Long field recording of rain (10 minutes).
Settings: Liminal Cloud preset (freeze=0.55, chaos=0.5).
Result: The rain's spectral texture is stamped with the drum loop's rhythm — a rhythmic, textural beat made of rain sounds.

Workflow: Speech → Ghostly whisper

Gesture: Spoken sentence (3 seconds).
Matter: Wind + distant rumble (10 minutes).
Settings: Ghost Matter preset (freeze=0.9, chaos=0.85, gesture_amount=0.4).
Result: The speech's amplitude envelope is preserved, but timbre is entirely replaced by ghostly, unstructured noise — like a whisper from a haunted landscape.

Troubleshooting:
Processing is slow: Reduce Training_excerpt_limit_sec (e.g. 120–180 to use only the first 2–3 minutes of Matter). Reduce Diffusion_steps (32–48). Balanced or Fast speed modes are not exposed in this UI; edit the script, or resample the Matter file beforehand.
Output is silent or very quiet: Check that the Gesture has non-zero RMS. Increase Gesture_amount to 0.9–1.0. To disable the gate, lower gate_threshold in the Python engine (advanced).
Output sounds like random noise: Freeze_time is too high (ghost-matter territory); reduce it to 0.3–0.5. Also check that the Matter file has rich spectral content (broadband, not a pure sine).
Output has clicks / glitches: Griffin-Lim may need more iterations (increase Diffusion_steps to 96–128). The script already applies a Hann fade at the boundaries; lengthening that fade can also help.
Cache not working: Ensure Reuse_cache=1. Cache is stored in the same directory as the log file (plugin temp). If you change Matter file or training limit, cache is recomputed.
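Here is a minimal sketch of a half-Hann boundary fade like the one mentioned under clicks and glitches. The `fade_sec` parameter and its 10 ms default are hypothetical, not the script's actual setting. The fade is capped at half the signal so the fade-in and fade-out never overlap.

```python
import numpy as np


def hann_fade(y, sr, fade_sec=0.01):
    """Half-Hann fade-in at the start and fade-out at the end of y."""
    y = np.array(y, dtype=float)                 # copy, leave the input untouched
    n = min(int(round(fade_sec * sr)), len(y) // 2)
    if n > 0:
        ramp = 0.5 - 0.5 * np.cos(np.pi * np.arange(n) / n)   # rises from 0 toward 1
        y[:n] *= ramp
        y[-n:] *= ramp[::-1]
    return y


faded = hann_fade(np.ones(1000), sr=1000, fade_sec=0.1)   # 100-sample fades
```

A longer fade removes more of a boundary discontinuity at the cost of softening the first and last transients.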

Visualisation (Suite 8×8)

When Draw_visualization is enabled, the script generates a Praat picture with:
  • Title bar — script name, Gesture name, preset, freeze/chaos/gesture_amount
  • Gesture waveform (grey) — original selected Sound
  • Result waveform (purple) — MGB output
  • Gesture spectrogram — shows Gesture's spectral content
  • Result spectrogram — shows how Matter's spectrum was reshaped
  • Gesture controls panel — intensity (grey), pitch (purple), formant F1 (green) over time
  • Summary panel — preset, frames, GL iters, cache hit, RMS, peak, Gesture stats
The controls panel is particularly useful for debugging: you can see exactly how the Gesture's intensity, pitch, and formants drive the selection process.