Beltrami-Inspired Spectral Melter — Anisotropic Diffusion
Treats the spectrogram as a 3D acoustic terrain and applies anisotropic diffusion (Perona-Malik / Beltrami-inspired) that flows freely across smooth spectral plains but is blocked at ridges — attacks, formant edges, note boundaries. The diffused matrix encodes a reshaped spectral terrain.
What this does
This script implements anisotropic spectral diffusion inspired by Beltrami flow and Perona-Malik image denoising. The spectrogram is interpreted as a 3D terrain: X = time frame, Y = frequency bin, Z = log energy. Anisotropic diffusion smooths the terrain along smooth plains (homogeneous regions) while preserving sharp edges (spectral ridges, transients, formants). The diffused matrix is then resynthesised via overlap-add FFT, with optional stereo phase randomisation (Paulstretch-style) and exaggerated spectral shaping to make the effect perceptually dramatic.
The edge threshold is set by ridge_sensitivity and edge_preservation. The result is a "melted" spectrogram where sustained tones smear into ambient textures while percussive attacks retain their shape.
Key Features:
- 8 Presets — Shimmer Haze, Deep Terrain, Edge Freeze, Fog of War, Formant Cloud, Transient Glass, Void Chasm, Custom
- Anisotropic diffusion — Perona-Malik flux function, separate time/frequency diffusion coefficients
- Gradient magnitude — Combined time and frequency gradients, computed via central differences
- Exaggerated spectral shaping — effect_strength amplifies deviations from per-frame mean amplitude
- Stereo mode (Paulstretch pattern) — independent random phases per channel, stereo_phase_offset decorrelates L/R
- Speed modes — Full quality (original SR), Balanced (22 kHz), Fast (11 kHz)
- Overlap-add resynthesis — Paulstretch-style OLA with Hann window, micro-fades, and norm buffer compensation
- Visualisation — Input/output waveforms, original vs. melted spectrograms, summary panel
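
The overlap-add step with norm buffer compensation and micro-fades can be illustrated with a minimal Python sketch. It is not taken from the script: the function names (`hann`, `overlap_add`, `micro_fade`) are hypothetical, the frames here are plain signal segments rather than inverse-FFT output, and the norm buffer is assumed to accumulate the window itself (the script may accumulate something else, for example squared windows).

```python
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def overlap_add(frames, hop, window):
    """Overlap-add windowed frames, then divide by the accumulated window."""
    n = len(window)
    length = hop * (len(frames) - 1) + n
    out = [0.0] * length
    norm = [0.0] * length  # norm buffer: total window weight per sample
    for k, frame in enumerate(frames):
        start = k * hop
        for i in range(n):
            out[start + i] += frame[i] * window[i]
            norm[start + i] += window[i]
    eps = 1e-9  # avoid dividing by (near) zero where no window weight landed
    return [o / w if w > eps else 0.0 for o, w in zip(out, norm)]

def micro_fade(signal, fade_len):
    """Linear fade-in and fade-out over fade_len samples to suppress edge clicks."""
    out = list(signal)
    fade_len = min(fade_len, len(out) // 2)
    for i in range(fade_len):
        gain = i / fade_len
        out[i] *= gain
        out[-1 - i] *= gain
    return out

# Usage: cut a test signal into overlapping frames and rebuild it.
n, hop = 8, 2
signal = [math.sin(0.3 * j) for j in range(20)]
frames = [signal[s:s + n] for s in range(0, len(signal) - n + 1, hop)]
rebuilt = overlap_add(frames, hop, hann(n))
faded = micro_fade(rebuilt, 4)
```

Dividing by the norm buffer removes the amplitude ripple that the overlapping windows would otherwise leave, which is why unmodified frames are rebuilt sample for sample in this sketch.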
Quick start
- In Praat, select exactly one Sound object.
- Run script… → Beltrami_Inspired_Spectral_Melter.praat.
- Choose a preset from the dropdown (Shimmer Haze, Deep Terrain, Edge Freeze, Fog of War, Formant Cloud, Transient Glass, Void Chasm, or Custom).
- Adjust analysis parameters (window, time step, frequency resolution) and diffusion parameters (iterations, time/freq diffusion rates, ridge sensitivity, edge preservation).
- Set Effect_strength — higher values exaggerate the diffused terrain (spectral peaks soar, valleys drop).
- Enable Stereo and set Stereo_phase_offset for Paulstretch-style stereo decorrelation.
- Select Speed mode (Full quality / Balanced / Fast).
- Click OK. The script builds the spectrogram, runs diffusion, resynthesises, and outputs originalname_BeltramiInspired_presetname.
dt_t and dt_f control how much information flows across each dimension. High dt values (near 0.24) can cause instability; the script clamps to 0.24. Effect_strength > 1 exaggerates the diffused shape, making subtle changes audible.
8 Presets
| Preset | Window (ms) | Iter | dt_t/dt_f | κ (sens/edge) | Effect | Character |
|---|---|---|---|---|---|---|
| Shimmer Haze | 40 | 8 | 0.20/0.05 | 2.08 | 2.5 | Gentle diffusion, shimmering ambient haze. |
| Deep Terrain | 60 | 12 | 0.20/0.18 | 1.50 | 4.0 | Aggressive smoothing, cavernous depth. |
| Edge Freeze | 30 | 5 | 0.10/0.08 | 1.75 | 2.0 | Preserves transients, freezes edges. |
| Fog of War | 50 | 10 | 0.12/0.22 | 1.67 | 4.0 | Strong frequency smearing, foggy texture. |
| Formant Cloud | 35 | 7 | 0.18/0.06 | 1.33 | 3.0 | Preserves formants, clouds harmonics. |
| Transient Glass | 25 | 6 | 0.22/0.04 | 1.60 | 3.5 | Glass-like, transient-heavy diffusion. |
| Void Chasm | 100 | 18 | 0.24/0.20 | 1.67 | 8.0 | Extreme melting, spectral abyss. |

See script for full parameter details.
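
The preset values can be checked against the 0.24 clamp with a short Python sketch. The dictionary is transcribed from the table above; `clamp_dt` and `centre_weight_floor` are hypothetical helpers, and the stability reasoning assumes the time and frequency updates are applied together in one explicit step with every flux at most 1. Under that assumption the update is a weighted average of a cell and its four neighbours as long as 1 - 2·dt_t - 2·dt_f stays non-negative.

```python
# (iterations, dt_t, dt_f, kappa, effect_strength), transcribed from the preset table
PRESETS = {
    "Shimmer Haze":    (8,  0.20, 0.05, 2.08, 2.5),
    "Deep Terrain":    (12, 0.20, 0.18, 1.50, 4.0),
    "Edge Freeze":     (5,  0.10, 0.08, 1.75, 2.0),
    "Fog of War":      (10, 0.12, 0.22, 1.67, 4.0),
    "Formant Cloud":   (7,  0.18, 0.06, 1.33, 3.0),
    "Transient Glass": (6,  0.22, 0.04, 1.60, 3.5),
    "Void Chasm":      (18, 0.24, 0.20, 1.67, 8.0),
}

DT_MAX = 0.24  # upper clamp stated in the documentation

def clamp_dt(dt):
    """Limit a diffusion rate to the documented maximum."""
    return min(dt, DT_MAX)

def centre_weight_floor(dt_t, dt_f):
    """Smallest weight left on the centre cell when all four fluxes equal 1."""
    return 1.0 - 2.0 * dt_t - 2.0 * dt_f

# Every preset keeps a non-negative centre weight, so one step cannot overshoot.
stable = {name: centre_weight_floor(p[1], p[2]) >= 0.0 for name, p in PRESETS.items()}
worst_case = centre_weight_floor(clamp_dt(1.0), clamp_dt(1.0))  # both rates at the clamp
```

With both rates at the clamp the centre weight is still about 0.04, which is consistent with 0.24 being chosen as a safe limit just under 0.25.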
Diffusion Theory: Perona-Malik / Beltrami-Inspired
Anisotropic diffusion PDE
∂u/∂t = div( g(|∇u|) · ∇u )
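where u is the log-energy spectrogram, g(·) is the edge-stopping function, ∇u is the gradient, and |∇u| is its magnitude. In this PDE, t denotes diffusion time (advanced by one step per iteration), not the time-frame axis of the spectrogram.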
Flux function (Perona-Malik): g(s) = exp(-(s/κ)²)
κ = ridge_sensitivity / edge_preservation (higher κ = more diffusion, lower κ = stronger edge preservation)
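Time dimension (one explicit update step):
u ← u + dt_t·[flux_right·(u_right - u) + flux_left·(u_left - u)]
where flux_right = exp(-((grad_mag_mid + grad_mag_right)/2 / κ)²), and flux_left is defined the same way using the left neighbour.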
Frequency dimension: same formula with dt_f and vertical gradients. Combined gradient magnitude:
|∇u|² = (∂u/∂t)² + (∂u/∂f)²
computed via central differences in time and frequency, then sqrt().
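
One diffusion pass built from the formulas above can be sketched in plain Python. This is an illustration, not the script's code: the function names are hypothetical, the matrix is indexed as `u[frame][bin]`, both directions are updated at once from the same input, and missing neighbours at the borders are simply skipped (zero flux). The script may differ on each of these points.

```python
import math

def gradient_magnitude(u):
    """Central-difference gradient magnitude; border cells reuse their own value."""
    n_t, n_f = len(u), len(u[0])
    mag = [[0.0] * n_f for _ in range(n_t)]
    for t in range(n_t):
        for f in range(n_f):
            d_t = 0.5 * (u[min(t + 1, n_t - 1)][f] - u[max(t - 1, 0)][f])
            d_f = 0.5 * (u[t][min(f + 1, n_f - 1)] - u[t][max(f - 1, 0)])
            mag[t][f] = math.sqrt(d_t * d_t + d_f * d_f)
    return mag

def flux(mag_here, mag_there, kappa):
    """Perona-Malik edge-stopping value from the averaged gradient magnitude."""
    s = 0.5 * (mag_here + mag_there)
    return math.exp(-((s / kappa) ** 2))

def diffuse_once(u, dt_t, dt_f, kappa):
    """Return a new matrix after one explicit anisotropic diffusion step."""
    n_t, n_f = len(u), len(u[0])
    mag = gradient_magnitude(u)
    new = [row[:] for row in u]
    for t in range(n_t):
        for f in range(n_f):
            neighbours = ((t + 1, f, dt_t), (t - 1, f, dt_t),
                          (t, f + 1, dt_f), (t, f - 1, dt_f))
            for nt, nf, rate in neighbours:
                if 0 <= nt < n_t and 0 <= nf < n_f:
                    g = flux(mag[t][f], mag[nt][nf], kappa)
                    new[t][f] += rate * g * (u[nt][nf] - u[t][f])
    return new

# Usage: a step edge in time (frames 0-2 at 0, frames 3-5 at 10).
terrain = [[0.0] * 4 for _ in range(3)] + [[10.0] * 4 for _ in range(3)]
sharp = diffuse_once(terrain, 0.20, 0.05, kappa=1.0)    # edge blocks the flow
soft = diffuse_once(terrain, 0.20, 0.05, kappa=100.0)   # edge is smoothed
```

With κ = 1 the flux across the step is exp(-25), so the edge is effectively untouched; with κ = 100 the same cell rises by roughly 2 in a single pass.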
Parameters
Analysis parameters
| Parameter | Range | Description |
|---|---|---|
| Window_size_ms | 10–200 | FFT window length. Larger = better frequency resolution, worse time resolution. |
| Time_step_ms | 2–50 | Hop size between analysis frames. Smaller = smoother diffusion, more frames. |
| Max_frequency_Hz | 100–Nyquist | Upper frequency limit. Lower = faster, focuses diffusion on low/mid frequencies. |
| Freq_resolution_Hz | 20–500 | Frequency bin width. Smaller = higher spectral resolution, more bins. |
| Dynamic_floor_dB | -100–-40 | Noise floor clamping in log-energy matrix. |
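
How these parameters drive the size of the matrix to be diffused (and therefore the render time) can be estimated with a small Python sketch. The formulas are assumptions for illustration: Praat's own frame and bin counts may differ slightly, and `analysis_grid` is a hypothetical helper, not a function from the script.

```python
import math

def analysis_grid(duration_s, sample_rate_hz, window_ms, time_step_ms,
                  max_freq_hz, freq_res_hz):
    """Estimate (frames, bins, effective max frequency) for the analysis matrix."""
    nyquist = sample_rate_hz / 2.0
    max_freq = min(max_freq_hz, nyquist)           # cannot analyse above Nyquist
    n_bins = int(math.ceil(max_freq / freq_res_hz))
    usable_ms = max(duration_s * 1000.0 - window_ms, 0.0)
    # Small epsilon guards against floating-point results just under an integer.
    n_frames = int(math.floor(usable_ms / time_step_ms + 1e-9)) + 1
    return n_frames, n_bins, max_freq

# Usage: a 2 s sound, 40 ms window, 10 ms step, 8 kHz ceiling, 50 Hz bins.
full = analysis_grid(2.0, 44100, 40, 10, 8000, 50)
fast = analysis_grid(2.0, 11025, 40, 10, 8000, 50)
cells_full = full[0] * full[1]
cells_fast = fast[0] * fast[1]
```

Each diffusion iteration visits every cell, so the cost scales with frames × bins × iterations; halving the time step or the bin width roughly doubles the work.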
Diffusion parameters
| Parameter | Range | Description |
|---|---|---|
| Iterations | 1–50 | Number of diffusion passes. Higher = more melting. |
| Time_diffusion (dt_t) | 0.01–0.24 | Diffusion rate across time frames. Higher = more temporal smearing. |
| Freq_diffusion (dt_f) | 0.01–0.24 | Diffusion rate across frequency bins. Higher = spectral blurring. |
| Ridge_sensitivity | 0.5–5.0 | Numerator in κ formula; scales the gradient threshold. Higher = more diffusion (fewer gradients are treated as edges). |
| Edge_preservation | 0.5–3.0 | Denominator in κ formula. Higher = stronger edge preservation. |
Effect & output
| Parameter | Range | Description |
|---|---|---|
| Effect_strength | 0–10 | Exaggerates diffused shape: new = mean + (diffused - mean) × strength. |
| Wet/dry mix | 0–1 | Blend of processed (wet) and original (dry). 1 = pure wet. |
| Create_stereo | yes/no | Generates independent random phases per channel (Paulstretch pattern). |
| Stereo_phase_offset | 0–1 | Right channel phase range multiplier (phaseScale = 1 + offset). |
| Speed_mode | Full/Balanced/Fast | Resamples to 44.1k (full), 22k, or 11k before analysis — dramatically speeds up processing. |
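
The Effect_strength formula and the wet/dry blend from the table above can be written out as a short Python sketch. The function names are hypothetical, and the frame is assumed to be a list of amplitudes for a single time frame, with the mean taken over that frame.

```python
def exaggerate(frame, strength):
    """new = mean + (diffused - mean) * strength, applied per frame."""
    mean = sum(frame) / len(frame)
    return [mean + (value - mean) * strength for value in frame]

def wet_dry(wet, dry, mix):
    """Blend processed and original values; mix = 1 is pure wet."""
    return [mix * w + (1.0 - mix) * d for w, d in zip(wet, dry)]

# Usage on a three-bin frame.
diffused = [1.0, 2.0, 3.0]
shaped = exaggerate(diffused, 2.0)     # peaks rise, valleys drop around the mean
flat = exaggerate(diffused, 0.0)       # strength 0 collapses to the frame mean
blended = wet_dry(shaped, diffused, 0.5)
```

A strength of 1 leaves the frame unchanged. At high strengths the valleys can fall below zero; whether and how the script clamps such values is not stated here.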
Applications
Ambient / Drone textures (Shimmer Haze, Deep Terrain)
Use case: Transform acoustic instruments or field recordings into smooth, evolving ambient textures.
Settings: Long window (60 ms), many iterations (12), moderate effect strength (2–4). The diffusion smears partials into sustained clouds.
Spectral edge preservation / freeze (Edge Freeze)
Use case: Keep transients and attacks sharp while melting the sustained portions.
Settings: High ridge_sensitivity (3.5), low dt_t/dt_f (0.10/0.08), few iterations (5). Transients remain intact; harmonics diffuse.
Formant / vowel transformation (Formant Cloud)
Use case: Alter vocal formants while preserving intelligibility.
Settings: Low frequency diffusion (dt_f = 0.06), moderate time diffusion (dt_t = 0.18), effect_strength = 3.0. Formant peaks are preserved (edge protection) but surrounding harmonics are smeared.
Extreme spectral melting (Void Chasm)
Use case: Experimental sound design — dissolve sounds into unrecognisable textures.
Settings: 100 ms window (very long), 18 iterations, dt_t = 0.24 / dt_f = 0.20 (near-maximum diffusion), effect_strength = 8.0. The result is a spectral abyss in which tonal content becomes inharmonic and noise-like.
Workflow: Piano → Shimmer Haze
Source: Solo piano recording.
Settings: Shimmer Haze preset, stereo enabled (offset=0.3).
Result: Piano notes begin with attack clarity, then melt into shimmering, diffuse haze. Stereo phase offset creates wide, enveloping texture.
Workflow: Voice → Void Chasm → Resample to very low rate
Source: Spoken word.
Settings: Void Chasm preset, then in Praat, resample output to 8 kHz.
Result: The already melted voice becomes almost unrecognisable — whispered, spectral ashes.
Workflow: Drum loop → Edge Freeze
Source: Drum loop (kick, snare, hi-hat).
Settings: Edge Freeze preset.
Result: Attacks (transients) preserved; sustain and release portions are smoothed. The rhythm remains intelligible, but the timbre becomes glassy/ethereal.
Troubleshooting
- Processing is very slow: Use Balanced or Fast speed modes. Reduce frequency resolution (larger bins) or increase time step (fewer frames).
- Output is silent or extremely quiet: Check that Max_frequency_Hz is within Nyquist (speed modes reduce the sample rate). With Void Chasm in Fast mode, the maximum frequency may be clamped to the reduced Nyquist limit.
- Diffusion not audible: Increase Effect_strength (3–8) to exaggerate the diffused terrain. Also increase iterations and dt values.
- Clicks at end of file: The script applies micro-fades and OLA normalisation. If clicks persist, increase micro-fade duration (edit script).
- Stereo phase difference too extreme: Reduce stereo_phase_offset (0.1–0.3). The Paulstretch pattern uses random phases, so each run will differ.
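
The random-phase stereo pattern mentioned in the last item can be sketched in Python as follows. This is a guess at the mechanism rather than the script's code: it assumes phases are drawn uniformly from 0 to 2π, that the right channel's draw is multiplied by phaseScale = 1 + offset, and it adds a hypothetical `seed` argument (not a script parameter) to show why unseeded runs differ.

```python
import cmath
import math
import random

def random_phase_spectrum(magnitudes, rng, phase_scale=1.0):
    """Attach a random phase to each magnitude, keeping the magnitude unchanged."""
    return [cmath.rect(m, rng.uniform(0.0, 2.0 * math.pi) * phase_scale)
            for m in magnitudes]

def stereo_spectra(magnitudes, stereo_phase_offset, seed=None):
    """Build decorrelated left/right spectra from one magnitude frame."""
    rng_left = random.Random(seed)
    rng_right = random.Random(None if seed is None else seed + 1)
    left = random_phase_spectrum(magnitudes, rng_left)
    # Assumed reading of the parameter table: phaseScale = 1 + offset.
    right = random_phase_spectrum(magnitudes, rng_right, 1.0 + stereo_phase_offset)
    return left, right

# Usage: same magnitudes in both channels, independent phases.
mags = [1.0, 0.5, 0.25]
left, right = stereo_spectra(mags, 0.3, seed=7)
left_again, right_again = stereo_spectra(mags, 0.3, seed=7)
```

Both channels carry identical magnitudes, so the perceived width comes only from the phase differences; a fixed seed makes the result repeatable in this sketch.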
Visualisation (Suite 8×8)
- Title bar — script name, preset, iterations, kappa, dt values, effect strength, wet mix.
- Input waveform (grey) — original sound.
- Output waveform (blue/orange for stereo) — Beltrami-melted result.
- Original spectrogram (left) — before diffusion.
- Output spectrogram (right) — after diffusion. Look for smearing across time/frequency.
- Summary panel — all parameters, frames/bins, render time, speed mode.