Beltrami-Inspired Spectral Melter — Anisotropic Diffusion

Treats the spectrogram as a 3D acoustic terrain and applies anisotropic diffusion (Perona-Malik / Beltrami-inspired) that flows freely across smooth spectral plains but is blocked at ridges — attacks, formant edges, note boundaries. The diffused matrix encodes a reshaped spectral terrain.

Author: Shai Cohen
Affiliation: Department of Music, Bar-Ilan University, Israel
Version: 2.2 (2026)
License: MIT License
Repo: https://github.com/ShaiCohen-ops/Praat-plugin_AudioTools

Contents:

What this does
Quick start
8 Presets
Diffusion Theory
Parameters
Applications

What this does

This script implements anisotropic spectral diffusion inspired by Beltrami flow and Perona-Malik image denoising. The spectrogram is interpreted as a 3D terrain: X = time frame, Y = frequency bin, Z = log energy. Anisotropic diffusion smooths the terrain along smooth plains (homogeneous regions) while preserving sharp edges (spectral ridges, transients, formants). The diffused matrix is then resynthesised via overlap-add FFT, with optional stereo phase randomisation (Paulstretch-style) and exaggerated spectral shaping to make the effect perceptually dramatic.

What is anisotropic diffusion? Unlike isotropic (uniform) blur, anisotropic diffusion uses a variable diffusion coefficient that depends on the local gradient magnitude. High gradients (edges) reduce diffusion, preserving boundaries. Low gradients (flat regions) allow diffusion, smoothing out noise and detail. In this script, diffusion flows across time (frames) and frequency (bins) independently, with edge sensitivity controlled by ridge_sensitivity and edge_preservation. The result is a "melted" spectrogram where sustained tones smear into ambient textures while percussive attacks retain their shape.

Key Features:

8 Presets — Shimmer Haze, Deep Terrain, Edge Freeze, Fog of War, Formant Cloud, Transient Glass, Void Chasm, Custom
Anisotropic diffusion — Perona-Malik flux function, separate time/frequency diffusion coefficients
Gradient magnitude — Combined time and frequency gradients, computed via central differences
Exaggerated spectral shaping — effect_strength amplifies deviations from per-frame mean amplitude
Stereo mode (Paulstretch pattern) — independent random phases per channel, stereo_phase_offset decorrelates L/R
Speed modes — Full quality (original SR), Balanced (22 kHz), Fast (11 kHz)
Overlap-add resynthesis — Paulstretch-style OLA with Hann window, micro-fades, and norm buffer compensation
Visualisation — Input/output waveforms, original vs. melted spectrograms, summary panel

Performance & v2.2 fixes: Version 2.2 fixes four critical issues: (1) correct kappa formula (ridge_sensitivity/edge_preservation), (2) Nyquist clamping to prevent silent mis-binning in speed modes, (3) authoritative nBins taken from the matrix dimensions, (4) an OLA normalisation buffer to cancel Hann² windowing coloration. The algorithm is computationally heavy (iterative diffusion across matrices) but fully vectorised using Praat's native Formula.
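The OLA normalisation in fix (4) can be illustrated with a minimal Python sketch. It assumes a Hann window is applied at both analysis and synthesis, so each sample is weighted by Hann², and that the accumulated weights are divided out at the end. The function names, the periodic window variant and the eps guard are illustrative assumptions, not taken from the script.

```python
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def ola_with_norm(signal, win_len, hop, eps=1e-9):
    """Window each frame twice (analysis + synthesis), overlap-add,
    then divide by the accumulated Hann^2 weights (the norm buffer)."""
    w = hann(win_len)
    out = [0.0] * len(signal)
    norm = [0.0] * len(signal)
    for start in range(0, len(signal) - win_len + 1, hop):
        for i in range(win_len):
            windowed = signal[start + i] * w[i]   # analysis window
            out[start + i] += windowed * w[i]     # synthesis window
            norm[start + i] += w[i] * w[i]        # Hann^2 weight
    # Samples with (near-)zero accumulated weight are left at 0
    return [o / n if n > eps else 0.0 for o, n in zip(out, norm)]

signal = [math.sin(0.05 * i) for i in range(2048)]
restored = ola_with_norm(signal, win_len=256, hop=64)
# Away from the file edges, the unmodified chain reproduces the input
max_err = max(abs(a - b) for a, b in zip(signal[256:1792], restored[256:1792]))
```

Without the division by the norm buffer, the output would carry the summed Hann² envelope as an amplitude coloration; with it, an unmodified spectrum passes through transparently.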

Quick start

In Praat, select exactly one Sound object.
Run script… → Beltrami_Inspired_Spectral_Melter.praat.
Choose a preset from the dropdown (Shimmer Haze, Deep Terrain, Edge Freeze, Fog of War, Formant Cloud, Transient Glass, Void Chasm, or Custom).
Adjust analysis parameters (window, time step, frequency resolution) and diffusion parameters (iterations, time/freq diffusion rates, ridge sensitivity, edge preservation).
Set Effect_strength — higher values exaggerate the diffused terrain (spectral peaks soar, valleys drop).
Enable Stereo and set Stereo_phase_offset for Paulstretch-style stereo decorrelation.
Select Speed mode (Full quality / Balanced / Fast).
Click OK — script builds spectrogram, runs diffusion, resynthesises, outputs originalname_BeltramiInspired_presetname.

Quick tip: Start with Shimmer Haze (gentle diffusion, moderate effect) for ambient transformations. Void Chasm uses extreme parameters (100 ms window, 18 iterations, 8× effect strength) for dramatic spectral melting. Edge Freeze preserves transients (high ridge sensitivity). Enable stereo for immersive wide-field textures.

Important: This script is computationally intensive. Large spectrograms (high frequency resolution, many frames) combined with many iterations may take several minutes; use the Balanced or Fast speed mode to shorten processing. The diffusion algorithm treats time and frequency independently: dt_t and dt_f control how much information flows across each dimension. Large dt values can make the explicit update unstable, so the script clamps both to 0.24. Effect_strength > 1 exaggerates the diffused shape, making subtle changes audible.

8 Presets

Preset	Window (ms)	Iter	dt_t/dt_f	κ (sens/edge)	Effect	Character
Shimmer Haze	40	8	0.20/0.05	2.08	2.5	Gentle diffusion, shimmering ambient haze.
Deep Terrain	60	12	0.20/0.18	1.50	4.0	Aggressive smoothing, cavernous depth.
Edge Freeze	30	5	0.10/0.08	1.75	2.0	Preserves transients, freezes edges.
Fog of War	50	10	0.12/0.22	1.67	4.0	Strong frequency smearing, foggy texture.
Formant Cloud	35	7	0.18/0.06	1.33	3.0	Preserves formants, clouds harmonics.
Transient Glass	25	6	0.22/0.04	1.60	3.5	Glass-like, transient-heavy diffusion.
Void Chasm	100	18	0.24/0.20	1.67	8.0	Extreme melting, spectral abyss.
See script for full parameter details.

Diffusion Theory: Perona-Malik / Beltrami-Inspired

Anisotropic diffusion PDE

∂u/∂τ = div( g(|∇u|) · ∇u )

where u(t, f; τ) is the log-energy spectrogram over time t and frequency f, τ is the diffusion (iteration) time, g(·) is the edge-stopping function, ∇u is the gradient of u, and |∇u| is its magnitude.

Flux function (Perona-Malik): g(s) = exp(-(s/κ)²)

κ = ridge_sensitivity / edge_preservation (higher κ = more diffusion, lower κ = stronger edge preservation)
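As a small numeric illustration of the two formulas above, the following Python sketch computes κ and evaluates g(s) on a flat region and on a steep ridge. The function names are hypothetical. The ridge_sensitivity of 3.5 is the Edge Freeze value quoted under Applications, while the edge_preservation of 2.0 is an assumption chosen so that κ matches the 1.75 listed in the preset table.

```python
import math

def kappa(ridge_sensitivity, edge_preservation):
    """Gradient threshold: kappa = ridge_sensitivity / edge_preservation."""
    return ridge_sensitivity / edge_preservation

def edge_stop(grad_mag, k):
    """Perona-Malik coefficient g(s) = exp(-(s / kappa)^2)."""
    return math.exp(-((grad_mag / k) ** 2))

k = kappa(3.5, 2.0)          # assumed Edge Freeze values, giving kappa = 1.75
flat = edge_stop(0.1, k)     # gentle slope: coefficient close to 1, diffusion flows
ridge = edge_stop(6.0, k)    # steep slope: coefficient close to 0, diffusion blocked
```

Gradients well below κ diffuse almost freely, and gradients several times larger than κ are effectively frozen, which is why lowering κ protects more of the terrain.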

Discretised update per iteration (time dimension):
u ← u + dt_t·[flux_right·(u_right - u) + flux_left·(u_left - u)]
where flux_right = exp(-(((grad_mag_mid + grad_mag_right)/2) / κ)²), and flux_left is defined analogously using grad_mag_left.
Frequency dimension: same formula with dt_f and vertical gradients. Combined gradient magnitude:
|∇u|² = (∂u/∂t)² + (∂u/∂f)²
computed via central differences in time and frequency, then sqrt().
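The update rule above can be sketched in plain Python on a toy terrain. This is an illustration under stated assumptions, not the script's Formula-based implementation: the function names are hypothetical, border indices are clamped for the central differences, missing neighbours at the borders contribute no flux, and the time and frequency updates are combined in a single pass (the script may order them differently).

```python
import math

def grad_mag(u):
    """Central-difference gradient magnitude; indices are clamped at the borders."""
    T, F = len(u), len(u[0])
    g = [[0.0] * F for _ in range(T)]
    for t in range(T):
        for f in range(F):
            du_dt = (u[min(t + 1, T - 1)][f] - u[max(t - 1, 0)][f]) / 2.0
            du_df = (u[t][min(f + 1, F - 1)] - u[t][max(f - 1, 0)]) / 2.0
            g[t][f] = math.sqrt(du_dt ** 2 + du_df ** 2)
    return g

def diffuse_step(u, dt_t, dt_f, kappa):
    """One explicit iteration over a matrix indexed as u[frame][bin]."""
    T, F = len(u), len(u[0])
    g = grad_mag(u)

    def flux(a, b):
        # Coefficient from the mean gradient magnitude of two neighbouring cells
        return math.exp(-((((a + b) / 2.0) / kappa) ** 2))

    new = [row[:] for row in u]
    for t in range(T):
        for f in range(F):
            acc = 0.0
            neighbours = ((t - 1, f, dt_t), (t + 1, f, dt_t),
                          (t, f - 1, dt_f), (t, f + 1, dt_f))
            for tt, ff, dt in neighbours:
                if 0 <= tt < T and 0 <= ff < F:   # no flux across the border
                    acc += dt * flux(g[t][f], g[tt][ff]) * (u[tt][ff] - u[t][f])
            new[t][f] = u[t][f] + acc
    return new

# Toy terrain: 8 frames x 6 bins with a 20 dB cliff between frames 3 and 4
# (an "attack") and a small 0.5 dB bump on the quiet plain.
u = [[0.0] * 6 for _ in range(4)] + [[20.0] * 6 for _ in range(4)]
u[1][2] = 0.5
v = diffuse_step(u, dt_t=0.2, dt_f=0.2, kappa=1.5)
```

After one step the small bump at v[1][2] has largely melted into its neighbours, while the cells on either side of the cliff are essentially unchanged because the coefficient across the cliff is vanishingly small.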

Beltrami-inspired connection: Full Beltrami flow minimises the area of the spectrogram surface embedded in (time, frequency, energy) space, producing a geometry-driven diffusion that preserves edges and corners. This implementation uses the simpler Perona-Malik flux, but with separate time/frequency diffusion coefficients and a gradient magnitude computed from both dimensions, hence "Beltrami-inspired". The result is spectral smoothing that respects both temporal and spectral structure.

Parameters

Analysis parameters

Parameter	Range	Description
Window_size_ms	10–200	FFT window length. Larger = better frequency resolution, worse time resolution.
Time_step_ms	2–50	Hop size between analysis frames. Smaller = smoother diffusion, more frames.
Max_frequency_Hz	100–Nyquist	Upper frequency limit. Lower = faster, focuses diffusion on low/mid frequencies.
Freq_resolution_Hz	20–500	Frequency bin width. Smaller = higher spectral resolution, more bins.
Dynamic_floor_dB	−100 to −40	Floor (in dB) at which values in the log-energy matrix are clamped.
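To get a feel for how these parameters drive processing cost, the following Python sketch estimates the terrain size and applies a dynamic floor to a single value. It is a rough approximation under assumptions: the helper names are hypothetical, the frame and bin counts ignore Praat's edge handling (the script reads the authoritative dimensions from the matrix itself), and the floor is taken as an absolute dB value.

```python
import math

def grid_size(duration_s, time_step_ms, max_frequency_hz, freq_resolution_hz):
    """Approximate number of frames and bins in the spectral terrain."""
    n_frames = int(duration_s * 1000.0 / time_step_ms)
    n_bins = int(max_frequency_hz / freq_resolution_hz)
    return n_frames, n_bins

def to_log_energy(power, dynamic_floor_db):
    """Convert a power value to dB and clamp it at the dynamic floor."""
    db = 10.0 * math.log10(power) if power > 0.0 else float("-inf")
    return max(db, dynamic_floor_db)

frames, bins = grid_size(10.0, 10.0, 8000.0, 50.0)
cells_per_pass = frames * bins            # cells touched by one diffusion pass
quiet = to_log_energy(1e-12, -80.0)       # below the floor, so clamped to -80 dB
```

Halving Time_step_ms or Freq_resolution_Hz doubles the number of cells, and every diffusion iteration visits all of them, so the cost grows with frames × bins × iterations.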

Diffusion parameters

Parameter	Range	Description
Iterations	1–50	Number of diffusion passes. Higher = more melting.
Time_diffusion (dt_t)	0.01–0.24	Diffusion rate across time frames. Higher = more temporal smearing.
Freq_diffusion (dt_f)	0.01–0.24	Diffusion rate across frequency bins. Higher = spectral blurring.
Ridge_sensitivity	0.5–5.0	Numerator in κ formula; scales the gradient threshold. Higher = more diffusion (only steeper gradients are treated as edges).
Edge_preservation	0.5–3.0	Denominator in κ formula. Higher = stronger edge preservation.

Effect & output

Parameter	Range	Description
Effect_strength	0–10	Exaggerates diffused shape: new = mean + (diffused - mean) × strength.
Wet/dry mix	0–1	Blend of processed (wet) and original (dry). 1 = pure wet.
Create_stereo	yes/no	Generates independent random phases per channel (Paulstretch pattern).
Stereo_phase_offset	0–1	Right channel phase range multiplier (phaseScale = 1 + offset).
Speed_mode	Full/Balanced/Fast	Keeps the original sample rate (Full) or resamples to 22 kHz (Balanced) or 11 kHz (Fast) before analysis; the lower rates speed up processing considerably.
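The Effect_strength and Wet/dry formulas from the table can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes the shaping is applied to one frame of values at a time around that frame's mean, as the feature list describes. It does not clamp the result, and the stage at which the script applies the wet/dry blend is not covered here.

```python
def exaggerate_frame(diffused, strength):
    """new = mean + (diffused - mean) * strength, applied to one frame."""
    mean = sum(diffused) / len(diffused)
    return [mean + (v - mean) * strength for v in diffused]

def wet_dry(dry, wet, mix):
    """Linear blend: mix = 1 is pure wet, mix = 0 is pure dry."""
    return [(1.0 - mix) * d + mix * w for d, w in zip(dry, wet)]

frame = [1.0, 2.0, 3.0]
doubled = exaggerate_frame(frame, 2.0)     # deviations from the mean are doubled
flattened = exaggerate_frame(frame, 0.0)   # every value collapses to the mean
blended = wet_dry([1.0, 1.0], [0.0, 3.0], 0.5)
```

A strength of 1 leaves the diffused frame unchanged; values above 1 push peaks up and valleys down around the frame mean, which is what makes subtle diffusion audible.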

Applications

Ambient / Drone textures (Shimmer Haze, Deep Terrain)

Use case: Transform acoustic instruments or field recordings into smooth, evolving ambient textures.

Settings: Long window (60 ms), many iterations (12), moderate effect strength (2–4). The diffusion smears partials into sustained clouds.

Spectral edge preservation / freeze (Edge Freeze)

Use case: Keep transients and attacks sharp while melting the sustained portions.

Settings: High ridge_sensitivity (3.5), low dt_t/dt_f (0.10/0.08), few iterations (5). Transients remain intact; harmonics diffuse.

Formant / vowel transformation (Formant Cloud)

Use case: Alter vocal formants while preserving intelligibility.

Settings: Low frequency diffusion (0.06), moderate time diffusion (0.18), effect_strength = 3.0. Formant peaks are preserved (edge protection) while the surrounding harmonics are smeared.

Extreme spectral melting (Void Chasm)

Use case: Experimental sound design — dissolve sounds into unrecognisable textures.

Settings: 100 ms window (very long), 18 iterations, dt_t = 0.24 / dt_f = 0.20 (near-maximum diffusion), effect_strength = 8.0. The result is a spectral abyss: tonal content becomes inharmonic and noise-like.

Workflow: Piano → Shimmer Haze

Source: Solo piano recording.
Settings: Shimmer Haze preset, stereo enabled (offset=0.3).
Result: Piano notes begin with attack clarity, then melt into shimmering, diffuse haze. Stereo phase offset creates wide, enveloping texture.

Workflow: Voice → Void Chasm → Resample to very low rate

Source: Spoken word.
Settings: Void Chasm preset, then in Praat, resample output to 8 kHz.
Result: The already melted voice becomes almost unrecognisable — whispered, spectral ashes.

Workflow: Drum loop → Edge Freeze

Source: Drum loop (kick, snare, hi-hat).
Settings: Edge Freeze preset.
Result: Attacks (transients) preserved; sustain and release portions are smoothed. The rhythm remains intelligible, but the timbre becomes glassy/ethereal.

Troubleshooting:
• Processing is very slow: Use Balanced or Fast speed modes. Reduce frequency resolution (larger bins) or increase time step (fewer frames).
• Output is silent or extremely quiet: Check that Max_frequency_Hz is within Nyquist (speed modes reduce sample rate). Void Chasm with fast mode may clamp due to Nyquist.
• Diffusion not audible: Increase Effect_strength (3–8) to exaggerate the diffused terrain. Also increase iterations and dt values.
• Clicks at end of file: The script applies micro-fades and OLA normalisation. If clicks persist, increase micro-fade duration (edit script).
• Stereo phase difference too extreme: Reduce stereo_phase_offset (0.1–0.3). Paulstretch pattern uses random phases — each run will differ.
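The Nyquist issue behind the silent-output item can be checked by hand with a short Python sketch. The function names are hypothetical, the 22000 Hz and 11000 Hz targets are the nominal rates quoted in this document rather than values read from the script, and the sketch assumes the speed modes never upsample.

```python
def effective_rate(original_rate_hz, speed_mode):
    """Sample rate used for analysis under each speed mode (nominal values)."""
    targets = {
        "Full quality": original_rate_hz,
        "Balanced": 22000.0,   # nominal "22 kHz"
        "Fast": 11000.0,       # nominal "11 kHz"
    }
    # Assumption: a sound already below the target rate is left as it is
    return min(original_rate_hz, targets[speed_mode])

def clamp_max_frequency(max_frequency_hz, sample_rate_hz):
    """Limit the analysis ceiling to the Nyquist frequency."""
    return min(max_frequency_hz, sample_rate_hz / 2.0)

requested = 8000.0
full = clamp_max_frequency(requested, effective_rate(44100.0, "Full quality"))
fast = clamp_max_frequency(requested, effective_rate(44100.0, "Fast"))
```

With a 44.1 kHz source, a requested ceiling of 8000 Hz survives in Full quality but is reduced to 5500 Hz in Fast mode, so any setting that relies on content above that limit will behave differently.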

Visualisation (Suite 8×8)

When Draw_visualization is enabled, the script generates a Praat picture with:

Title bar — script name, preset, iterations, kappa, dt values, effect strength, wet mix.
Input waveform (grey) — original sound.
Output waveform (blue/orange for stereo) — Beltrami-melted result.
Original spectrogram (left) — before diffusion.
Output spectrogram (right) — after diffusion. Look for smearing across time/frequency.
Summary panel — all parameters, frames/bins, render time, speed mode.

Compare the two spectrograms to see the "melting" effect: fine detail between ridges dissolves, and sustained regions become smooth.