Beltrami-Inspired Spectral Melter — Anisotropic Diffusion

Treats the spectrogram as a 3D acoustic terrain and applies anisotropic diffusion (Perona-Malik / Beltrami-inspired) that flows freely across smooth spectral plains but is blocked at ridges — attacks, formant edges, note boundaries. The diffused matrix encodes a reshaped spectral terrain.

Author: Shai Cohen
Affiliation: Department of Music, Bar-Ilan University, Israel
Version: 2.2 (2026)
License: MIT License
Repo: https://github.com/ShaiCohen-ops/Praat-plugin_AudioTools

What this does

This script implements anisotropic spectral diffusion inspired by Beltrami flow and Perona-Malik image denoising. The spectrogram is interpreted as a 3D terrain: X = time frame, Y = frequency bin, Z = log energy. Anisotropic diffusion smooths the terrain along smooth plains (homogeneous regions) while preserving sharp edges (spectral ridges, transients, formants). The diffused matrix is then resynthesised via overlap-add FFT, with optional stereo phase randomisation (Paulstretch-style) and exaggerated spectral shaping to make the effect perceptually dramatic.

What is anisotropic diffusion? Unlike isotropic (uniform) blur, anisotropic diffusion uses a variable diffusion coefficient that depends on the local gradient magnitude. High gradients (edges) reduce diffusion, preserving boundaries. Low gradients (flat regions) allow diffusion, smoothing out noise and detail. In this script, diffusion flows across time (frames) and frequency (bins) independently, with edge sensitivity controlled by ridge_sensitivity and edge_preservation. The result is a "melted" spectrogram where sustained tones smear into ambient textures while percussive attacks retain their shape.

Performance & v2.2 fixes: This version fixes four critical issues: (1) correct kappa formula (ridge_sensitivity/edge_preservation), (2) Nyquist clamping to prevent silent mis-binning in speed modes, (3) authoritative nBins read from the matrix dimensions, (4) an OLA normalisation buffer to cancel Hann² windowing coloration. The algorithm is computationally heavy (iterative diffusion across matrices) but fully vectorised using Praat's native Formula.
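
The idea behind the OLA normalisation in fix (4) can be illustrated outside Praat. The following is a minimal Python sketch, not the script's own code. It assumes a Hann window is applied both at analysis and at synthesis (hence Hann²); the function names `hann` and `ola_with_normalisation` are hypothetical.

```python
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def ola_with_normalisation(frames, hop):
    """Overlap-add frames and divide by the accumulated window energy.

    Assumption: every frame was Hann-windowed at analysis and is windowed
    again here at synthesis, so the overlap accumulates Hann squared.
    """
    n = len(frames[0])
    win = hann(n)
    length = hop * (len(frames) - 1) + n
    out = [0.0] * length
    norm = [0.0] * length              # running sum of win^2 per output sample
    for k, frame in enumerate(frames):
        start = k * hop
        for i in range(n):
            out[start + i] += frame[i] * win[i]
            norm[start + i] += win[i] * win[i]
    eps = 1e-8                         # guard against dividing by ~0 at the edges
    return [o / w if w > eps else 0.0 for o, w in zip(out, norm)]

# A constant input of 1.0: each analysis-windowed frame is the window itself.
n, hop = 16, 4
frames = [hann(n) for _ in range(8)]
y = ola_with_normalisation(frames, hop)
```

Without the division by `norm`, the output level would follow the summed Hann² envelope; with it, the constant input is recovered everywhere the window has non-zero weight.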

Quick start

  1. In Praat, select exactly one Sound object.
  2. Run script…Beltrami_Inspired_Spectral_Melter.praat.
  3. Choose a preset from the dropdown (Shimmer Haze, Deep Terrain, Edge Freeze, Fog of War, Formant Cloud, Transient Glass, Void Chasm, or Custom).
  4. Adjust analysis parameters (window, time step, frequency resolution) and diffusion parameters (iterations, time/freq diffusion rates, ridge sensitivity, edge preservation).
  5. Set Effect_strength — higher values exaggerate the diffused terrain (spectral peaks soar, valleys drop).
  6. Enable Stereo and set Stereo_phase_offset for Paulstretch-style stereo decorrelation.
  7. Select Speed mode (Full quality / Balanced / Fast).
  8. Click OK — script builds spectrogram, runs diffusion, resynthesises, outputs originalname_BeltramiInspired_presetname.
Quick tip: Start with Shimmer Haze (gentle diffusion, moderate effect) for ambient transformations. Void Chasm uses extreme parameters (100 ms window, 18 iterations, 8× effect strength) for dramatic spectral melting. Edge Freeze preserves transients (high ridge sensitivity). Enable stereo for immersive wide-field textures.
Important: This script is computationally intensive. Large spectrograms (fine frequency resolution, many frames) combined with many iterations may take several minutes; use the Balanced or Fast speed mode to shorten processing. The diffusion algorithm treats time and frequency independently: dt_t and dt_f control how much information flows across each dimension. Large dt values can make the iteration unstable, so the script clamps both to a maximum of 0.24. Effect_strength > 1 exaggerates the diffused shape, making subtle changes audible.

8 Presets

Preset           Window (ms)  Iterations  dt_t / dt_f  κ (sens/edge)  Effect  Character
Shimmer Haze     40           8           0.20 / 0.05  2.08           2.5     Gentle diffusion, shimmering ambient haze.
Deep Terrain     60           12          0.20 / 0.18  1.50           4.0     Aggressive smoothing, cavernous depth.
Edge Freeze      30           5           0.10 / 0.08  1.75           2.0     Preserves transients, freezes edges.
Fog of War       50           10          0.12 / 0.22  1.67           4.0     Strong frequency smearing, foggy texture.
Formant Cloud    35           7           0.18 / 0.06  1.33           3.0     Preserves formants, clouds harmonics.
Transient Glass  25           6           0.22 / 0.04  1.60           3.5     Glass-like, transient-heavy diffusion.
Void Chasm       100          18          0.24 / 0.20  1.67           8.0     Extreme melting, spectral abyss.
See script for full parameter details.

Diffusion Theory: Perona-Malik / Beltrami-Inspired

Anisotropic diffusion PDE

∂u/∂t = div( g(|∇u|) · ∇u )

where u(t, x, y) is the log-energy spectrogram at diffusion time t, time frame x and frequency bin y, g(·) is the edge-stopping function, ∇u is the gradient of u, and |∇u| is its magnitude.

Edge-stopping function (Perona-Malik): g(s) = exp(-(s/κ)²)

κ = ridge_sensitivity / edge_preservation (higher κ = more diffusion, lower κ = stronger edge preservation)
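
To make the role of κ concrete, here is a small Python sketch of the two formulas above. The parameter values are illustrative and not taken from any preset; the helper names `kappa` and `edge_stop` are hypothetical.

```python
import math

def kappa(ridge_sensitivity, edge_preservation):
    """kappa = ridge_sensitivity / edge_preservation, as stated above."""
    return ridge_sensitivity / edge_preservation

def edge_stop(grad_mag, k):
    """Perona-Malik edge-stopping value g(s) = exp(-(s/k)^2)."""
    return math.exp(-((grad_mag / k) ** 2))

k = kappa(2.0, 1.0)            # illustrative values, giving kappa = 2.0
g_flat = edge_stop(0.1, k)     # small gradient: g close to 1, diffusion flows
g_edge = edge_stop(5.0, k)     # large gradient: g close to 0, diffusion blocked
```

Raising edge_preservation lowers κ, which pushes g towards zero at smaller gradients, so more of the terrain counts as an edge.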

Discretised update per iteration (time dimension):
u ← u + dt_t·[flux_right·(u_right - u) + flux_left·(u_left - u)]
where flux_right = exp(-((grad_mag_mid + grad_mag_right)/2 / κ)²)
Frequency dimension: same formula with dt_f and vertical gradients. Combined gradient magnitude:
|∇u|² = (∂u/∂t)² + (∂u/∂f)²
computed via central differences in time and frequency, then sqrt().
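
The update and gradient formulas above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the script's Praat Formula code: the matrix is indexed `u[bin][frame]`, both dimensions are updated in the same pass, border cells simply have no flux through the missing neighbour, and the names `grad_mag`, `flux` and `diffuse_once` are hypothetical.

```python
import math

def grad_mag(u):
    """Central-difference gradient magnitude; indices are clamped at the borders."""
    nf, nt = len(u), len(u[0])
    g = [[0.0] * nt for _ in range(nf)]
    for f in range(nf):
        for t in range(nt):
            du_t = (u[f][min(t + 1, nt - 1)] - u[f][max(t - 1, 0)]) / 2.0
            du_f = (u[min(f + 1, nf - 1)][t] - u[max(f - 1, 0)][t]) / 2.0
            g[f][t] = math.sqrt(du_t * du_t + du_f * du_f)
    return g

def flux(g_here, g_there, kappa):
    """exp(-(mean gradient / kappa)^2) between a cell and one neighbour."""
    s = 0.5 * (g_here + g_there)
    return math.exp(-((s / kappa) ** 2))

def diffuse_once(u, dt_t, dt_f, kappa):
    """One explicit diffusion pass over time (dt_t) and frequency (dt_f)."""
    nf, nt = len(u), len(u[0])
    g = grad_mag(u)
    new = [row[:] for row in u]
    # (bin offset, frame offset, rate) for right, left, up and down neighbours
    neighbours = ((0, 1, dt_t), (0, -1, dt_t), (1, 0, dt_f), (-1, 0, dt_f))
    for f in range(nf):
        for t in range(nt):
            c = u[f][t]
            acc = 0.0
            for df, dn, rate in neighbours:
                ff, tt = f + df, t + dn
                if 0 <= ff < nf and 0 <= tt < nt:
                    acc += rate * flux(g[f][t], g[ff][tt], kappa) * (u[ff][tt] - c)
            new[f][t] = c + acc
    return new

# A single bump on a flat terrain spreads out into its neighbours.
bump = [[0.0] * 5 for _ in range(5)]
bump[2][2] = 1.0
melted = diffuse_once(bump, 0.2, 0.2, 2.0)

# A tall step is left almost untouched when kappa is small.
step = [[0.0, 0.0, 0.0, 10.0, 10.0, 10.0] for _ in range(3)]
kept = diffuse_once(step, 0.2, 0.2, 1.0)
```

Because the flux between two cells is symmetric, whatever one cell loses its neighbour gains, so the total log-energy of the matrix is unchanged by a pass.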
Beltrami-inspired connection: Full Beltrami flow minimises the area of a manifold embedded in (time, frequency, energy) space, producing a "polyharmonic" diffusion that preserves edges and corners. This implementation uses the Perona-Malik flux (a simplification) but with separate time/frequency diffusion coefficients and gradient magnitude from both dimensions — hence "Beltrami-inspired". The result is spectral smoothing that respects both temporal and spectral structures.

Parameters

Analysis parameters

Parameter           Range         Description
Window_size_ms      10–200        FFT window length. Larger = better frequency resolution, worse time resolution.
Time_step_ms        2–50          Hop size between analysis frames. Smaller = smoother diffusion, more frames.
Max_frequency_Hz    100–Nyquist   Upper frequency limit. Lower = faster, focuses diffusion on low/mid frequencies.
Freq_resolution_Hz  20–500        Frequency bin width. Smaller = higher spectral resolution, more bins.
Dynamic_floor_dB    -100 to -40   Noise floor clamping in log-energy matrix.
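
As a rough guide to how these parameters drive processing time, the sketch below estimates the matrix size in Python. It assumes frames ≈ duration / time step and bins ≈ max frequency / frequency resolution; the script itself reads the true dimensions from the matrix, so treat the numbers as approximations. Function names and the 10-second example are hypothetical.

```python
def estimate_matrix_size(duration_s, time_step_ms, max_frequency_hz, freq_resolution_hz):
    """Approximate (frames, bins) of the spectrogram matrix."""
    frames = int(duration_s * 1000.0 / time_step_ms)
    bins = int(max_frequency_hz / freq_resolution_hz)
    return frames, bins

def estimate_cell_updates(frames, bins, iterations):
    """Total cell updates across all diffusion passes."""
    return frames * bins * iterations

# Illustrative case: 10 s of audio, 5 ms step, 8 kHz ceiling, 40 Hz bins.
frames, bins = estimate_matrix_size(10.0, 5.0, 8000.0, 40.0)
work = estimate_cell_updates(frames, bins, 12)
```

Halving the time step or the bin width doubles the matrix size, and the work grows in proportion to the iteration count.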

Diffusion parameters

Parameter              Range      Description
Iterations             1–50       Number of diffusion passes. Higher = more melting.
Time_diffusion (dt_t)  0.01–0.24  Diffusion rate across time frames. Higher = more temporal smearing.
Freq_diffusion (dt_f)  0.01–0.24  Diffusion rate across frequency bins. Higher = more spectral blurring.
Ridge_sensitivity      0.5–5.0    Numerator in κ formula; scales the gradient threshold. Higher = larger κ, so diffusion also crosses weaker edges.
Edge_preservation      0.5–3.0    Denominator in κ formula. Higher = smaller κ, so stronger edge preservation.

Effect & output

Parameter            Range               Description
Effect_strength      0–10                Exaggerates diffused shape: new = mean + (diffused - mean) × strength.
Wet/dry mix          0–1                 Blend of processed (wet) and original (dry). 1 = pure wet.
Create_stereo        yes/no              Generates independent random phases per channel (Paulstretch pattern).
Stereo_phase_offset  0–1                 Right channel phase range multiplier (phaseScale = 1 + offset).
Speed_mode           Full/Balanced/Fast  Resamples to 44.1 kHz (Full), 22 kHz or 11 kHz before analysis, which dramatically speeds up processing.
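
The Effect_strength and wet/dry formulas can be tried on a few numbers. This Python sketch assumes the mean is taken over all values passed in and that the mix is a simple linear blend; the script may compute either differently. The names `exaggerate` and `wet_dry` are hypothetical.

```python
def exaggerate(diffused, strength):
    """new = mean + (diffused - mean) * strength, over a flat list of values."""
    mean = sum(diffused) / len(diffused)
    return [mean + (v - mean) * strength for v in diffused]

def wet_dry(dry, wet, mix):
    """Linear blend: mix = 1 is pure wet, mix = 0 is pure dry."""
    return [mix * w + (1.0 - mix) * d for d, w in zip(dry, wet)]

terrain = [1.0, 2.0, 3.0]
doubled = exaggerate(terrain, 2.0)                # peaks rise, valleys drop
blended = wet_dry([0.0, 0.0], [1.0, 2.0], 0.25)   # 25 % wet
```

A strength of 1 leaves the diffused terrain unchanged, and a strength of 0 flattens it to its mean.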

Applications

Ambient / Drone textures (Shimmer Haze, Deep Terrain)

Use case: Transform acoustic instruments or field recordings into smooth, evolving ambient textures.

Settings: Long window (60 ms), many iterations (12), moderate effect strength (2–4). The diffusion smears partials into sustained clouds.

Spectral edge preservation / freeze (Edge Freeze)

Use case: Keep transients and attacks sharp while melting the sustained portions.

Settings: High ridge_sensitivity (3.5), low dt_t/dt_f (0.10/0.08), few iterations (5). Transients remain intact; harmonics diffuse.

Formant / vowel transformation (Formant Cloud)

Use case: Alter vocal formants while preserving intelligibility.

Settings: Low frequency diffusion (dt_f = 0.06), fairly high time diffusion (dt_t = 0.18), effect_strength = 3.0. Formant peaks are preserved (edge protection) but surrounding harmonics are smeared.

Extreme spectral melting (Void Chasm)

Use case: Experimental sound design — dissolve sounds into unrecognisable textures.

Settings: 100 ms window (very long), 18 iterations, dt_t = 0.24 and dt_f = 0.20 (near-maximum diffusion), effect_strength = 8.0. The result is a spectral abyss: tonal content becomes inharmonic and noise-like.

Workflow: Piano → Shimmer Haze

Source: Solo piano recording.
Settings: Shimmer Haze preset, stereo enabled (offset=0.3).
Result: Piano notes begin with attack clarity, then melt into shimmering, diffuse haze. Stereo phase offset creates wide, enveloping texture.

Workflow: Voice → Void Chasm → Resample to very low rate

Source: Spoken word.
Settings: Void Chasm preset, then in Praat, resample output to 8 kHz.
Result: The already melted voice becomes almost unrecognisable — whispered, spectral ashes.

Workflow: Drum loop → Edge Freeze

Source: Drum loop (kick, snare, hi-hat).
Settings: Edge Freeze preset.
Result: Attacks (transients) preserved; sustain and release portions are smoothed. The rhythm remains intelligible, but the timbre becomes glassy/ethereal.

Troubleshooting:
Processing is very slow: Use Balanced or Fast speed modes. Reduce frequency resolution (larger bins) or increase time step (fewer frames).
Output is silent or extremely quiet: Check that Max_frequency_Hz is within Nyquist (speed modes reduce sample rate). Void Chasm with fast mode may clamp due to Nyquist.
Diffusion not audible: Increase Effect_strength (3–8) to exaggerate the diffused terrain. Also increase iterations and dt values.
Clicks at end of file: The script applies micro-fades and OLA normalisation. If clicks persist, increase micro-fade duration (edit script).
Stereo phase difference too extreme: Reduce stereo_phase_offset (0.1–0.3). Paulstretch pattern uses random phases — each run will differ.
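
The Nyquist check mentioned under "Output is silent or extremely quiet" amounts to a one-line clamp. The Python sketch below uses the speed-mode sample rates listed in the parameter table, assuming Balanced maps to 22 kHz and Fast to 11 kHz; the dictionary and function name are hypothetical and do not come from the script.

```python
# Sample rates per speed mode, as listed in the parameter table (assumed mapping).
SPEED_MODE_RATES_HZ = {"Full": 44100.0, "Balanced": 22000.0, "Fast": 11000.0}

def clamp_max_frequency(max_frequency_hz, speed_mode):
    """Limit the analysis ceiling to the Nyquist frequency of the chosen mode."""
    nyquist = SPEED_MODE_RATES_HZ[speed_mode] / 2.0
    return min(max_frequency_hz, nyquist)

ceiling_fast = clamp_max_frequency(8000.0, "Fast")   # above Nyquist, so clamped
ceiling_full = clamp_max_frequency(8000.0, "Full")   # within Nyquist, unchanged
```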

Visualisation (Suite 8×8)

When Draw_visualization is enabled, the script generates a Praat picture with:
  • Title bar — script name, preset, iterations, kappa, dt values, effect strength, wet mix.
  • Input waveform (grey) — original sound.
  • Output waveform (blue/orange for stereo) — Beltrami-melted result.
  • Original spectrogram (left) — before diffusion.
  • Output spectrogram (right) — after diffusion. Look for smearing across time/frequency.
  • Summary panel — all parameters, frames/bins, render time, speed mode.
Compare the two spectrograms to see the "melting" effect: ridges become valleys, sustained regions become smooth.