SSM Morph Composer — v1.0 User Guide

Structure‑driven audio recomposition via Self‑Similarity Matrix transformation. Segment a sound, build an SSM from spectral features, transform the matrix (blur, sharpen, diffuse, amplify motifs, warp structure), and navigate a new path through the modified similarity landscape. Praat reassembles events with crossfade.

Author: Shai Cohen
Affiliation: Department of Music, Bar‑Ilan University, Israel
Version: 1.0 (2026)
License: MIT License
Repo: https://github.com/ShaiCohen-ops/Praat-plugin_AudioTools

Contents:

What it does
Quick start
Pipeline (5 stages)
SSM transformation modes
Presets
Parameters
Path navigation
Visualization

What this does

SSM Morph Composer extracts the structural blueprint of a sound and lets you reshape it. The sound is segmented into short events. For each event we compute a 5‑dimensional feature vector (spectral centroid, flatness, entropy, flux, RMS). The pairwise similarity between events forms a Self‑Similarity Matrix (SSM) – a map of repetitions and contrasts. The engine then applies one of five geometric transformations to the SSM, and finally a path walker moves through the transformed similarity space, generating a new sequence of events that is reassembled into audio.

Why SSM recomposition? Traditional concatenative synthesis uses surface features. SSM recomposition works on the relational structure of the sound. By transforming the similarity matrix you can: smooth structural boundaries → ambient washes; amplify diagonal bands → hypnotic loops; warp coordinates → folded time / non‑linear narratives. The result is a genuine morph of the original form.

Five transformation modes (the “SSM engines”):

Blur — Gaussian smoothing, reduces sharp contrasts → ambient, seamless texture
Sharpen — power mapping (SSM^γ), exaggerates motifs → repetitive / maximalist
Diffusion — iterative matrix diffusion, spreads similarity → evolving families of events
MotifAmplify — boosts diagonal energy bands → loop‑like / pattern reinforcement
StructureWarp — smooth nonlinear coordinate warp → folded time, phrase stretching

Key innovation v1.0: the path navigator adds two parameters, teleport probability and visit penalty, that turn the walk into a structural dramaturgy engine. Teleport jumps to low‑visited events (section changes); visit penalty discourages cliques, creating longer‑range form. Everything runs in pure Python + Praat, with no external ML.

Quick start

1. In Praat, select exactly one Sound object (mono or stereo).
2. Run script… → SSMComposer.praat.
3. Choose a Preset (Ambient blur, Motif mirror, Diffuse field, … Spectral labyrinth).
4. Adjust the SSM_mode (if custom) and the navigation parameters (temperature, tabu, teleport, visit penalty).
5. Set the segmentation thresholds (silence threshold, min intervals).
6. Enable Draw_SSM to see the original and modified matrices (requires matplotlib for PNG export).
7. Click OK. The script then runs segmentation, feature extraction, SSM building, transformation, path walking, and Praat reconstruction.

Quick tip: Start with “Ambient blur” (Blur mode) to hear a smooth, washed‑out version. For percussive / rhythmic material, “Motif mirror” (Sharpen) reinforces loops. The “Folded time” preset (StructureWarp) creates strange temporal distortions. Enable Draw_SSM to visualise how the matrix changes – the path is overlaid in the main plot.

Important: Python dependencies: numpy, scipy, soundfile. matplotlib is required only for Draw_SSM PNG export. The segmentation uses Praat’s “To TextGrid (silences)” – adjust thresholds if your material is very quiet. Reconstructed duration may differ from original (because events are concatenated with crossfade). Very long outputs (Output_events up to 10000) are possible but may be slow.

Pipeline — five stages

Stage 1 – Segmentation (silence‑based, or fallback 0.25 s grid) → list of events with start/end.
Stage 2 – Feature extraction (from each mono patch): 5 features – centroid, flatness, entropy, flux, RMS.
Stage 3 – Build SSM (cosine or euclidean similarity, normalised 0…1).
Stage 4 – Transform SSM according to chosen mode (Blur/Sharpen/Diffusion/MotifAmplify/StructureWarp).
Stage 5 – Navigate path through modified SSM (temperature, tabu, teleport, visit penalty) → new event sequence.
Reconstruction (Praat, after stage 5) – extract each event from the original, crossfade, concatenate, normalise.

Feature details

Feature	Description
Spectral centroid	Magnitude‑weighted mean frequency (normalised by Nyquist)
Spectral flatness	Wiener entropy: geometric mean / arithmetic mean of the power spectrum (low = tonal, high = noisy)
Spectral entropy	Normalised Shannon entropy of spectrum
Spectral flux	Mean absolute frame‑to‑frame difference
RMS energy	Root‑mean‑square amplitude

All features are robust‑normalised (median + IQR) before SSM computation.
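The normalisation and SSM construction described above can be sketched roughly as follows. This is a minimal illustration, not the engine's actual code: the function names robust_normalise and build_ssm are hypothetical, and the mapping of cosine and euclidean values onto the 0…1 range is an assumption.

```python
import numpy as np

def robust_normalise(features):
    """Centre each feature column on its median and scale by its IQR."""
    features = np.asarray(features, dtype=float)
    median = np.median(features, axis=0)
    q75, q25 = np.percentile(features, [75, 25], axis=0)
    iqr = q75 - q25
    iqr[iqr == 0] = 1.0  # assumption: constant features are left unscaled
    return (features - median) / iqr

def build_ssm(features, metric="cosine"):
    """Return an (n_events, n_events) similarity matrix scaled to 0..1."""
    x = robust_normalise(features)
    if metric == "cosine":
        norms = np.linalg.norm(x, axis=1, keepdims=True)
        norms[norms == 0] = 1.0
        unit = x / norms
        ssm = (unit @ unit.T + 1.0) / 2.0  # assumption: map [-1, 1] onto [0, 1]
    else:
        # euclidean: small distance means high similarity
        dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
        ssm = 1.0 - dist / dist.max() if dist.max() > 0 else np.ones_like(dist)
    return np.clip(ssm, 0.0, 1.0)
```

Each row of the input is one event's 5‑dimensional feature vector; the diagonal of the result is 1 because every event is identical to itself.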

SSM transformation modes

🌀 Blur

SSM_mod = gaussian_filter(SSM, sigma=2.5)
Smooths the similarity matrix, reducing sharp boundaries. Creates ambient, seamless transitions. Path walks become less sensitive to local contrasts.
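A minimal sketch of the Blur mode, using scipy.ndimage.gaussian_filter with the sigma quoted above. The wrapper name blur_ssm, the re‑symmetrising step, and the final clip are assumptions added for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blur_ssm(ssm, sigma=2.5):
    """Gaussian-smooth a similarity matrix."""
    blurred = gaussian_filter(np.asarray(ssm, dtype=float), sigma=sigma)
    blurred = 0.5 * (blurred + blurred.T)  # assumption: keep the matrix symmetric
    return np.clip(blurred, 0.0, 1.0)
```

Applied to an identity matrix (every event similar only to itself), the sharp diagonal spreads into neighbouring cells, which is why the walk becomes less sensitive to local contrasts.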

⚡ Sharpen

SSM_mod = SSM^gamma (gamma=3.0)
Exaggerates high similarities, suppresses low ones. Reinforces motifs and repetitive patterns. The path becomes more attracted to diagonal bands.
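As a sketch of the Sharpen mode, assuming that SSM^gamma means an element‑wise power (the name sharpen_ssm is hypothetical):

```python
import numpy as np

def sharpen_ssm(ssm, gamma=3.0):
    """Raise every similarity value to the power gamma (element-wise)."""
    s = np.clip(np.asarray(ssm, dtype=float), 0.0, 1.0)
    return s ** gamma
```

With gamma = 3, a similarity of 0.9 becomes 0.729 while 0.5 becomes 0.125, so the ratio between a strong and a moderate match grows from 1.8 to about 5.8.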

🌊 Diffusion

SSM_{t+1} = alpha·SSM_t + (1‑alpha)·(SSM_t @ SSM_t)
Spreads similarity along paths of structural connection. Events that are indirectly similar become more alike. Creates evolving families of events.
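The diffusion update above could be iterated roughly like this. The alpha value, the iteration count, and the rescaling to a maximum of 1 after each step are assumptions; the guide does not state them, and diffuse_ssm is a hypothetical name.

```python
import numpy as np

def diffuse_ssm(ssm, alpha=0.7, iterations=3):
    """Iterate S <- alpha*S + (1 - alpha)*(S @ S), rescaling after each step."""
    s = np.asarray(ssm, dtype=float).copy()
    for _ in range(iterations):
        s = alpha * s + (1.0 - alpha) * (s @ s)
        peak = s.max()
        if peak > 0:
            s /= peak  # assumption: keep values within 0..1
    return s
```

If event A resembles B and B resembles C, the matrix product gives A and C a non‑zero similarity even when they started at zero, which is the "indirectly similar" effect described above.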

🔁 MotifAmplify

Detects diagonal energy bands (repeated patterns) and adds a boost:
SSM += boost * diag_energy
where diag_energy accumulates off‑diagonal values. Perfect for turning any sound into a loop‑based texture.
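One possible reading of the MotifAmplify rule, sketched under assumptions: diag_energy is taken here as the mean value of each off‑diagonal, and that value times boost is added back to every cell on the same diagonal. The function name and the default boost are hypothetical, and the input is assumed symmetric.

```python
import numpy as np

def motif_amplify(ssm, boost=0.5):
    """Boost each off-diagonal in proportion to its mean similarity."""
    s = np.asarray(ssm, dtype=float)
    n = s.shape[0]
    out = s.copy()
    idx = np.arange(n)
    for k in range(1, n):
        energy = np.diagonal(s, offset=k).mean()  # strength of lag-k repetition
        rows = idx[: n - k]
        out[rows, rows + k] += boost * energy  # upper diagonal
        out[rows + k, rows] += boost * energy  # mirrored lower diagonal
    return np.clip(out, 0.0, 1.0)
```

A sound that repeats every four events has a strong diagonal at lag 4, so that band receives a larger boost than the weak lag‑1 band and the contrast between them widens.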

📐 StructureWarp

Smooth nonlinear coordinate warp: SSM_warp(i,j) = SSM(i+f(i), j+f(j)), where f is a superposition of low‑frequency sines. Distorts phrase positions, folds time, creates non‑linear narrative arcs.
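A minimal sketch of the coordinate warp, assuming nearest‑neighbour lookup, three sines with random phases, and an amplitude expressed as a fraction of the number of events. None of these details are confirmed by the guide, and structure_warp is a hypothetical name.

```python
import numpy as np

def structure_warp(ssm, amplitude=0.15, n_sines=3, seed=1234):
    """Resample the SSM at warped coordinates i + f(i), j + f(j)."""
    s = np.asarray(ssm, dtype=float)
    n = s.shape[0]
    rng = np.random.default_rng(seed)
    u = np.arange(n) / max(n - 1, 1)  # normalised position 0..1
    f = np.zeros(n)
    for k in range(1, n_sines + 1):
        phase = rng.uniform(0.0, 2.0 * np.pi)
        f += np.sin(2.0 * np.pi * k * u + phase) / k  # low-frequency sines
    peak = np.abs(f).max()
    if peak > 0:
        f *= amplitude * n / peak  # largest shift = amplitude * n events
    src = np.clip(np.rint(np.arange(n) + f).astype(int), 0, n - 1)
    return s[np.ix_(src, src)]
```

Because rows and columns use the same index map, a symmetric input stays symmetric; only the positions of phrases along the time axes are displaced.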

Built‑in presets

Preset	Mode	Temp	Tabu	Teleport	Visit λ	Description
Ambient blur	Blur	0.8	5	0.03	0.4	Heavy smoothing, long crossfade (30 ms)
Motif mirror	Sharpen	0.1	3	0.01	0.1	Low temperature, tight tabu → repetitive
Diffuse field	Diffusion	0.5	8	0.02	0.3	Medium temperature, evolving families
Loop engine	MotifAmplify	0.05	20	0.005	0.5	Near‑deterministic, strong diagonal boost
Folded time	StructureWarp	0.4	12	0.02	0.3	Euclidean metric, warp amplitude 0.15
Frozen texture	Blur	0.02	2	0.005	0.2	Near‑frozen path, very long crossfade
Spectral labyrinth	Diffusion	0.9	15	0.05	0.6	Max stochastic, euclidean, preserve durations off

Crossfade_ms, Preserve_event_durations, and the segmentation thresholds vary per preset; see the Praat form for the full values.

Parameters & defaults

Core parameters

Parameter	Default	Description
Output_events	300	Length of new event sequence (path steps)
Similarity_metric	cosine	cosine or euclidean
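Temperature	0.3	0 = greedy (always the most similar allowed event); higher values flatten the distribution, and at 1 events are drawn in proportion to their weights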
Tabu_length	10	Number of recent events forbidden in next step
Teleport_probability	0.02	Chance to jump to a low‑visited event (section change)
Visit_penalty	0.3	Down‑weights over‑visited events: 0=off, higher = more exploration
Seed	1234	Random seed for reproducibility

Segmentation

Parameter	Default	Description
Silence_threshold_dB	-25 dB	Below this = silence
Min_silent_interval_s	0.08 s	Minimum silent gap to split
Min_sounding_interval_s	0.03 s	Minimum event duration

Reconstruction

Parameter	Default	Description
Crossfade_ms	10 ms	Fade‑in/out between concatenated events
Preserve_event_durations	yes	yes = use each event's original duration; no = stretch events to the mean duration
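Reconstruction itself happens in Praat, but the effect of Crossfade_ms can be illustrated with a short Python sketch. It assumes a linear overlap crossfade followed by peak normalisation to 0.99; the actual Praat procedure may differ, and crossfade_concat is a hypothetical name.

```python
import numpy as np

def crossfade_concat(events, sr, crossfade_ms=10.0):
    """Join mono event arrays with a linear overlap crossfade."""
    n_fade = int(round(sr * crossfade_ms / 1000.0))
    out = np.asarray(events[0], dtype=float).copy()
    for ev in events[1:]:
        ev = np.asarray(ev, dtype=float)
        k = min(n_fade, len(out), len(ev))  # never fade longer than an event
        if k > 0:
            ramp = np.linspace(0.0, 1.0, k)
            overlap = out[-k:] * (1.0 - ramp) + ev[:k] * ramp
            out = np.concatenate([out[:-k], overlap, ev[k:]])
        else:
            out = np.concatenate([out, ev])
    peak = np.abs(out).max()
    return out / peak * 0.99 if peak > 0 else out
```

Under this assumption every join shortens the result by one crossfade length, which is one reason the output duration can differ from the plain sum of event durations.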

Output

Flag	Default	Effect
Draw_SSM	0	Export original/modified SSM as PNG (needs matplotlib)
Draw_visualization	1	Praat picture: waveforms, spectrograms, event path, stats
Play_result	1	Auto‑play after processing

Path navigation — how the walk works

📐 Transition probability from event i to j

Raw weight = SSM_mod[i, j] (similarity after transformation).
Tabu: set weight = 0 for events in tabu list (recently visited).
Visit penalty: weight' = weight / (1 + λ·count[j])
Teleport: with probability p_tele choose j proportional to 1/(count+1) (low‑visited).
Temperature scaling: p(j) ∝ weight'^(1/T) (power‑law, not softmax).
T = 0 → greedy; T → ∞ → uniform over allowed.

This creates a controlled random walk that respects the transformed structure, avoids short cliques (tabu), and eventually covers the whole space (visit penalty / teleport).
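The rules above can be combined into a single step function, sketched here under assumptions: teleport ignores the tabu list, weights are divided by their maximum before the power is applied (for numerical safety), and the walk falls back to the least‑visited event when every weight is zero. The names next_event and walk are hypothetical.

```python
from collections import deque
import numpy as np

def next_event(i, ssm_mod, counts, tabu, rng,
               temperature=0.3, teleport_prob=0.02, visit_lambda=0.3):
    """Pick the event that follows event i."""
    n = ssm_mod.shape[0]
    # Teleport: favour rarely visited events (assumption: tabu is ignored here).
    if rng.random() < teleport_prob:
        w = 1.0 / (counts + 1.0)
        return int(rng.choice(n, p=w / w.sum()))
    w = np.asarray(ssm_mod[i], dtype=float).copy()  # raw weight = SSM_mod[i, j]
    for j in tabu:
        w[j] = 0.0  # recently visited events are forbidden
    w = w / (1.0 + visit_lambda * counts)  # visit penalty
    if w.max() <= 0.0:
        return int(np.argmin(counts))  # assumption: fallback when all blocked
    if temperature <= 1e-6:
        return int(np.argmax(w))  # greedy limit
    w = (w / w.max()) ** (1.0 / temperature)  # power-law temperature scaling
    return int(rng.choice(n, p=w / w.sum()))

def walk(ssm_mod, n_steps, start=0, tabu_length=10, seed=1234, **nav):
    """Generate a path of n_steps event indices."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(ssm_mod.shape[0])
    tabu = deque([start], maxlen=tabu_length)
    counts[start] += 1
    path = [start]
    while len(path) < n_steps:
        j = next_event(path[-1], ssm_mod, counts, tabu, rng, **nav)
        path.append(j)
        counts[j] += 1
        tabu.append(j)
    return path
```

A call such as walk(ssm_mod, 300, tabu_length=10, temperature=0.3) returns a list of event indices; the fixed seed makes the sequence reproducible.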

Metrics written to stats.txt

path_entropy – uncertainty of event distribution (high = diverse)
diag_energy – average off‑diagonal similarity in original SSM (motif strength)
unique_events / repetition_rate – how many distinct events used
output_duration – estimated (sum of event durations, may differ from original)
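The path metrics can be approximated as follows. This sketch assumes that path_entropy is the Shannon entropy (in bits) of how often each event is used and that repetition_rate is one minus the share of distinct events; the engine's exact definitions may differ, and path_stats is a hypothetical name.

```python
import numpy as np

def path_stats(path):
    """Summarise how diverse a path of event indices is."""
    path = np.asarray(path)
    _, counts = np.unique(path, return_counts=True)
    p = counts / counts.sum()
    unique = int(len(counts))
    return {
        "path_entropy": float(-(p * np.log2(p)).sum()),
        "unique_events": unique,
        "repetition_rate": 1.0 - unique / len(path),
    }
```

A path that alternates evenly between two events has an entropy of 1 bit, while a path stuck on a single event has an entropy of 0.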

Visualization (Praat picture)

When Draw_visualization = 1, the script draws a multi‑panel plot containing:

Original and modified SSM (always when Draw_SSM is enabled; otherwise only when there are 200 events or fewer)
Original waveform with event boundaries (red ticks)
Output waveform (reconstructed)
Original and output spectrograms (0–5 kHz)
Event path – step vs event index, showing navigation
Summary panel with mode, temperature, unique events, entropy, diag energy, durations

Note: If Draw_SSM is 1, two PNG files are also saved in the plugin directory: temp_ssm_original.png and temp_ssm_modified.png. You can open them externally.

Advanced use — custom transformations

The Python engine (ssm_morph_engine.py) can be called independently with your own event CSV and patches. The five transformation functions are modular; you can replace them or add new modes. The path navigator accepts --teleport_prob and --visit_lambda from command line.

Troubleshooting:

No events / too few events: lower silence threshold or reduce min sounding interval; fallback grid activates if nEvents < 4.
Python not found or import errors: check that Python is installed and reachable from Praat, then install numpy, scipy, and soundfile and confirm they are importable.
Draw_SSM but no PNGs: matplotlib missing – install it or set Draw_SSM=0.
Output silent or clicking: the crossfade may be too short, so increase Crossfade_ms. Some events may also be extremely short; check Min_sounding_interval_s.
Path gets stuck in one event: increase tabu_length, teleport probability, or temperature.

Algorithmic detail: feature extraction

For each event patch: compute STFT (Hann window, 1024 FFT, hop 512).
• centroid = mean(∑ f·M_f / ∑ M_f) over frames, normalised by Nyquist.
• flatness = exp(mean(log(power))) / mean(power).
• entropy = –∑ p·log(p) / log(n_bins).
• flux = mean(abs(diff(magnitude))) across frames.
• RMS = √(mean(signal²)).
Features are then robust‑normalised (median, IQR).
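The five formulas above can be sketched with NumPy alone. The framing code, the small eps guard, the zero‑padding of events shorter than one FFT frame, and the name event_features are assumptions for illustration; robust normalisation across events is a separate later step and is not shown.

```python
import numpy as np

def event_features(signal, n_fft=1024, hop=512, eps=1e-12):
    """Return [centroid, flatness, entropy, flux, rms] for one mono event."""
    signal = np.asarray(signal, dtype=float)
    rms = np.sqrt(np.mean(signal ** 2))
    if len(signal) < n_fft:  # assumption: zero-pad very short events
        signal = np.pad(signal, (0, n_fft - len(signal)))
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[k * hop:k * hop + n_fft] * window
                       for k in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))  # shape (n_frames, n_bins)
    power = mag ** 2 + eps
    n_bins = mag.shape[1]
    bin_pos = np.arange(n_bins) / (n_bins - 1)  # 0 = DC, 1 = Nyquist
    centroid = np.mean((mag * bin_pos).sum(axis=1) / (mag.sum(axis=1) + eps))
    # geometric mean / arithmetic mean of the power spectrum, per frame
    flatness = np.mean(np.exp(np.mean(np.log(power), axis=1))
                       / np.mean(power, axis=1))
    p = power / power.sum(axis=1, keepdims=True)
    entropy = np.mean(-(p * np.log(p)).sum(axis=1) / np.log(n_bins))
    flux = np.mean(np.abs(np.diff(mag, axis=0))) if n_frames > 1 else 0.0
    return np.array([centroid, flatness, entropy, flux, rms])
```

As a sanity check, a pure sine should give low flatness and entropy with a centroid near its own frequency divided by Nyquist, while white noise should score high on both flatness and entropy.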