Spectral Morph — User Guide

CDP‑style spectral morphing powered by a Python STFT engine. Interpolates between two sounds in the spectral domain using three distinct morph modes: log‑magnitude (preserve A phase), full complex (blend phase), and formant/envelope (cepstral envelope morph).

Author: Shai Cohen
Affiliation: Department of Music, Bar‑Ilan University, Israel
Version: 4.1 (2025)
License: MIT License
Repo: GitHub

What this does

Spectral Morph performs a time-varying spectral interpolation between two sounds. For each analysis frame, the short-time Fourier transforms of sound A and sound B are computed. A morph factor m(t), ranging from 0 to 1, determines the blend at that moment according to a user-selected curve (linear, cosine S-curve, or fixed mix). The spectral interpolation itself is then carried out in one of three modes: log magnitude, full complex, or formant/envelope.

The heavy DSP (STFT, morphing, overlap‑add) is offloaded to a Python script (spectral_morph.py), while Praat handles the UI, file export, import, and a comprehensive 8‑panel visualisation.
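
The analysis/resynthesis round trip that the engine wraps around the morph can be sketched roughly as follows. This is a minimal illustration, not the code in spectral_morph.py: the Hann window, the hop of a quarter window, the zero-padding at the edges, and the function names stft and istft are all assumptions made for the example.

```python
import numpy as np

def stft(x, n_fft=1024, hop=256):
    """Windowed FFT frames of a mono signal (assumed Hann window, 75% overlap)."""
    window = np.hanning(n_fft)
    # Zero-pad both ends so the first and last samples are fully covered.
    x = np.concatenate([np.zeros(n_fft), x, np.zeros(n_fft)])
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=1024, hop=256, length=None):
    """Weighted overlap-add resynthesis, normalised by the summed squared window."""
    window = np.hanning(n_fft)
    n_frames = spec.shape[0]
    out = np.zeros(n_fft + hop * (n_frames - 1))
    norm = np.zeros_like(out)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    for i in range(n_frames):
        out[i * hop:i * hop + n_fft] += frames[i] * window
        norm[i * hop:i * hop + n_fft] += window ** 2
    out /= np.maximum(norm, 1e-12)
    out = out[n_fft:]  # drop the leading zero-padding added in stft()
    return out if length is None else out[:length]

# Round trip on noise: with no spectral modification the input is recovered.
rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
y = istft(stft(x), length=len(x))
```

A morph would modify the frames returned by stft() for A and B before passing the blended frames to istft().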

Quick start

  1. In Praat, select exactly two Sound objects. The first selected will be A (source), the second B (target).
  2. Run script… Spectral_Morph.praat.
  3. Choose a Preset:
    • Tonal Sustained, Percussive, Voice / Formant Morph, Texture Blend, Fast Preview
  4. For custom mode (preset = Custom), adjust parameters as desired:
    • Start_morph_s / End_morph_s – time region over which the morph evolves (0 = full duration).
    • Curve_type – Linear, Cosine (smooth S‑curve), Full mix (fixed blend).
    • Mix_amount – fixed blend ratio (only for Full mix).
    • Window_ms – STFT window size in milliseconds.
    • Morph_mode – Log magnitude, Full complex, Formant/envelope.
  5. Click OK. Praat exports both sounds, calls the Python engine, and imports the result as A_morph_B.
Tip: Start with Tonal Sustained (Log magnitude, cosine curve) for a clean crossfade between two sustained tones. For a more dramatic spectral evolution, try Voice / Formant Morph (Formant/envelope, cosine curve).
Important: The Python engine requires numpy and soundfile. If the sample rates of A and B differ, scipy is also required for resampling. The engine automatically matches durations by time-aligning both sounds to the longer one.

The 5 presets (+ Custom)

| Preset | Window | Morph mode | Curve | Description |
|---|---|---|---|---|
| Tonal Sustained | 60 ms | Log magnitude | Cosine | Smooth transition between sustained tones, pads. |
| Percussive | 25 ms | Log magnitude | Linear | Short window for transient preservation. |
| Voice / Formant Morph | 50 ms | Formant/envelope | Cosine | Morphs the spectral envelope (formants) while keeping A's excitation. |
| Texture Blend | 80 ms | Full complex | Cosine | Blends both magnitude and phase – creates rich, evolving textures. |
| Fast Preview | 120 ms | Log magnitude | Linear | Large window for quick preview (lower quality, faster). |

Morph modes

1. Log magnitude (geometric interpolation)

mag_out = exp( (1 - m)·log(mag_A) + m·log(mag_B) )
phase_out = phase_A

This mode interpolates the magnitude spectra in the log domain, which is equivalent to a weighted geometric mean, mag_A^(1 - m) · mag_B^m, so level differences between A and B are bridged evenly in decibels. The phase of sound A is preserved, so the temporal fine structure (e.g., attack transients) stays aligned with the source. This is often the most natural-sounding morph for tonal material.
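
As a rough illustration of the formula above, the following sketch blends one STFT frame in the log domain. The function name log_magnitude_morph and the eps floor (used to avoid log(0) on silent bins) are assumptions for the example and may differ from what spectral_morph.py does.

```python
import numpy as np

def log_magnitude_morph(spec_a, spec_b, m, eps=1e-10):
    """Blend one frame of A and B geometrically, keeping A's phase.

    spec_a, spec_b: complex arrays of equal shape (the bins of one frame).
    m: morph factor in [0, 1]. eps is an assumed floor that avoids log(0).
    """
    mag_a = np.maximum(np.abs(spec_a), eps)
    mag_b = np.maximum(np.abs(spec_b), eps)
    mag_out = np.exp((1.0 - m) * np.log(mag_a) + m * np.log(mag_b))
    return mag_out * np.exp(1j * np.angle(spec_a))

frame_a = np.array([1.0 + 0.0j, 0.0 + 4.0j])
frame_b = np.array([4.0 + 0.0j, 1.0 + 0.0j])
# Halfway, each bin has magnitude sqrt(1 * 4) = 2 and the phase of A.
halfway = log_magnitude_morph(frame_a, frame_b, 0.5)
```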

2. Full complex (blend magnitude and phase)

mag_out = (1 - m)·mag_A + m·mag_B
phase_out = phase_A + m·Δφ, where Δφ = phase_B - phase_A, wrapped to the principal range [-π, π] so that the phase travels the shorter way around the circle.

Both magnitude and phase are interpolated linearly. This can create more dramatic, “phasy” textures, as the phase relationships evolve continuously. Useful for sound design and complex textures.
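
A minimal sketch of this mode for a single frame is shown below. It assumes that the phase difference is wrapped to the principal range by passing it through a complex exponential; the function name full_complex_morph is hypothetical, and the engine may wrap the phase differently.

```python
import numpy as np

def full_complex_morph(spec_a, spec_b, m):
    """Linearly blend magnitude and (wrapped) phase of one frame."""
    mag_a, mag_b = np.abs(spec_a), np.abs(spec_b)
    phase_a, phase_b = np.angle(spec_a), np.angle(spec_b)
    # Wrap the phase difference to the principal range so the
    # interpolation takes the shorter path around the unit circle.
    delta = np.angle(np.exp(1j * (phase_b - phase_a)))
    mag_out = (1.0 - m) * mag_a + m * mag_b
    phase_out = phase_a + m * delta
    return mag_out * np.exp(1j * phase_out)

# A sits at +170 degrees, B at -170 degrees: the wrapped difference is
# +20 degrees, so the halfway phase is 180 degrees, not 0.
bin_a = 2.0 * np.exp(1j * np.deg2rad(170.0))
bin_b = 4.0 * np.exp(1j * np.deg2rad(-170.0))
mid = full_complex_morph(np.array([bin_a]), np.array([bin_b]), 0.5)
```

Without the wrapping step the halfway phase would land at 0 degrees, the long way round, which is one source of the "phasy" artefacts this mode is known for.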

3. Formant / envelope (cepstral envelope morph)

This mode implements the classic CDP‑style formant morph:

  1. Extract the spectral envelope of A and B via cepstral liftering (order 60).
  2. Separate the fine structure (excitation) of A: fine_A = mag_A / envelope_A.
  3. Interpolate the envelopes in the log domain.
  4. Reconstruct: mag_out = fine_A × envelope_out; phase from A.

Result: the formant structure of A gradually shifts toward that of B, while the excitation (pitch, noise) remains from A. Ideal for vocal morphs, instrument timbre shifts, etc.
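
The four steps above can be sketched as follows. The guide names spectral_envelope() and its order=60 argument, but the body shown here is only one common way to lifter a cepstrum and is an assumption, as are the eps floor and the helper name formant_morph. Frames are assumed to be one-sided spectra as returned by numpy.fft.rfft.

```python
import numpy as np

def spectral_envelope(mag, order=60, eps=1e-10):
    """Smooth a one-sided magnitude spectrum by keeping low quefrencies only."""
    log_mag = np.log(np.maximum(mag, eps))
    cepstrum = np.fft.irfft(log_mag)   # real cepstrum, length 2 * (len(mag) - 1)
    lifter = np.zeros_like(cepstrum)
    lifter[:order + 1] = 1.0           # keep quefrencies 0..order
    if order > 0:
        lifter[-order:] = 1.0          # and their mirrored counterparts
    smooth_log = np.fft.rfft(cepstrum * lifter).real
    return np.exp(smooth_log)

def formant_morph(spec_a, spec_b, m, order=60, eps=1e-10):
    """Morph A's envelope toward B's while keeping A's excitation and phase."""
    mag_a, mag_b = np.abs(spec_a), np.abs(spec_b)
    env_a = spectral_envelope(mag_a, order, eps)   # step 1
    env_b = spectral_envelope(mag_b, order, eps)
    fine_a = mag_a / env_a                         # step 2
    env_out = np.exp((1.0 - m) * np.log(env_a) + m * np.log(env_b))  # step 3
    return fine_a * env_out * np.exp(1j * np.angle(spec_a))          # step 4

# Two noise frames stand in for real analysis frames.
rng = np.random.default_rng(1)
frame_a = np.fft.rfft(rng.standard_normal(1024))
frame_b = np.fft.rfft(rng.standard_normal(1024))
morphed = formant_morph(frame_a, frame_b, 0.5)
```

At m = 0 the envelope cancels against the fine structure and the frame of A is returned unchanged, which is a quick sanity check for any implementation of this mode.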

Morph curves

The morph factor m(t) controls how much of B is present at time t. It varies from 0 (pure A) to 1 (pure B) according to one of three curves:

| Curve | Formula (inside morph region) | Behaviour |
|---|---|---|
| Linear | m(t) = (t - t0)/(t1 - t0) | Constant rate of change. |
| Cosine (S-curve) | m(t) = 0.5 - 0.5·cos(π·(t - t0)/(t1 - t0)) | Slow at start and end, fastest in the middle – natural, smooth transition. |
| Full mix | m(t) = mix_amount (constant) | Fixed blend throughout the region – no transition, just a static mix of A and B. |

Here t0 = start_morph_s and t1 = end_morph_s. Before the morph region (t < start_morph_s) the output is pure A; after end_morph_s it is pure B. The curve is drawn in the visualisation panel.
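
The curve formulas can be evaluated with a few lines of Python. This is a sketch under stated assumptions: the function name morph_factor and the curve labels "linear", "cosine" and "full_mix" are hypothetical, and the handling of a zero-length region is a choice made for the example.

```python
import math

def morph_factor(t, t0, t1, curve="cosine", mix_amount=0.5):
    """Return m(t) for a morph region running from t0 to t1 (seconds)."""
    if t < t0:
        return 0.0              # before the region: pure A
    if t > t1:
        return 1.0              # after the region: pure B
    if curve == "full_mix":
        return mix_amount       # constant blend inside the region
    if t1 == t0:
        return 1.0              # degenerate region, treated as an instant switch
    x = (t - t0) / (t1 - t0)    # normalised position in the region, 0..1
    if curve == "linear":
        return x
    if curve == "cosine":
        return 0.5 - 0.5 * math.cos(math.pi * x)
    raise ValueError(f"unknown curve: {curve}")

# A quarter of the way through a 2 s region the cosine curve has barely
# moved, while the linear curve is already at 0.25.
early_cosine = morph_factor(0.5, 0.0, 2.0, "cosine")
early_linear = morph_factor(0.5, 0.0, 2.0, "linear")
```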

Parameters & defaults

Morph region

| Parameter | Range | Default | Description |
|---|---|---|---|
| Start_morph_s | 0 … end_morph_s | 0 | Time at which morph begins (0 = start of sound). |
| End_morph_s | ≥ start_morph_s | 0 (max duration) | Time at which morph ends. 0 means the end of the longer sound. |

Curve & mix

| Parameter | Options | Default | Description |
|---|---|---|---|
| Curve_type | Linear / Cosine / Full mix | Cosine | Shape of the morph factor over time. |
| Mix_amount | 0–1 | 0.5 | Fixed blend ratio (only used for Full mix). |

Analysis & mode

| Parameter | Range | Default | Description |
|---|---|---|---|
| Window_ms | ≥ 10 ms | 60 ms | STFT analysis window length. Larger = better frequency resolution, worse time resolution. |
| Morph_mode | Log mag / Full complex / Formant | Log mag | Which spectral components to interpolate. |
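
To get a feel for the Window_ms trade-off, the window length can be converted to samples and to an approximate frequency resolution. The helper below is a hypothetical illustration; it takes the resolution as sample rate divided by window length and ignores any FFT zero-padding the engine may apply.

```python
def window_stats(window_ms, sample_rate):
    """Return (window length in samples, approximate resolution in Hz)."""
    n_samples = int(round(window_ms * sample_rate / 1000.0))
    return n_samples, sample_rate / n_samples

# At 48 kHz the Percussive preset (25 ms) spans 1200 samples and resolves
# about 40 Hz; the 60 ms default spans 2880 samples and resolves about 16.7 Hz.
percussive = window_stats(25, 48000)   # (1200, 40.0)
default = window_stats(60, 48000)
```

The resolution in Hz is roughly 1000 divided by Window_ms, so doubling the window halves the spacing between resolvable partials while smearing events over twice the time.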

Output

| Parameter | Default | Description |
|---|---|---|
| Draw_visualization | yes | Show 8‑panel visualisation (waveforms, spectrograms, morph curve, summary). |
| Play_output | yes | Auto‑play after processing. |

Visualisation (8‑panel Praat picture)

When Draw_visualization is enabled, the script draws an 8-panel plot covering the waveforms, spectrograms, morph curve, and a summary.

Tip: The morph curve panel is especially useful to verify that the transition happens exactly where you intended. The region is shaded light blue.

FAQ / troubleshooting

“Python not found” or missing packages

Install: pip install numpy soundfile. If sample rates differ, also pip install scipy. On Windows, the script uses py (Python launcher).
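
To see which of the packages are visible to a given interpreter, a short check like the one below can be run with the same Python that Praat calls. The helper name missing_packages is hypothetical; importlib.util.find_spec is part of the standard library.

```python
import importlib.util

def missing_packages(names=("numpy", "soundfile", "scipy")):
    """Return the names that cannot be imported by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All packages found.")
```

If the list is empty in a terminal but Praat still reports missing packages, Praat is probably launching a different Python installation.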

Output duration is longer than expected

The engine time‑aligns the two sounds to the longer of the two. If A is 3 s and B is 5 s, the output will be 5 s. The extra time after A’s end is pure B (since morph factor = 1).

Morph sounds “phasy” or has artefacts

The full‑complex mode (phase interpolation) can produce phase artefacts. Try the log‑magnitude mode, which preserves A’s phase and is generally cleaner. If you need phase morphing, reduce the window size to improve time resolution.

Formant / envelope mode details

The cepstral lifter order is fixed at 60. This is suitable for speech and most musical sounds. For extremely low‑pitched sounds, you may want to increase the order – edit the order=60 argument in spectral_envelope() inside the Python script.

Channel handling

If A and B have different numbers of channels, the engine duplicates the last channel to match the higher count. For example, if A is mono and B is stereo, A is treated as two identical channels. This ensures a consistent output channel count.
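
The channel rule described above can be sketched as follows. It assumes audio arrays shaped (frames, channels), with mono given as a 1-D array, which is how soundfile returns data; the function name match_channels is hypothetical.

```python
import numpy as np

def match_channels(x, n_channels):
    """Pad x with copies of its last channel until it has n_channels."""
    if x.ndim == 1:
        x = x[:, np.newaxis]        # mono: (frames,) -> (frames, 1)
    missing = n_channels - x.shape[1]
    if missing > 0:
        pad = np.repeat(x[:, -1:], missing, axis=1)
        x = np.concatenate([x, pad], axis=1)
    return x

# Mono A next to stereo B: A becomes two identical channels.
mono_a = np.array([0.1, 0.2, 0.3])
stereo_a = match_channels(mono_a, 2)
```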