Partial Editing & Resynthesis — User Guide

A sinusoidal modeling tool that deconstructs audio into pure sine waves (partials), allowing for independent pitch shifting, formant manipulation, and textural jittering.

Author: Shai Cohen
Affiliation: Department of Music, Bar-Ilan University, Israel
Version: 0.2 (2025)
License: MIT License
Repo: https://github.com/ShaiCohen-ops/Praat-plugin_AudioTools

Contents:

- What this does
- Quick start
- Theory: Sinusoidal Modeling
- Parameters & Presets
- Algorithm Deep Dive
- Applications

What this does

This script performs sinusoidal analysis and resynthesis based on the McAulay-Quatieri (MQ) paradigm. Instead of treating audio as a waveform or a static spectrum, it views sound as a collection of individual sine waves (partials) that evolve over time.

It analyzes the sound in small windows, finds the most prominent frequencies (peaks), and discards the rest (noise). It then reconstructs the sound using a bank of oscillators. This allows you to:

Purify Audio: Remove breathiness or noise, leaving only the tonal content.
Shift Pitch/Formants: Change the pitch without changing the speed, or change the vocal tract size (chipmunk/giant) without changing the pitch.
Texturize: Add random "jitter" to the frequencies and amplitudes to create organic, shimmering, or glassy textures.

Relation to SPEAR: If you have used the software SPEAR (Sinusoidal Partial Editing Analysis and Resynthesis), this script performs a similar function frame-by-frame within Praat, automating the extraction and transformation process.

Quick start

Select a Sound object in the Praat Objects window.
Run script… → Partial Editing & Resynthesis.praat.
Choose a Preset:
- Clean Resynth: Faithful reconstruction (removes noise).
- Formant Shift Down: Turns voices into "giants."
- Glassy Shimmer: Adds high jitter for a texturizing effect.
Click OK.

Performance Warning: This script uses a "brute force" method to synthesize thousands of sine waves. A 10-second file may take 30-60 seconds to process depending on the max_partials_per_frame setting.
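To get a feel for where the processing time goes, the number of sine grains can be estimated from the hop size and the partial count. The sketch below is illustrative Python, not part of the Praat script; the function name estimate_grain_count is hypothetical, and the default values are the ones listed in this guide.

```python
import math

def estimate_grain_count(duration_s, hop_size_s=0.015, max_partials_per_frame=10):
    """Upper bound on the number of windowed sine grains for one file.

    Hypothetical helper: one frame per hop, at most
    max_partials_per_frame grains per frame.
    """
    n_frames = math.ceil(duration_s / hop_size_s)
    return n_frames * max_partials_per_frame

print(estimate_grain_count(10.0))                             # 6670
print(estimate_grain_count(10.0, max_partials_per_frame=20))  # 13340
```

Halving hop_size or doubling the partial count roughly doubles the work, which is why these two settings dominate the run time.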

Theory: Sinusoidal Modeling

Fourier's Theorem

Any periodic waveform can be represented as a sum of sine waves at integer multiples of its fundamental frequency, each with its own amplitude and phase.

[Image of Fourier Series decomposition]

The Process

Windowing: The sound is sliced into overlapping frames (e.g., every 15ms).
FFT Analysis: Each frame is converted to a Spectrum.
Peak Picking: The script scans the spectrum and picks the loudest $N$ peaks (partials).
Transformation: The frequency and amplitude of these peaks can be mathematically altered (shifted, scaled, jittered).
Oscillator Bank: The script generates fresh sine waves for these new values.
Overlap-Add: The generated sine waves are windowed (Hanning) and layered on top of each other to rebuild the continuous sound.
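The windowing and overlap-add steps above can be sketched in a few lines. This is an illustrative Python sketch, not the script's Praat code. It assumes a periodic Hann window and the default 0.060 s window with a 0.015 s hop; the helper names hann and overlap_add and the 8000 Hz sample rate are hypothetical.

```python
import math

def hann(n_samples):
    """Periodic Hann window of n_samples points."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n_samples)
            for i in range(n_samples)]

def overlap_add(frames, hop):
    """Sum equal-length frames whose starts are hop samples apart."""
    n = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + n)
    for k, frame in enumerate(frames):
        start = k * hop
        for i, value in enumerate(frame):
            out[start + i] += value
    return out

sample_rate = 8000                 # assumed rate, for illustration only
win = round(0.060 * sample_rate)   # 480 samples per frame
hop = round(0.015 * sample_rate)   # 120 samples between frame starts

# Overlap-add eight bare windows to see the gain the layering introduces.
envelope = overlap_add([hann(win)] * 8, hop)
steady = envelope[win - hop : hop * 8]   # region covered by four windows
print(f"{min(steady):.3f} {max(steady):.3f}")   # 2.000 2.000
```

With a hop of one quarter of the window, the overlapping Hann windows sum to a constant, so the rebuilt sound keeps a steady level away from the file edges. Whether the script compensates for that constant factor is not stated in this guide.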

Parameters & Presets

Analysis Parameters

Parameter	Default	Description
window_length	0.060 s	Size of the analysis chunk. Longer = better frequency resolution (bass), worse time resolution (transients).
hop_size	0.015 s	How often to analyze. Smaller = smoother changes but slower processing.
max_partials...	10	Density. How many sine waves to generate per frame. 5 = hollow/sparse. 20 = rich/full.
min/max_frequency	60-8000 Hz	Range of detection. Frequencies outside this range are ignored.
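The time/frequency trade-off in the table can be put into numbers. The sketch below is illustrative Python, with a hypothetical function name. It uses the common rule of thumb that a frame of length T resolves frequencies roughly 1/T apart, and it ignores zero-padding and the extra width of the Hann main lobe.

```python
def analysis_tradeoffs(window_length_s=0.060, hop_size_s=0.015):
    """Rough figures implied by the analysis settings (rule of thumb only)."""
    return {
        "bin_spacing_hz": 1.0 / window_length_s,      # approx. frequency resolution
        "frames_per_second": 1.0 / hop_size_s,        # analysis rate
        "overlap_fraction": 1.0 - hop_size_s / window_length_s,
    }

t = analysis_tradeoffs()
print(f"{t['bin_spacing_hz']:.1f} Hz between bins")       # 16.7 Hz between bins
print(f"{t['frames_per_second']:.1f} frames per second")  # 66.7 frames per second
print(f"{t['overlap_fraction']:.0%} overlap")             # 75% overlap
```

At the defaults, two partials closer than a few tens of Hz are hard to separate, which is why a longer window_length helps with low-pitched material.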

Transformation Parameters

Parameter	Description
transpose_semitones	Pitch shift. +12 = Octave Up. -12 = Octave Down.
formant_shift_ratio	Timbre shift. 1.0 = Normal. 1.2 = Smaller vocal tract (Chipmunk/Child). 0.8 = Larger vocal tract (Giant).
amplitude_scale	Global volume control (multiplier).
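As a worked example of the transposition setting, semitones map to a frequency ratio of 2^(semitones/12). The Python sketch below is illustrative only; the function names are hypothetical, and partials are represented as (frequency in Hz, linear amplitude) pairs for convenience.

```python
def transposition_ratio(transpose_semitones):
    """Frequency multiplier for a shift given in semitones."""
    return 2.0 ** (transpose_semitones / 12.0)

def transpose_partials(partials, transpose_semitones=0.0, amplitude_scale=1.0):
    """Scale every (freq_hz, amp) pair; a sketch, not the script's own code."""
    ratio = transposition_ratio(transpose_semitones)
    return [(freq * ratio, amp * amplitude_scale) for freq, amp in partials]

partials = [(220.0, 0.5), (440.0, 0.25)]
print(transpose_partials(partials, transpose_semitones=12))
# [(440.0, 0.5), (880.0, 0.25)]
print(transposition_ratio(-12))   # 0.5
```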

Jitter (Texture) Parameters

Jitter adds a random deviation to every partial in every frame. This breaks up the overly "perfect" digital regularity of the resynthesis and creates "organic" or "shimmering" textures.

freq_jitter_range: (Hz) Adds a random offset of up to $\pm$ this many Hz to every partial. Creates detuning/chorus effects.
amp_jitter_range: (dB) Adds random volume fluctuation. Creates tremolo/roughness.
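The two jitter settings can be sketched as follows. This is illustrative Python under stated assumptions: the random values are drawn uniformly (the guide does not say which distribution the script uses), and the dB jitter is applied as a gain factor of 10^(dB/20). The function name apply_jitter is hypothetical.

```python
import random

def apply_jitter(freq_hz, amp, freq_jitter_range=0.0, amp_jitter_range=0.0,
                 rng=random):
    """Return a jittered (freq_hz, amp) pair.

    Assumptions: uniform draws, frequency jitter in Hz, amplitude jitter
    in dB converted to a linear gain.
    """
    freq_offset = rng.uniform(-freq_jitter_range, freq_jitter_range)
    gain_db = rng.uniform(-amp_jitter_range, amp_jitter_range)
    new_freq = max(0.0, freq_hz + freq_offset)   # keep frequency non-negative
    return new_freq, amp * 10 ** (gain_db / 20)

rng = random.Random(1)   # seeded so a run can be repeated
for _ in range(3):
    print(apply_jitter(440.0, 0.5, freq_jitter_range=5.0,
                       amp_jitter_range=3.0, rng=rng))
```

Because a new random value is drawn for every partial in every frame, even a steady tone turns into a slowly shifting cluster of slightly detuned sines.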

Algorithm Deep Dive

1. Peak Picking & Suppression

How does the script find the "important" parts of the sound?

Loop k from 1 to max_partials:
1. Find the absolute maximum in the Spectrum.
2. Record its Frequency and Amplitude.
3. "Suppress" this peak in the Spectrum matrix: set the amplitude to 0 for the peak bin AND its ±2 neighbor bins.
4. Repeat.

The suppress_bins parameter is crucial here. By zeroing the bins around each picked peak, it prevents the script from picking the "shoulders" of a strong peak as separate partials, ensuring it moves on to different harmonic frequencies.
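The pick-and-suppress loop can be sketched in Python on a plain list of magnitudes. This is an illustration, not the script's Praat code: the function name pick_peaks, the bin_hz argument, and the toy spectrum are hypothetical, and the ±2 default follows the description above.

```python
def pick_peaks(magnitudes, bin_hz, max_partials=10, suppress_bins=2,
               min_frequency=60.0, max_frequency=8000.0):
    """Greedy peak picking: take the largest bin, zero its neighbors, repeat."""
    work = list(magnitudes)          # copy so the caller's spectrum is untouched
    for i in range(len(work)):       # ignore bins outside the detection range
        if not (min_frequency <= i * bin_hz <= max_frequency):
            work[i] = 0.0
    peaks = []
    for _ in range(max_partials):
        k = max(range(len(work)), key=work.__getitem__)
        if work[k] <= 0.0:           # nothing left to pick
            break
        peaks.append((k * bin_hz, work[k]))
        lo = max(0, k - suppress_bins)
        hi = min(len(work), k + suppress_bins + 1)
        for j in range(lo, hi):      # suppress the peak and its shoulders
            work[j] = 0.0
    return peaks

# Toy spectrum, 100 Hz per bin: a strong peak at 400 Hz with shoulders.
spectrum = [0.0] * 20
spectrum[3], spectrum[4], spectrum[5] = 0.7, 1.0, 0.8
spectrum[8], spectrum[12] = 0.5, 0.3

print(pick_peaks(spectrum, 100.0, max_partials=3))
# [(400.0, 1.0), (800.0, 0.5), (1200.0, 0.3)]
print(pick_peaks(spectrum, 100.0, max_partials=3, suppress_bins=0))
# [(400.0, 1.0), (500.0, 0.8), (300.0, 0.7)]
```

The second call shows what suppression prevents: with no neighbors zeroed, all three picks are spent on one peak and its shoulders.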

2. Resynthesis Formula

For every extracted peak, a sine wave is generated using the formula:

Grain(t) = Amp * sin(2 * π * Freq * t) * HanningWindow(t)

Where:
Freq = (OriginalFreq * 2^(transpose/12)) * FormantRatio ± Random(Jitter)
Amp = OriginalAmp * Scale ± Random(Jitter)
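One grain of the formula above can be generated as in the following Python sketch. It is illustrative only: make_grain is a hypothetical name, the sample rate is an assumed value, the window is a periodic Hann, and each grain starts at phase zero, matching the per-frame phase reset described under Limitations.

```python
import math

def make_grain(freq_hz, amp, window_length_s=0.060, sample_rate=44100):
    """Hann-windowed sine grain: amp * sin(2*pi*f*t) * window(t)."""
    n = round(window_length_s * sample_rate)
    grain = []
    for i in range(n):
        t = i / sample_rate                                # seconds since grain start
        window = 0.5 - 0.5 * math.cos(2 * math.pi * i / n) # periodic Hann
        grain.append(amp * math.sin(2 * math.pi * freq_hz * t) * window)
    return grain

grain = make_grain(440.0, 0.5, sample_rate=8000)
print(len(grain))   # 480
```

The full output is then the overlap-add of one such grain per picked partial per frame, with Freq and Amp already transformed and jittered.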

Formants vs. Pitch

Pitch Shifting: Multiplies all frequencies by a constant factor. The harmonic relationships remain identical.

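Formant Shifting: In this script, it is implemented the same way as pitch shifting: as the resynthesis formula shows, formant_shift_ratio multiplies every partial frequency by one more constant factor. It is intended to be used against transposition or on its own to simulate physical size changes. Because both factors scale the same frequencies, a formant ratio applied on its own also moves the perceived pitch by that ratio.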
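Since both settings end up as frequency multipliers in the resynthesis formula, a formant ratio can be restated as an equivalent number of semitones. The Python sketch below illustrates this with hypothetical helper names; it follows the formula as written in this guide rather than the script's source.

```python
import math

def ratio_to_semitones(ratio):
    """Semitone shift equivalent to a plain frequency ratio."""
    return 12 * math.log2(ratio)

def combined_factor(transpose_semitones, formant_shift_ratio):
    """Total multiplier applied to each partial frequency, before jitter."""
    return 2 ** (transpose_semitones / 12) * formant_shift_ratio

print(f"{ratio_to_semitones(0.8):.2f}")   # -3.86
print(f"{ratio_to_semitones(1.2):.2f}")   # 3.16

# Harmonic relationships survive any constant factor.
factor = combined_factor(transpose_semitones=-5, formant_shift_ratio=1.2)
harmonics = [100.0 * h * factor for h in (1, 2, 3)]
print([round(f / harmonics[0], 6) for f in harmonics])   # [1.0, 2.0, 3.0]
```

So a formant_shift_ratio of 0.8 on its own lowers every partial by a little under four semitones.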

Applications

1. De-Noising / "Spectral Cleaning"

Preset: Clean Resynth / Sparse Partials

Because the script only resynthesizes the loud peaks (harmonics) and ignores the low-level valleys (noise floor), the output is often a "cleaner," albeit more synthetic, version of the original. Great for isolating melody from noisy backgrounds.

2. "Ghost" or "Whisper" Textures

Preset: Whisper Ghost

By using a low number of partials (sparse) and high jitter, the sound loses its tonal center and becomes a diffuse cloud of sine waves. This is excellent for horror sound design or ambient pads.

3. Robotic / Sci-Fi Voices

Preset: Robotic

Setting jitter to 0 and max_partials to a moderate number (10) creates a static, jitter-free reconstruction that is nevertheless phase-incoherent. The lack of phase alignment between frames gives speech a distinct "metallic" or "vocoded" character without using a carrier signal.

Limitations

Phase Coherence: The script generates new sine waves every frame (15ms). It does not "track" partials across frames. This means the phase is reset every frame, leading to a "smearing" of transients. Drum beats will lose their punch; sustained tones work best.
Processing Time: It is computationally expensive. It is an offline process, not real-time.
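The phase-coherence limitation above can be quantified with a small calculation. A steady sine that continued from one frame to the next would have advanced by freq * hop_size cycles, but each new grain restarts at phase zero, so the fractional part of that product is the mismatch between neighboring grains. The Python sketch below is illustrative; the function name is hypothetical.

```python
def phase_mismatch_cycles(freq_hz, hop_size_s=0.015):
    """Fraction of a cycle by which a restarted grain disagrees with the
    grain one hop earlier, assuming both start at phase zero."""
    return (freq_hz * hop_size_s) % 1.0

for freq in (440.0, 450.0, 470.0):
    print(f"{freq:.0f} Hz: {phase_mismatch_cycles(freq):.2f} cycles")
# 440 Hz: 0.60 cycles
# 450 Hz: 0.75 cycles
# 470 Hz: 0.05 cycles
```

A mismatch near half a cycle means overlapping grains partly cancel, while one near zero means they reinforce, which contributes to the metallic coloration described under Applications.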