Envelope Editor — v1.0 User Guide

Interactive multi‑lane breakpoint editor for applying time‑varying pan, pitch, intensity, and filter (spectral tilt) envelopes to a selected sound.

Author: Shai Cohen
Affiliation: Department of Music, Bar‑Ilan University, Israel
Version: 1.0 (2025)
License: MIT License
Repo: GitHub

Contents:

What it does
Quick start
The four lanes
GUI & breakpoint editing
DSP processing order
Parameters & defaults
FAQ / troubleshooting

What it does

Envelope Editor provides a four‑lane breakpoint editor where you can draw time‑varying curves for:

Pan (stereo position, -1 left … 0 centre … +1 right)
Pitch (semitone shift, -12 … 0 … +12 st)
Intensity (gain in dB, -24 … 0 … +24 dB)
Filter (spectral tilt, shelf cutoff 80 … 1000 … 16000 Hz)

The interface is scaled to the exact duration of the selected sound. You can add, drag, and delete breakpoints with simple mouse gestures. When you click Apply, the Python engine processes the audio in the following order:

1. Pitch shift (segmented resample_poly with energy matching)
2. Filter (time‑varying shelf filter with dry/wet blend)
3. Intensity (per‑sample dB gain)
4. Pan (equal‑power stereo placement)

The result is imported back into Praat as a new Sound object.

Key feature: Each lane is completely independent. You can draw complex trajectories, and the DSP is designed to sound musical even with extreme values (e.g., +24 dB gain, or a rapid pan sweep). The filter lane uses a dry/wet blend so that small deviations from 1000 Hz are subtle, while large shifts (down to 80 Hz or up to 16 kHz) produce a noticeable colour.

Quick start

1. In Praat, select exactly one Sound object (mono or stereo).
2. Run script… → Envelope_Editor.praat.
3. The Python GUI opens with four lanes. Each lane shows a horizontal line at the default value (pan 0, pitch 0, intensity 0 dB, filter 1000 Hz).
4. To add a breakpoint: left‑click anywhere inside the plotting area.
5. To drag a breakpoint: left‑click and hold, then move the mouse. The first and last points are locked at the start and end of the sound (they can only be moved vertically).
6. To delete a breakpoint: right‑click on it (endpoints cannot be deleted).
7. After editing all four lanes, click Apply. Processing may take a few seconds (pitch shift is the most expensive).
8. When finished, the GUI closes and the processed sound appears in Praat as originalname_enved.

Tip: You can reset all lanes to their defaults by clicking Reset All. The status bar shows the number of breakpoints per lane after processing.

Important: The Python engine requires numpy, scipy, and soundfile. Tkinter ships with most Python installations, although some Linux distributions package it separately. The pitch shift algorithm uses a segmented resample_poly method with energy matching. Extreme pitch shifts (±12 st) may introduce artefacts, which is expected for a fast segment‑based method. The filter lane is a gentle shelf filter, not a precise formant shifter; it is intended for spectral colour, not vocal tract morphing.

The four lanes

Lane	Range	Default	Interpretation
Pan	-1.0 … +1.0	0.0	Stereo position: -1 = hard left, 0 = centre, +1 = hard right. Equal‑power panning law.
Pitch	-12 … +12 st	0	Semitone shift. Positive = higher pitch, negative = lower. Duration is preserved.
Intensity	-24 … +24 dB	0	Gain in decibels. +6 dB roughly doubles amplitude, -6 dB roughly halves it. Clipped to avoid saturation.
Filter	80 … 16000 Hz	1000 Hz	Cutoff frequency of a shelf filter. Below 1000 Hz = low‑pass shelf (darkens), above 1000 Hz = high‑pass shelf (brightens). Dry/wet blend proportional to distance from 1000 Hz.

The filter lane uses a logarithmic scale (tick marks at 80, 200, 500, 1k, 2k, 4k, 8k, 16k). The other lanes are linear.
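As an illustration of the logarithmic axis, the following sketch maps a cutoff frequency to a normalised vertical position (0 = 80 Hz, 1 = 16 kHz) and back. The function names are hypothetical and are not taken from the script; only the 80 Hz and 16000 Hz limits come from the table above.

```python
import math

# Lane limits from the table above.
F_MIN, F_MAX = 80.0, 16000.0


def freq_to_norm(freq_hz: float) -> float:
    """Map a cutoff in Hz to 0..1 on a logarithmic axis (hypothetical helper)."""
    freq_hz = min(max(freq_hz, F_MIN), F_MAX)  # clamp to the lane range
    return math.log(freq_hz / F_MIN) / math.log(F_MAX / F_MIN)


def norm_to_freq(pos: float) -> float:
    """Inverse mapping: 0..1 position back to Hz (hypothetical helper)."""
    pos = min(max(pos, 0.0), 1.0)
    return F_MIN * (F_MAX / F_MIN) ** pos
```

With this mapping the 1000 Hz default sits slightly below the middle of the lane (at about 0.48), because 1000 Hz is closer to 80 Hz than to 16 kHz when measured in octaves.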

GUI & breakpoint editing

Interface preview (dark theme)

┌────────────────────────────────────────────────────────────────────────────────┐
│ sound.wav  3.45s  44100Hz  stereo   Left‑click: add/drag   Right‑click: delete │
├────────────────────────────────────────────────────────────────────────────────┤
│ Pan (-1…+1)           ┌──────────────────────────────────────────────────────┐ │
│                       │        •                 •                 •         │ │
│                       │•                          •                         •│ │
│                       └──────────────────────────────────────────────────────┘ │
│ Pitch (-12…+12st)     ┌──────────────────────────────────────────────────────┐ │
│                       │                          •                           │ │
│                       │•            •                                       •│ │
│                       └──────────────────────────────────────────────────────┘ │
│ Intensity (-24…+24dB) ┌──────────────────────────────────────────────────────┐ │
│                       │                 •                 •                  │ │
│                       │•        •                                  •        •│ │
│                       └──────────────────────────────────────────────────────┘ │
│ Filter (80…16k Hz)    ┌──────────────────────────────────────────────────────┐ │
│                       │            •                            •            │ │
│                       │•                          •                         •│ │
│                       └──────────────────────────────────────────────────────┘ │
├────────────────────────────────────────────────────────────────────────────────┤
│ Ready.                                  [ Reset All ]  [ Cancel ]  [ ▶ Apply ] │
└────────────────────────────────────────────────────────────────────────────────┘

Editing controls

Left‑click in empty space – add a new breakpoint at that time and value.
Left‑click and drag an existing point – move it. The first and last points are anchored at the sound boundaries (time fixed, only value can change).
Right‑click on a point – delete it (except first/last).
Mouse wheel – scroll through the four lanes if they don’t all fit on screen.
Reset All – restore all lanes to a flat line at default values.
Cancel – close GUI without processing.
Apply – run DSP and create output.

The current value of a breakpoint is displayed above it. The background fill shows the area between the envelope and the default line, colour‑coded per lane.
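The editing rules above can be modelled with a small data structure. This is a minimal sketch, not the script's actual code: the class name Lane and its methods are hypothetical, and keeping a dragged interior point between its neighbours is an assumption that the guide does not state.

```python
from bisect import insort


class Lane:
    """Hypothetical breakpoint lane: sorted (time, value) pairs with locked endpoints."""

    def __init__(self, duration, lo, hi, default):
        self.duration, self.lo, self.hi = duration, lo, hi
        # Every lane starts as a flat line at the default value.
        self.points = [(0.0, default), (duration, default)]

    def _clamp(self, value):
        return min(max(value, self.lo), self.hi)

    def add(self, time, value):
        """Left-click in empty space: insert a point, keeping the list sorted by time."""
        if 0.0 < time < self.duration:
            insort(self.points, (time, self._clamp(value)))

    def move(self, index, time, value):
        """Drag a point; the first and last points keep their time."""
        last = len(self.points) - 1
        if index in (0, last):
            time = self.points[index][0]
        else:
            # Assumption: an interior point cannot cross its neighbours.
            time = min(max(time, self.points[index - 1][0]), self.points[index + 1][0])
        self.points[index] = (time, self._clamp(value))

    def delete(self, index):
        """Right-click: remove a point, except the first and last."""
        if 0 < index < len(self.points) - 1:
            del self.points[index]
```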

DSP processing order

1. Pitch shift – per‑sample semitone envelope.
  Segmented resample_poly (seg_len 4096, 75% overlap) + energy matching.
  If envelope is near zero, this stage is skipped to save CPU.

2. Filter (spectral tilt) – per‑sample cutoff envelope.
  2nd‑order Butterworth shelf (low‑pass below 1000 Hz, high‑pass above).
  Dry/wet blend proportional to distance from neutral (1000 Hz).

3. Intensity – per‑sample dB envelope → linear gain.
  gain = 10^(dB/20)

4. Pan – per‑sample pan envelope.
  Equal‑power law: angle = ((pan+1)/2) * π/2,
  L = cos(angle), R = sin(angle).
  For stereo input, mid‑side decomposition is used to preserve width.
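
The equal‑power law in step 4 can be sketched for a mono input as follows. The function name pan_mono is hypothetical, and the mid‑side handling used for stereo input is not covered here.

```python
import numpy as np


def pan_mono(x, pan):
    """Equal-power pan of a mono signal; pan is a per-sample array in [-1, +1]."""
    # angle runs from 0 (hard left) to pi/2 (hard right), as in the formula above.
    angle = (np.asarray(pan, dtype=float) + 1.0) / 2.0 * (np.pi / 2.0)
    left = x * np.cos(angle)
    right = x * np.sin(angle)
    return np.stack([left, right], axis=1)  # shape (n_samples, 2)
```

Because cos² + sin² = 1, the summed power of the two channels stays constant across the whole pan range; at centre (pan = 0) both channels receive a gain of about 0.707.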

The processing is applied to the whole audio file. All envelopes are linearly interpolated between breakpoints. The output is normalised to a peak of 0.99 (to avoid clipping) and saved as a 32‑bit float WAV (Praat reads this reliably).
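
A minimal sketch of the two generic steps described here, linear interpolation of breakpoints onto every sample and peak normalisation to 0.99, assuming NumPy. The function names are hypothetical and the engine's own code may differ.

```python
import numpy as np


def envelope_per_sample(points, n_samples, sr):
    """Linearly interpolate (time_s, value) breakpoints onto every sample."""
    times, values = zip(*sorted(points))
    t = np.arange(n_samples) / sr  # time of each sample in seconds
    return np.interp(t, times, values)


def normalise_peak(x, target=0.99):
    """Scale the signal so its largest absolute sample equals target."""
    peak = np.max(np.abs(x))
    return x if peak == 0 else x * (target / peak)
```

If soundfile is used for the export, sf.write(path, y, sr, subtype="FLOAT") is one way to request 32‑bit float samples.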

Parameters & defaults

All parameters are controlled through the GUI; there are no numeric entry fields. The lane ranges are fixed as described above.

Lane	Min	Max	Default
Pan	-1.0	+1.0	0.0
Pitch	-12 st	+12 st	0
Intensity	-24 dB	+24 dB	0
Filter	80 Hz	16000 Hz	1000 Hz

Internal constants (segment length, overlap, etc.) are fixed in the Python script and not user‑adjustable.

FAQ / troubleshooting

“Python not found” or missing packages

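Install the required packages with: pip install numpy scipy soundfile. On Windows, the script calls Python through the py launcher, so the packages must be installed for that interpreter (for example: py -m pip install numpy scipy soundfile).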
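
To see which modules are missing before launching the editor, a short check like the one below can be run with the same interpreter that the script uses. It relies only on the standard library; the function name missing_modules is hypothetical and is not part of the plugin.

```python
import importlib.util

# Modules named in this guide; tkinter is needed for the GUI.
REQUIRED = ("numpy", "scipy", "soundfile", "tkinter")


def missing_modules(names=REQUIRED):
    """Return the module names that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Example: print(missing_modules()) shows an empty list when everything is installed.
```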

GUI opens but “Apply” does nothing / no output

Check the Info window in Praat – the Python engine prints detailed debug info there. Look for error messages. Common issues: SciPy not installed correctly, or the input file is very short (pitch shift needs at least 4096 samples).

Output is silent or extremely quiet

The script writes a debug log next to the output WAV (named output_debug.txt). Check the RMS and peak values. If RMS is near zero, the intensity envelope may have large negative values (-24 dB corresponds to a gain of about 0.063, which is quiet but not silent). If the filter lane is at 80 Hz, the wet amount is high and the filter may attenuate most of the energy. The wet amount cannot be set directly, so move the cutoff closer to 1000 Hz to reduce it.
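
The gain figure quoted above follows from gain = 10^(dB/20). The sketch below shows the conversion and its application to a signal, assuming NumPy; the function names are hypothetical.

```python
import numpy as np


def db_to_gain(gain_db):
    """Convert decibels to a linear amplitude factor: gain = 10^(dB/20)."""
    return 10.0 ** (np.asarray(gain_db, dtype=float) / 20.0)


def apply_gain(x, gain_db):
    """Multiply samples by a per-sample gain envelope given in dB."""
    gain = np.atleast_1d(db_to_gain(gain_db))
    if x.ndim == 2:
        gain = gain[:, None]  # broadcast the same gain over both channels
    return x * gain
```

For reference, 0 dB gives a factor of 1.0, +6 dB about 1.995, and -24 dB about 0.063.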

Pitch shift sounds glitchy / has artefacts

The algorithm uses segmented resample_poly, which is a reasonable compromise between quality and speed. Extreme shifts (±12 st) will produce artefacts; for cleaner results, keep shifts within ±6 st. The engine also skips processing if the envelope is near zero, so you can draw partial curves.
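
To make the per‑segment step more concrete, here is a rough sketch of resampling one segment with scipy.signal.resample_poly and matching its RMS energy to the input. It is an illustration under stated assumptions, not the engine's code: the function name shift_segment and the denominator limit of 256 are hypothetical, and the way segments are windowed and overlapped to preserve duration is not shown.

```python
from fractions import Fraction

import numpy as np
from scipy.signal import resample_poly


def shift_segment(seg, semitones):
    """Resample one segment so that its pitch moves by `semitones`, then match RMS."""
    ratio = 2.0 ** (semitones / 12.0)  # frequency ratio, e.g. 2.0 for +12 st
    # Played back at the original rate, a segment shortened by 1/ratio sounds
    # `ratio` times higher. resample_poly needs an integer up/down pair.
    frac = Fraction(1.0 / ratio).limit_denominator(256)  # assumed precision limit
    out = resample_poly(seg, frac.numerator, frac.denominator)
    # Energy matching: restore the RMS level of the input segment.
    rms_in = np.sqrt(np.mean(seg ** 2))
    rms_out = np.sqrt(np.mean(out ** 2))
    if rms_out > 1e-12:
        out = out * (rms_in / rms_out)
    return out
```

A shifted segment no longer has the original length (a +12 st shift halves it), which is why a segmented method has to reassemble overlapping segments, and why large shifts are more prone to artefacts at the segment joins.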

Filter lane seems to have no effect near 1000 Hz

That’s intentional – the filter uses a dry/wet blend proportional to distance from 1000 Hz. At exactly 1000 Hz, wet = 0 (pure dry). At 500 Hz, wet ≈ 0.55, so the output is roughly 55% filtered and 45% dry signal. This makes the filter musical and prevents abrupt timbre changes. For a stronger effect, move the cutoff further away.
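
The blend itself can be sketched for a fixed cutoff as below, assuming SciPy's butter and sosfilt. The function name tilt is hypothetical, the wet amount is passed in by the caller because this guide does not give the engine's exact mapping from cutoff to wet, and the real engine varies the cutoff per sample rather than holding it constant.

```python
import numpy as np
from scipy.signal import butter, sosfilt

NEUTRAL_HZ = 1000.0  # neutral cutoff from this guide


def tilt(x, sr, cutoff_hz, wet):
    """Fixed-cutoff illustration: 2nd-order Butterworth blended with the dry signal."""
    if wet <= 0.0:
        return x.copy()  # pure dry, as at exactly 1000 Hz
    # Low-pass below the neutral cutoff (darkens), high-pass above it (brightens).
    btype = "lowpass" if cutoff_hz < NEUTRAL_HZ else "highpass"
    # cutoff_hz must stay below sr / 2 for the filter design to be valid.
    sos = butter(2, cutoff_hz, btype=btype, fs=sr, output="sos")
    return (1.0 - wet) * x + wet * sosfilt(sos, x)
```

For example, tilt(x, 44100, 500.0, wet=0.55) reproduces the 500 Hz case described above: 55% low‑passed signal mixed with 45% of the original.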

Debug log

When processing fails or produces unexpected output, a debug file (output_debug.txt) is saved alongside the output WAV. It contains envelope statistics, intermediate RMS values, and any error tracebacks. This can be invaluable for diagnosing problems.

Processing speed

The pitch shift stage is the most CPU‑intensive. For a 10‑second file, processing takes about 1‑2 seconds on a modern CPU. If you only need intensity and pan, you can leave the pitch and filter lanes flat – the engine detects this and skips those stages.