Gestural Accumulator — User Guide

Compositional canon generator: creates accumulating variants of a sound gesture and arranges them by acoustic‑distance scheduling to form evolving sonic narratives.

Author: Shai Cohen
Affiliation: Department of Music, Bar-Ilan University, Israel
Version: 10.0 (2025)
License: MIT License
Repo: https://github.com/ShaiCohen-ops/Praat-plugin_AudioTools

Contents:

What this does
Quick start
Conceptual Framework
Presets & Rhetoric
Parameters
Processing Workflow
Applications
Troubleshooting

What this does

This script transforms a single sound gesture into a compositional canon — an accumulating sequence of transformed variants arranged according to acoustic similarity. It generates multiple pitch‑, time‑, and formant‑shifted versions of the input, analyzes their MFCC feature space, then sequences them using a "budget‑as‑schedule" algorithm that controls the pacing and overlap rhetoric of the resulting accumulation.

Key Features:

3 rhetorical presets – Smooth Drift, Violent Rupture, Nervous Energy
Acoustic distance‑based sequencing – Variants arranged by MFCC similarity
Budget‑as‑schedule algorithm – Controls pacing and accumulation rate
Stereo‑aware processing – Preserves stereo input throughout transformations
Physics‑safe transformations – Prevents crashes from extreme time‑stretching
Rhetorical overlap control – Hide or expose transitions between variants

What is a gestural canon? A canon in music is a compositional technique where a melody is imitated by one or more voices after a time delay. This script adapts the concept to sound gestures: the "original" is transformed into multiple "voices" (variants) that accumulate over time, creating a layered, evolving texture. The sequencing is not random but follows acoustic similarity, creating a narrative of transformation where each step feels both connected and different.

Technical Implementation:
(1) Variant generation: Creates N variants via Change Gender (pitch, formant, time‑stretch).
(2) Feature extraction: Computes 13‑dimensional MFCC mean vectors (plus variance if tracking motion).
(3) Distance matrix: Calculates Euclidean distances between all variant pairs.
(4) Budget scheduling: Uses target_budget to schedule accumulation rate (linear/accelerating/decelerating).
(5) Greedy selection: Sequentially picks variants whose distance matches the schedule.
(6) Assembly: Concatenates selected variants with rhetorical overlap (hide/expose transitions).

Quick start

In Praat, select exactly one Sound object (mono or stereo).
Run script… → Gestural_Accumulator.praat.
Choose a Preset or select "Custom" for full parameter control.
Adjust structural parameters if needed (N_variants, K_steps, Target_budget).
Click OK – the script will generate variants, analyze, and assemble.
The output appears as Sound originalname_v10_presetname.

Quick tip: Start with a short (1‑5 second), harmonically rich sound gesture (instrument note, vocal phrase, percussive hit). Use the Smooth Drift preset for gentle, evolving accumulation. For dramatic effect with clear transitions, try Violent Rupture. Monitor the Info window for progress updates and statistics.

Important: This script performs intensive processing – generating 30 variants with MFCC analysis can take 30‑60 seconds depending on source duration. Very long inputs (>10 s) will increase processing time proportionally. The script includes crash protection for extreme time‑stretching, but very short inputs (<0.1 s) may produce fewer usable variants. Stereo processing doubles the memory and time requirements.

Conceptual Framework

The Three‑Layer Model

1. VARIANT GENERATION (Bottom Layer)
• Creates N transformed versions of the source
• Transformations: pitch shift (±Pitch_range_st), time‑stretch (±Time_stretch), formant shift (±Formant_shift_range)
• Each variant is a unique "voice" in the canon
2. FEATURE SPACE (Middle Layer)
• Extracts 13 MFCC coefficients per variant
• Optionally tracks MFCC variance (motion tracking)
• Computes Euclidean distance between all variant pairs
• Creates acoustic similarity map
3. NARRATIVE SCHEDULING (Top Layer)
• Uses target_budget as "distance budget"
• Schedules accumulation rate (pacing curve)
• Selects variants whose acoustic distance matches schedule
• Controls overlap rhetoric (hide/expose transitions)

Budget‑as‑Schedule Algorithm

# Target_budget = total acoustic distance to "spend"
# K_steps = number of variants to select
For each step s (1..K_steps):
    progress = s / K_steps
    # Pacing curve determines accumulation rate
    if Linear:     sched = target_budget × progress
    if Accelerate: sched = target_budget × progress²
    if Decelerate: sched = target_budget × √progress
    # Need to accumulate this much distance by this step
    ideal_total = sched
    # Greedy selection: pick variant whose distance from current
    # best matches the needed accumulation
    needed_this_step = ideal_total - current_total
    Select variant with distance closest to needed_this_step
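The pseudocode above can be made concrete as a minimal Python sketch. It illustrates the described logic and is not the Praat script itself: the function names (schedule, greedy_select), the 0‑based indexing, and the choice to start from the first variant in the pool are assumptions made for this example.

```python
def schedule(budget, k_steps, pacing="linear"):
    """Cumulative distance target for steps 1..k_steps (hypothetical helper)."""
    exponent = {"linear": 1.0, "accelerate": 2.0, "decelerate": 0.5}[pacing]
    return [budget * (s / k_steps) ** exponent for s in range(1, k_steps + 1)]


def greedy_select(dist, k_steps, budget, pacing="linear", start=0):
    """Pick up to k_steps variant indices from a symmetric distance matrix.

    dist[i][j] is the acoustic distance between variants i and j.
    Assumption: the first selected variant is `start` and contributes
    no distance; every later step is chosen greedily.
    """
    targets = schedule(budget, k_steps, pacing)
    chosen = [start]
    total = 0.0
    for step in range(1, k_steps):
        # Distance still needed to reach the scheduled total for this step.
        needed = targets[step] - total
        current = chosen[-1]
        unused = [j for j in range(len(dist)) if j not in chosen]
        if not unused:
            break  # pool exhausted before k_steps selections
        best = min(unused, key=lambda j: abs(dist[current][j] - needed))
        total += dist[current][best]
        chosen.append(best)
    return chosen, total


# Toy pool: four variants placed on a line, distance = gap between positions.
positions = [0, 1, 2, 5]
toy_dist = [[abs(a - b) for b in positions] for a in positions]
order, spent = greedy_select(toy_dist, k_steps=3, budget=3.0)
print(order, spent)
```

Because each step only looks at the current variant, the accumulated distance can drift away from the schedule when the pool has no close match; a larger N_variants gives the search more candidates to choose from.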

Acoustic Distance Calculation

For variants i and j:
    # MFCC feature vectors (13‑dimensional)
    f_i = [MFCC1_mean, MFCC2_mean, ..., MFCC13_mean]
    f_j = [MFCC1_mean, MFCC2_mean, ..., MFCC13_mean]
    # Optionally include variance (motion tracking)
    if track_motion_variance:
        f_i = [means..., variances...]   # 26‑dimensional
        f_j = [means..., variances...]
    # Euclidean distance
    distance = √( Σ (f_i[d] - f_j[d])² )

This measures acoustic similarity: smaller distance = more similar timbre.
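As a small worked illustration of the calculation above, the following Python sketch builds a 13‑ or 26‑dimensional feature vector and computes the Euclidean distance. The helper names (build_features, euclidean) and the toy numbers are assumptions for the example; real values would come from Praat's MFCC analysis.

```python
import math


def build_features(means, variances=None):
    """Concatenate per-coefficient means and, optionally, variances."""
    features = list(means)
    if variances is not None:  # corresponds to track_motion_variance = on
        features += list(variances)
    return features


def euclidean(f_i, f_j):
    """Square root of the summed squared differences, as in the formula above."""
    if len(f_i) != len(f_j):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f_i, f_j)))


# Two made-up variants that differ by 1.0 in every mean coefficient.
f_a = build_features([0.0] * 13, [2.0] * 13)
f_b = build_features([1.0] * 13, [2.0] * 13)
print(len(f_a), euclidean(f_a, f_b))
```

With identical variances the extra 13 dimensions contribute nothing, so the distance here depends only on the means.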

Overlap Rhetoric

Hide Ruptures (Smooth Transitions)
• Large acoustic distance → Longer crossfade
• Smooths over timbral differences
• Creates continuous, evolving texture
• Formula: overlap = min(0.9, distance/median × 0.4)
Expose Ruptures (Hard Transitions)
• Large acoustic distance → Shorter crossfade
• Emphasizes timbral contrasts
• Creates clear sectional boundaries
• Formula: overlap = max(0.05, 0.6 / (distance/median))
Median normalization: All distances scaled by global median distance for consistent behavior across different source materials.
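The two overlap formulas can be sketched in Python as follows. The formulas themselves are taken from the text above; the extra clamping to 0.1–0.9 (hide) and 0.05–0.9 (expose) is an assumption based on the ranges quoted in Step 5 of the Processing Workflow, and the function name overlap_factor is hypothetical.

```python
def overlap_factor(distance, median_distance, mode="hide"):
    """Return the overlap factor for one transition.

    distance        -- acoustic distance between the two variants
    median_distance -- global median distance used for normalization
    mode            -- "hide" or "expose"
    """
    if median_distance <= 0:
        raise ValueError("median distance must be positive")
    relative = distance / median_distance
    if mode == "hide":
        raw = relative * 0.4  # larger distance -> longer crossfade
        return max(0.1, min(0.9, raw))  # lower bound 0.1 is an assumption
    if mode == "expose":
        # Larger distance -> shorter crossfade; guard against division by zero.
        raw = 0.6 / relative if relative > 0 else float("inf")
        return min(0.9, max(0.05, raw))  # upper bound 0.9 is an assumption
    raise ValueError("mode must be 'hide' or 'expose'")


for d in (1.0, 10.0, 30.0):
    print(d, overlap_factor(d, 10.0, "hide"), overlap_factor(d, 10.0, "expose"))
```

At the median distance (relative distance 1) the two modes give 0.4 and 0.6 respectively; they diverge as the distance moves away from the median.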

Presets & Rhetoric

Built‑in Presets

Preset	Rhetoric	Pacing	Overlap	Pitch Range	Typical Use
Smooth Drift	Evolutionary, continuous	Linear	Hide Ruptures	±0.5 semitones	Ambient textures, gradual transformation
Violent Rupture	Disruptive, sectional	Accelerate	Expose Ruptures	±12 semitones	Dramatic contrasts, electroacoustic composition
Nervous Energy	Agitated, unpredictable	Decelerate	Expose Ruptures	±3 semitones	Glitch, IDM, rhythmic experimentation

Pacing Curves

Linear (Steady accumulation)
• Distance accumulates at constant rate
• Even pacing throughout
• Predictable, meditative feel
• Formula: sched = budget × progress
Accelerate (Slow start → Rush to finish)
• Starts with similar variants
• Accumulates differences faster toward end
• Creates tension and climax
• Formula: sched = budget × progress²
Decelerate (Explosive start → Stabilize)
• Starts with dramatic contrasts
• Gradually settles into similarity
• Creates release/resolution
• Formula: sched = budget × √progress
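To see why the three formulas feel different, it helps to look at how much distance each one asks for per step rather than cumulatively. The following Python sketch does that; the helper name step_increments and the example values (budget 60, 4 steps) are assumptions chosen for illustration.

```python
# Exponents for the three pacing formulas listed above.
PACING_EXPONENT = {"Linear": 1.0, "Accelerate": 2.0, "Decelerate": 0.5}


def step_increments(budget, k_steps, pacing):
    """Distance the schedule asks for at each step (hypothetical helper)."""
    e = PACING_EXPONENT[pacing]
    cumulative = [budget * (s / k_steps) ** e for s in range(k_steps + 1)]
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


for name in PACING_EXPONENT:
    print(name, [round(x, 2) for x in step_increments(60.0, 4, name)])
```

Linear asks for equal increments, Accelerate for growing ones (similar variants first, contrasts later), and Decelerate for shrinking ones (contrasts first, similarity later). All three sum to the same budget.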

Parameters

Parameter	Default	Range	Description
Structural Form
N_variants	30	10–100	Number of variants to generate (pool size)
K_steps	8	3–20	Number of variants to select for final composition
Target_budget	60.0	10–200	Total acoustic distance to accumulate
Timbre & Motion
Track_motion_variance	✓	on/off	Include MFCC variance in distance (captures spectral motion)
Pitch_range_st	2.0	0–24	Maximum pitch shift in semitones (±range)
Time_stretch	0.15	0–1.0	Maximum time‑stretch factor (±range)
Formant_shift_range	0.15	0–0.5	Maximum formant shift factor (±range)
Random_seed	1987	any integer	Seed for reproducible random transformations
Skip_first	✓	on/off	Skip original (first variant) in final assembly

Parameter interactions:

N_variants vs K_steps: Larger pool (N) gives algorithm more choice; K should be ≤ N. Typical ratio: N = 3–4× K.
Target_budget: Higher values = more acoustic change accumulated = more dramatic transformation over sequence. Start with 40–80.
Pitch_range_st: 0.5–3 for subtle variations; values of 12 or more allow shifts of an octave or more and give extreme transformations.
Time_stretch: 0.1–0.3 for subtle timing variations; 0.5+ for dramatic duration changes (handled with physics safety).
Track_motion_variance: On for sounds with spectral evolution (vowels, slides); Off for static sounds.
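The ranges in the parameter table and the K ≤ N rule can be checked before a run. The Python sketch below is one hypothetical way to do so; the function check_parameters and the dictionary layout are assumptions, and the script itself may validate its form fields differently.

```python
# Ranges copied from the parameter table above.
RANGES = {
    "N_variants": (10, 100),
    "K_steps": (3, 20),
    "Target_budget": (10.0, 200.0),
    "Pitch_range_st": (0.0, 24.0),
    "Time_stretch": (0.0, 1.0),
    "Formant_shift_range": (0.0, 0.5),
}

# Defaults copied from the same table.
DEFAULTS = {
    "N_variants": 30,
    "K_steps": 8,
    "Target_budget": 60.0,
    "Pitch_range_st": 2.0,
    "Time_stretch": 0.15,
    "Formant_shift_range": 0.15,
}


def check_parameters(params):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    for name, (low, high) in RANGES.items():
        value = params[name]
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside {low}-{high}")
    if params["K_steps"] > params["N_variants"]:
        problems.append("K_steps should not exceed N_variants")
    return problems


print(check_parameters(DEFAULTS))
print(check_parameters({**DEFAULTS, "N_variants": 10, "K_steps": 12}))
```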

Processing Workflow

Step 1: Setup & Analysis

Input preparation:
• Checks mono/stereo status
• Creates working copy
• Extracts pitch for baseline reference
• Falls back to 150 Hz if pitch undefined
Physics safety:
• Prevents extreme time‑stretching below Praat's 0.064 s limit
• Adjusts duration factors dynamically
• Ensures all variants are processable
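One plausible reading of the safety checks above is sketched below in Python. The 0.064 s limit and the 150 Hz fallback come from this guide; the function names and the exact clamping rule (raise the duration factor until the result is at least 0.064 s long) are assumptions, not a transcription of the script.

```python
import math

MIN_DURATION = 0.064    # seconds, the limit quoted in this guide
FALLBACK_PITCH = 150.0  # Hz, used when pitch is undefined


def safe_duration_factor(source_duration, requested_factor,
                         min_duration=MIN_DURATION):
    """Raise the factor if the stretched sound would be too short."""
    if source_duration <= 0:
        raise ValueError("source duration must be positive")
    smallest_allowed = min_duration / source_duration
    return max(requested_factor, smallest_allowed)


def baseline_pitch(median_pitch_hz):
    """Use the measured pitch, or the fallback if it is missing or undefined."""
    if (median_pitch_hz is None or math.isnan(median_pitch_hz)
            or median_pitch_hz <= 0):
        return FALLBACK_PITCH
    return median_pitch_hz


print(safe_duration_factor(1.0, 0.85))   # long source: factor unchanged
print(safe_duration_factor(0.07, 0.85))  # short source: factor raised
print(baseline_pitch(float("nan")))
```

Under this rule a source that is already shorter than the limit gets a factor above 1, that is, it is lengthened rather than shortened.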

Step 2: Variant Generation

For each of N_variants:
1. Random pitch shift within ±pitch_range_st semitones
2. Random formant shift within ±formant_shift_range
3. Random time‑stretch within ±time_stretch
4. Apply via Change Gender (preserves formants for pitch shift)
5. Scale peak to 0.9 (prevents clipping)
Stereo handling:
• If input stereo: process L/R separately then recombine
• If input mono: process directly
• All variants retain original channel format
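The random draws in steps 1–3 can be sketched in Python as below. This assumes uniform sampling and that the formant and time‑stretch ranges are applied as ratios centred on 1.0; both are assumptions, as is the function name draw_variant_parameters. The semitone‑to‑ratio conversion 2^(st/12) is standard.

```python
import random


def draw_variant_parameters(n_variants, pitch_range_st, time_stretch,
                            formant_shift_range, seed=1987):
    """Draw one set of transformation parameters per variant.

    Assumptions: uniform distributions; formant and duration values
    are ratios centred on 1.0 (1.0 means "unchanged").
    """
    rng = random.Random(seed)  # fixed seed gives reproducible draws
    variants = []
    for _ in range(n_variants):
        semitones = rng.uniform(-pitch_range_st, pitch_range_st)
        variants.append({
            "pitch_ratio": 2.0 ** (semitones / 12.0),
            "formant_ratio": 1.0 + rng.uniform(-formant_shift_range,
                                               formant_shift_range),
            "duration_factor": 1.0 + rng.uniform(-time_stretch, time_stretch),
        })
    return variants


pool = draw_variant_parameters(30, pitch_range_st=2.0, time_stretch=0.15,
                               formant_shift_range=0.15)
print(pool[0])
```

Under these assumptions a Time_stretch close to 1.0 can produce duration factors near zero, which is the situation the physics safety described in Step 1 is meant to catch.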

Step 3: Feature Extraction & Distance Matrix

MFCC analysis:
• Convert variant to mono for analysis (temporary)
• Compute 13‑dimensional MFCC
• Extract mean of each coefficient (13 values)
• If Track_motion_variance: also extract standard deviation (13 more values)
Crash protection:
• Checks duration > 0.025 s before MFCC
• Falls back to zero vector if MFCC fails
• Handles silent/short variants gracefully
Distance calculation:
• Euclidean distance between all variant pairs
• Store in symmetric matrix
• Compute global median distance for normalization
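Step 3 as a whole can be sketched in Python as follows, starting from per‑frame MFCC values that are assumed to have been exported from Praat already. The function names, the use of the population standard deviation, and the representation of a failed analysis as an empty frame list are all assumptions for the example.

```python
import math
from statistics import mean, median, pstdev

N_COEFFS = 13  # MFCC dimension used by the script


def feature_vector(frames, track_motion=True):
    """Summarize per-frame MFCCs (list of 13-value lists) as one vector."""
    dims = N_COEFFS * (2 if track_motion else 1)
    if not frames:
        # Assumption: a failed or skipped analysis is passed as no frames.
        return [0.0] * dims
    columns = list(zip(*frames))  # one tuple per coefficient
    features = [mean(c) for c in columns]
    if track_motion:
        features += [pstdev(c) for c in columns]
    return features


def distance_matrix(vectors):
    """Symmetric matrix of Euclidean distances between all pairs."""
    n = len(vectors)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = math.dist(vectors[i], vectors[j])
    return dist


def median_distance(dist):
    """Median over the upper triangle (each pair counted once)."""
    n = len(dist)
    return median(dist[i][j] for i in range(n) for j in range(i + 1, n))


vec = feature_vector([[1.0] * 13, [3.0] * 13])
toy = distance_matrix([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
print(len(vec), toy[0][1], median_distance(toy))
```

A zero‑vector fallback keeps the matrix well defined, but it also places every failed variant at the same point in feature space, so such variants are mutually indistinguishable to the scheduler.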

Step 4: Budget‑as‑Schedule Selection

Scheduling:
1. Calculate schedule for each step based on pacing curve
2. For step 1: always select variant 1 (or skip if skip_first)
3. For subsequent steps:
- Calculate needed distance = schedule[step] - accumulated
- Search unused variants for closest match to needed distance
- Select best match
- Update accumulated distance
Greedy algorithm properties:
• Locally optimal at each step
• Creates coherent progression
• Respects overall budget constraint

Step 5: Assembly with Rhetorical Overlap

Concatenation logic:
1. Start with first selected variant
2. For each subsequent variant:
- Calculate relative distance = step_distance / global_median
- Apply overlap rhetoric formula (hide/expose)
- Compute overlap duration = variant_duration × overlap_factor
- Concatenate with overlap
Overlap factors:
• Hide Ruptures: 0.1–0.9, proportional to distance
• Expose Ruptures: 0.05–0.9, inversely proportional
Finalization:
• Rename output with preset identifier
• Clean up all temporary objects
• Auto‑play result
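The concatenation logic can be illustrated on plain lists of samples. The Python sketch below assumes a linear crossfade and takes variant_duration to mean the incoming variant's length; neither detail is stated in this guide, and the function names are hypothetical.

```python
def crossfade_concat(a, b, overlap_samples):
    """Join two sample lists, blending the end of a with the start of b."""
    n = min(overlap_samples, len(a), len(b))
    if n <= 0:
        return a + b  # no overlap: plain concatenation
    head, tail = a[:-n], a[-n:]
    mixed = []
    for i in range(n):
        w = (i + 1) / (n + 1)  # fade-in weight for b, fade-out is 1 - w
        mixed.append(tail[i] * (1.0 - w) + b[i] * w)
    return head + mixed + b[n:]


def assemble(variants, overlap_factors):
    """Chain variants; overlap_factors[i] applies when adding variants[i + 1]."""
    out = list(variants[0])
    for variant, factor in zip(variants[1:], overlap_factors):
        overlap = int(len(variant) * factor)  # duration x overlap_factor
        out = crossfade_concat(out, list(variant), overlap)
    return out


# Three constant "variants" of 10 samples each, overlapped by 50% and 20%.
result = assemble([[1.0] * 10, [1.0] * 10, [1.0] * 10], [0.5, 0.2])
print(len(result))
```

Each overlap shortens the total by the number of overlapped samples, so sequences assembled with large overlap factors are shorter than the sum of their variant durations.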

Applications

Electroacoustic Composition

Use case: Transforming recorded sounds into structured compositions.

Technique: Record short environmental sounds, process with Violent Rupture preset.

Example: Door slam → accumulating, pitch‑shifted canon with exposed ruptures = rhythmic texture.

Vocal Processing & Choral Effects

Use case: Creating virtual choirs or vocal ensembles from single voice.

Technique: Use spoken/sung phrase with Smooth Drift preset, moderate pitch range.

Result: Accumulating vocal variants create impression of multiple singers entering gradually.

Instrumental Texture Generation

Use case: Generating evolving accompaniments from single notes.

Workflow:

Record sustained instrument note (strings, wind, synth)
Process with Nervous Energy preset, Track_motion_variance = on
Result: Agitated, rhythmically complex texture with spectral evolution
Layer with original for enhanced effect

Sound Design for Media

Use case: Creating evolving soundscapes, transitions, UI sounds.

Technique: Use short synthetic sounds with Decelerate pacing.

Example: UI "click" → accumulating variants with exposed ruptures = futuristic transition sound.

Practical Workflow Examples

🎵 Vocal Canon (Virtual Choir)

Goal: Create choral texture from single vocal phrase.

Settings:

Source: 3‑second sung phrase
Preset: Smooth Drift
Custom adjustments: Pitch_range_st = 1.5, N_variants = 40, K_steps = 12
Target_budget: 50.0
Skip_first: Yes

Result: Gradual accumulation of subtly varied vocal entries, creating impression of choir building over time.

⚡ Percussive Accumulation (Rhythmic)

Goal: Transform single hit into complex rhythmic pattern.

Settings:

Source: Drum hit or percussive sound (0.5 s)
Preset: Nervous Energy
Custom adjustments: Time_stretch = 0.3, Pitch_range_st = 8.0
Overlap_mode: Expose Ruptures
Track_motion_variance: Off

Result: Stuttered, rhythmically varied accumulation with pitch variations, suitable for electronic music.

🌊 Environmental Soundscape

Goal: Create evolving texture from environmental recording.

Settings:

Source: 2‑second water drop or wind gust
Preset: Custom
Pacing_curve: Accelerate
Overlap_mode: Hide Ruptures
Pitch_range_st: 0.8 (subtle)
Formant_shift_range: 0.25 (more timbral variation)
Target_budget: 80.0

Result: Gradually accumulating environmental texture that starts similar, becomes more varied, creating narrative arc.

Troubleshooting

Problem: "Please select exactly one Sound object"
Cause: No sound selected, or multiple selected.
Solution: Select exactly one Sound in Praat Objects window before running.

Problem: Processing very slow (>2 minutes)
Cause: Large N_variants or long source duration.
Solution: Reduce N_variants (≤30) or use a shorter source (≤5 s). The MFCC dimension is fixed at 13 and can only be reduced by modifying the script.

Problem: Output mostly silent or truncated
Cause: Source too short or quiet, MFCC analysis failing.
Solution: Use louder, longer source (>0.5 s), check amplitude, disable Track_motion_variance.

Problem: Praat crashes during variant generation
Cause: Extreme time‑stretching creating sounds <0.064 s.
Solution: Reduce Time_stretch to 0.3 or less. The script has physics safety, but extreme values may still cause issues.

Problem: All variants sound identical
Cause: Pitch_range_st, Time_stretch, and Formant_shift_range are too small.
Solution: Increase the variation parameters. If successive runs sound the same, change Random_seed: a fixed seed reproduces the same transformations every time.

Performance Optimization

For faster processing:

Source duration: Keep under 5 seconds for reasonable processing time.
Variant count: N_variants = 20–30 is typically sufficient; the distance matrix needs N(N−1)/2 pairwise calculations, so its cost grows as O(N²).
MFCC dimension: Script uses 13 coefficients; cannot be changed without modifying code.
Stereo vs mono: Mono processes ≈2× faster than stereo.
Motion variance: Turning off Track_motion_variance reduces feature dimension from 26 to 13.