Gestural Accumulator — User Guide
Compositional canon generator: creates accumulating variants of a sound gesture, arranges them via acoustic distance scheduling to form evolving sonic narratives.
What this does
This script transforms a single sound gesture into a compositional canon — an accumulating sequence of transformed variants arranged according to acoustic similarity. It generates multiple pitch‑, time‑, and formant‑shifted versions of the input, analyzes their MFCC feature space, then sequences them using a "budget‑as‑schedule" algorithm that controls the pacing and overlap rhetoric of the resulting accumulation.
Key Features:
- 3 rhetorical presets – Smooth Drift, Violent Rupture, Nervous Energy
- Acoustic distance‑based sequencing – Variants arranged by MFCC similarity
- Budget‑as‑schedule algorithm – Controls pacing and accumulation rate
- Stereo‑aware processing – Preserves stereo input throughout transformations
- Physics‑safe transformations – Prevents crashes from extreme time‑stretching
- Rhetorical overlap control – Hide or expose transitions between variants
Technical Implementation:
1. Variant generation: Creates N variants via Change Gender (pitch, formant, time‑stretch).
2. Feature extraction: Computes 13‑dimensional MFCC mean vectors (plus variance if tracking motion).
3. Distance matrix: Calculates Euclidean distances between all variant pairs.
4. Budget scheduling: Uses target_budget to schedule accumulation rate (linear/accelerating/decelerating).
5. Greedy selection: Sequentially picks variants whose distance matches the schedule.
6. Assembly: Concatenates selected variants with rhetorical overlap (hide/expose transitions).
Quick start
- In Praat, select exactly one Sound object (mono or stereo).
- Run script… → Gestural_Accumulator.praat.
- Choose a Preset or select "Custom" for full parameter control.
- Adjust structural parameters if needed (N_variants, K_steps, Target_budget).
- Click OK – the script will generate variants, analyze, and assemble.
- The output appears as Sound originalname_v10_presetname.
Conceptual Framework
The Three‑Layer Model
1. VARIANT GENERATION (Bottom Layer)
• Creates N transformed versions of the source
• Transformations: pitch shift (±range_st), time‑stretch (±time_stretch), formant shift (±range)
• Each variant is a unique "voice" in the canon
2. FEATURE SPACE (Middle Layer)
• Extracts 13 MFCC coefficients per variant
• Optionally tracks MFCC variance (motion tracking)
• Computes Euclidean distance between all variant pairs
• Creates acoustic similarity map
3. NARRATIVE SCHEDULING (Top Layer)
• Uses target_budget as "distance budget"
• Schedules accumulation rate (pacing curve)
• Selects variants whose acoustic distance matches schedule
• Controls overlap rhetoric (hide/expose transitions)
Budget‑as‑Schedule Algorithm
Acoustic Distance Calculation
Overlap Rhetoric
Hide Ruptures (Smooth Transitions)
• Large acoustic distance → Longer crossfade
• Smooths over timbral differences
• Creates continuous, evolving texture
• Formula: overlap = min(0.9, distance/median × 0.4)
Expose Ruptures (Hard Transitions)
• Large acoustic distance → Shorter crossfade
• Emphasizes timbral contrasts
• Creates clear sectional boundaries
• Formula: overlap = max(0.05, 0.6 / (distance/median))
Median normalization: All distances scaled by global median distance for consistent behavior across different source materials.
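As a rough illustration, the two overlap formulas can be sketched in Python (the script itself runs in Praat, and this is not its code). The function name `overlap_factor` and the `mode` strings are hypothetical. The formulas above show only one bound each; the extra clamps (a 0.1 floor for Hide, a 0.9 ceiling for Expose) are an assumption taken from the overlap ranges listed under Step 5.

```python
def overlap_factor(distance, median, mode="hide"):
    """Return the crossfade fraction (0..1) for one transition.

    distance: acoustic distance between two adjacent variants
    median:   global median distance, used for normalization
    mode:     "hide" or "expose" (hypothetical labels for the two rhetorics)
    """
    if median <= 0:
        raise ValueError("median distance must be positive")
    relative = distance / median

    if mode == "hide":
        # Larger distance -> longer crossfade, capped at 0.9.
        raw = relative * 0.4
        return min(0.9, max(0.1, raw))  # the 0.1 floor is an assumption
    if mode == "expose":
        # Larger distance -> shorter crossfade, floored at 0.05.
        if relative == 0:
            return 0.9  # identical variants: fall back to the assumed ceiling
        raw = 0.6 / relative
        return min(0.9, max(0.05, raw))  # the 0.9 ceiling is an assumption
    raise ValueError("mode must be 'hide' or 'expose'")
```

When the distance equals the median, Hide yields 0.4 and Expose yields 0.6. At three times the median, Hide reaches its 0.9 cap while Expose drops to 0.2.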
Presets & Rhetoric
Built‑in Presets
| Preset | Rhetoric | Pacing | Overlap | Pitch Range | Typical Use |
|---|---|---|---|---|---|
| Smooth Drift | Evolutionary, continuous | Linear | Hide Ruptures | ±0.5 semitones | Ambient textures, gradual transformation |
| Violent Rupture | Disruptive, sectional | Accelerate | Expose Ruptures | ±12 semitones | Dramatic contrasts, electroacoustic composition |
| Nervous Energy | Agitated, unpredictable | Decelerate | Expose Ruptures | ±3 semitones | Glitch, IDM, rhythmic experimentation |
Pacing Curves
Linear (Constant rate)
• Distance accumulates at constant rate
• Even pacing throughout
• Predictable, meditative feel
• Formula: sched = budget × progress
Accelerate (Slow start → Rush to finish)
• Starts with similar variants
• Accumulates differences faster toward end
• Creates tension and climax
• Formula: sched = budget × progress²
Decelerate (Explosive start → Stabilize)
• Starts with dramatic contrasts
• Gradually settles into similarity
• Creates release/resolution
• Formula: sched = budget × √progress
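The three pacing formulas can be compared side by side with a small Python sketch. The name `build_schedule` is hypothetical, and the definition of progress (step divided by the number of steps) is an assumption, since the guide does not say how the script computes it.

```python
import math


def build_schedule(budget, k_steps, curve="linear"):
    """Return the target cumulative distance after each of k_steps steps."""
    if k_steps < 1:
        raise ValueError("k_steps must be at least 1")
    shapes = {
        "linear": lambda p: p,                   # sched = budget * progress
        "accelerate": lambda p: p ** 2,          # sched = budget * progress^2
        "decelerate": lambda p: math.sqrt(p),    # sched = budget * sqrt(progress)
    }
    if curve not in shapes:
        raise ValueError("unknown pacing curve: " + curve)
    shape = shapes[curve]
    # Assumption: progress runs from 1/k_steps up to 1.0.
    return [budget * shape(step / k_steps) for step in range(1, k_steps + 1)]
```

With a budget of 60 over 4 steps, Linear gives 15, 30, 45, 60; Accelerate gives 3.75, 15, 33.75, 60; Decelerate gives 30, then about 42.4 and 52.0, then 60. All three end on the full budget and differ only in how early the distance is spent.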
Parameters
| Parameter | Default | Range | Description |
|---|---|---|---|
| Structural Form | | | |
| N_variants | 30 | 10–100 | Number of variants to generate (pool size) |
| K_steps | 8 | 3–20 | Number of variants to select for final composition |
| Target_budget | 60.0 | 10–200 | Total acoustic distance to accumulate |
| Timbre & Motion | | | |
| Track_motion_variance | ✓ | on/off | Include MFCC variance in distance (captures spectral motion) |
| Pitch_range_st | 2.0 | 0–24 | Maximum pitch shift in semitones (±range) |
| Time_stretch | 0.15 | 0–1.0 | Maximum time‑stretch factor (±range) |
| Formant_shift_range | 0.15 | 0–0.5 | Maximum formant shift factor (±range) |
| Random_seed | 1987 | any integer | Seed for reproducible random transformations |
| Skip_first | ✓ | on/off | Skip original (first variant) in final assembly |
- N_variants vs K_steps: A larger pool (N) gives the algorithm more choice; K must be ≤ N. Typical ratio: N = 3–4× K.
- Target_budget: Higher values = more acoustic change accumulated = more dramatic transformation over sequence. Start with 40–80.
- Pitch_range_st: 0.5–3 for subtle variations; 12 or more for extreme transformations (12 semitones is one octave, so larger values allow jumps beyond an octave).
- Time_stretch: 0.1–0.3 for subtle timing variations; 0.5+ for dramatic duration changes (handled with physics safety).
- Track_motion_variance: On for sounds with spectral evolution (vowels, slides); Off for static sounds.
Processing Workflow
Step 1: Setup & Analysis
• Creates working copy
• Extracts pitch for baseline reference
• Falls back to 150 Hz if pitch undefined
Physics safety:
• Prevents extreme time‑stretching below Praat's 0.064 s limit
• Adjusts duration factors dynamically
• Ensures all variants are processable
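One way to picture the physics safety check is as a floor on the duration factor. The guide does not say how the script adjusts the factor, so raising it to the smallest safe value is an assumption here, as are the names `safe_duration_factor` and `MIN_DURATION_S`.

```python
MIN_DURATION_S = 0.064  # minimum duration quoted in this guide


def safe_duration_factor(source_duration, requested_factor,
                         min_duration=MIN_DURATION_S):
    """Raise the duration factor if the stretched sound would be too short."""
    if source_duration <= 0:
        raise ValueError("source_duration must be positive")
    # Smallest factor that still yields a sound of min_duration seconds.
    floor = min_duration / source_duration
    return max(requested_factor, floor)
```

For a 1 s source a requested factor of 0.85 passes through unchanged. For a 0.1 s source a requested factor of 0.5 would give 0.05 s, so the sketch raises it to 0.64.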
Step 2: Variant Generation
1. Random pitch shift within ±pitch_range_st semitones
2. Random formant shift within ±formant_shift_range
3. Random time‑stretch within ±time_stretch
4. Apply via Change Gender (preserves formants for pitch shift)
5. Scale peak to 0.9 (prevents clipping)
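The random draws behind these steps might look like the following Python sketch. It only produces the transformation parameters; the audio processing itself is done by Praat's Change Gender command. The uniform distribution, the function name `draw_variant_params`, and the dictionary keys are assumptions. Variant 1 is left untransformed because the Skip_first parameter describes the first variant as the original.

```python
import random


def draw_variant_params(n_variants, pitch_range_st=2.0, time_stretch=0.15,
                        formant_shift_range=0.15, seed=1987):
    """Draw reproducible transformation parameters for each variant."""
    rng = random.Random(seed)  # same seed -> same sequence of variants
    params = []
    for index in range(n_variants):
        if index == 0:
            # Variant 1 is the unmodified original.
            params.append({"pitch_ratio": 1.0,
                           "formant_ratio": 1.0,
                           "duration_factor": 1.0})
            continue
        # Assumption: each shift is drawn uniformly within its +/- range.
        semitones = rng.uniform(-pitch_range_st, pitch_range_st)
        params.append({
            # Convert semitones to a frequency ratio: 12 semitones = factor 2.
            "pitch_ratio": 2.0 ** (semitones / 12.0),
            "formant_ratio": 1.0 + rng.uniform(-formant_shift_range,
                                               formant_shift_range),
            "duration_factor": 1.0 + rng.uniform(-time_stretch, time_stretch),
        })
    return params
```

Calling the function twice with the same seed returns identical parameter lists, which is what makes a run repeatable.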
Stereo handling:
• If input stereo: process L/R separately then recombine
• If input mono: process directly
• All variants retain original channel format
Step 3: Feature Extraction & Distance Matrix
• Compute 13‑dimensional MFCC
• Extract mean of each coefficient (13 values)
• If Track_motion_variance: also extract standard deviation (13 more values)
Crash protection:
• Checks duration > 0.025 s before MFCC
• Falls back to zero vector if MFCC fails
• Handles silent/short variants gracefully
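The reduction from per-frame MFCCs to one feature vector per variant, including the zero-vector fallback, can be sketched as below. The MFCC analysis itself is done by Praat; this sketch assumes the coefficients are already available as a list of frames with 13 values each. The name `summarize_mfcc` is hypothetical, and the use of the population standard deviation is an assumption.

```python
import statistics

N_COEFFS = 13  # MFCC dimension stated in this guide


def summarize_mfcc(frames, track_motion_variance=True):
    """Collapse per-frame MFCCs into one feature vector (13 or 26 values)."""
    size = N_COEFFS * (2 if track_motion_variance else 1)
    # Fallback for a failed or empty analysis: zero vector of the right size.
    if not frames or any(len(frame) != N_COEFFS for frame in frames):
        return [0.0] * size
    columns = list(zip(*frames))  # one tuple per coefficient
    features = [statistics.fmean(col) for col in columns]
    if track_motion_variance:
        # Spread of each coefficient over time captures spectral motion.
        features += [statistics.pstdev(col) for col in columns]
    return features
```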
Distance calculation:
• Euclidean distance between all variant pairs
• Store in symmetric matrix
• Compute global median distance for normalization
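A minimal Python sketch of the distance step, assuming each variant has already been reduced to a feature vector of equal length. The function names are hypothetical, and taking the median over distinct pairs only (ignoring the zero diagonal) is an assumption.

```python
import math
import statistics


def distance_matrix(features):
    """Return a symmetric matrix of Euclidean distances between vectors."""
    n = len(features)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(features[i], features[j])
            dist[i][j] = d  # fill both halves so the matrix is symmetric
            dist[j][i] = d
    return dist


def global_median(dist):
    """Median of all pairwise distances (upper triangle only)."""
    n = len(dist)
    pairs = [dist[i][j] for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        raise ValueError("need at least two variants")
    return statistics.median(pairs)
```

Dividing every step distance by this median is what lets the same overlap settings behave similarly on very different source sounds.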
Step 4: Budget‑as‑Schedule Selection
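1. Build the schedule: target cumulative distance for each step, from the pacing curve and Target_budget
2. For step 1: always select variant 1 (or skip if skip_first)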
3. For subsequent steps:
- Calculate needed distance = schedule[step] - accumulated
- Search unused variants for closest match to needed distance
- Select best match
- Update accumulated distance
Greedy algorithm properties:
• Locally optimal at each step
• Creates coherent progression
• Respects overall budget constraint
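The selection loop can be sketched in Python as follows. This is an interpretation of the steps above rather than the script's code: it assumes the distance of each candidate is measured from the previously selected variant, that ties go to the lowest-numbered candidate, and that `schedule` holds one cumulative target per step. The name `greedy_select` is hypothetical.

```python
def greedy_select(dist, schedule, skip_first=False):
    """Pick one variant per schedule entry; return (picks, accumulated).

    dist:     symmetric distance matrix (list of lists)
    schedule: cumulative distance targets, one per step
    picks:    list of (variant_index, step_distance) in playback order
    """
    current = 0                # step 1 always starts from variant 1 (index 0)
    used = {0}
    picks = [(0, 0.0)]
    accumulated = 0.0
    for target in schedule[1:]:
        candidates = [j for j in range(len(dist)) if j not in used]
        if not candidates:
            break              # pool exhausted before the schedule ended
        needed = target - accumulated
        # Choose the unused variant whose distance is closest to the need.
        best = min(candidates, key=lambda j: abs(dist[current][j] - needed))
        step_distance = dist[current][best]
        accumulated += step_distance
        picks.append((best, step_distance))
        used.add(best)
        current = best
    if skip_first:
        picks = picks[1:]      # drop the original from the final assembly
    return picks, accumulated
```

Because each choice only looks one step ahead, the accumulated distance tracks the schedule approximately and may land slightly above or below the budget.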
Step 5: Assembly with Rhetorical Overlap
1. Start with the first selected variant
2. For each subsequent variant:
- Calculate relative distance = step_distance / global_median
- Apply overlap rhetoric formula (hide/expose)
- Compute overlap duration = variant_duration × overlap_factor
- Concatenate with overlap
Overlap factors:
• Hide Ruptures: 0.1–0.9, proportional to distance
• Expose Ruptures: 0.05–0.9, inversely proportional
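Concatenating with overlap amounts to a crossfade over the shared region. The sketch below works on plain lists of samples and uses a linear fade; the fade shape, the choice of the incoming variant's length as the basis for the overlap duration, and the name `crossfade_concat` are all assumptions, since the guide does not give those details.

```python
def crossfade_concat(current, incoming, overlap_factor):
    """Append `incoming` to `current`, crossfading over the overlap region."""
    # Overlap length in samples: variant duration x overlap factor.
    n = round(len(incoming) * overlap_factor)
    n = max(0, min(n, len(current), len(incoming)))
    if n == 0:
        return current + incoming  # no overlap: plain concatenation
    head = current[:-n]            # part of `current` before the overlap
    tail = current[-n:]            # part of `current` that fades out
    mixed = []
    for i in range(n):
        w = (i + 1) / (n + 1)      # fade-in weight for the incoming variant
        mixed.append(tail[i] * (1.0 - w) + incoming[i] * w)
    return head + mixed + incoming[n:]
```

The result is shorter than the two inputs combined by exactly the overlap length, so larger overlap factors produce a denser, more compact accumulation.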
Finalization:
• Rename output with preset identifier
• Clean up all temporary objects
• Auto‑play result
Applications
Electroacoustic Composition
Use case: Transforming recorded sounds into structured compositions.
Technique: Record short environmental sounds, process with Violent Rupture preset.
Example: Door slam → accumulating, pitch‑shifted canon with exposed ruptures = rhythmic texture.
Vocal Processing & Choral Effects
Use case: Creating virtual choirs or vocal ensembles from single voice.
Technique: Use spoken/sung phrase with Smooth Drift preset, moderate pitch range.
Result: Accumulating vocal variants create impression of multiple singers entering gradually.
Instrumental Texture Generation
Use case: Generating evolving accompaniments from single notes.
Workflow:
- Record sustained instrument note (strings, wind, synth)
- Process with Nervous Energy preset, Track_motion_variance = on
- Result: Agitated, rhythmically complex texture with spectral evolution
- Layer with original for enhanced effect
Sound Design for Media
Use case: Creating evolving soundscapes, transitions, UI sounds.
Technique: Use short synthetic sounds with Decelerate pacing.
Example: UI "click" → accumulating variants with exposed ruptures = futuristic transition sound.
Practical Workflow Examples
🎵 Vocal Canon (Virtual Choir)
Goal: Create choral texture from single vocal phrase.
Settings:
- Source: 3‑second sung phrase
- Preset: Smooth Drift
- Custom adjustments: Pitch_range_st = 1.5, N_variants = 40, K_steps = 12
- Target_budget: 50.0
- Skip_first: Yes
Result: Gradual accumulation of subtly varied vocal entries, creating impression of choir building over time.
⚡ Percussive Accumulation (Rhythmic)
Goal: Transform single hit into complex rhythmic pattern.
Settings:
- Source: Drum hit or percussive sound (0.5 s)
- Preset: Nervous Energy
- Custom adjustments: Time_stretch = 0.3, Pitch_range_st = 8.0
- Overlap_mode: Expose Ruptures
- Track_motion_variance: Off
Result: Stuttered, rhythmically varied accumulation with pitch variations, suitable for electronic music.
🌊 Environmental Soundscape
Goal: Create evolving texture from environmental recording.
Settings:
- Source: 2‑second water drop or wind gust
- Preset: Custom
- Pacing_curve: Accelerate
- Overlap_mode: Hide Ruptures
- Pitch_range_st: 0.8 (subtle)
- Formant_shift_range: 0.25 (more timbral variation)
- Target_budget: 80.0
Result: Gradually accumulating environmental texture that starts similar, becomes more varied, creating narrative arc.
Troubleshooting
Cause: No sound selected, or multiple selected.
Solution: Select exactly one Sound in Praat Objects window before running.
Cause: Large N_variants or long source duration.
Solution: Reduce N_variants (≤30) or use a shorter source (≤5 s). The MFCC dimension is fixed at 13 and can only be reduced by modifying the script.
Cause: Source too short or quiet, MFCC analysis failing.
Solution: Use louder, longer source (>0.5 s), check amplitude, disable Track_motion_variance.
Cause: Extreme time‑stretching creating sounds <0.064 s.
Solution: Reduce the time_stretch parameter (≤0.3). The script has physics safety, but extreme values may still cause issues.
Cause: Pitch_range_st, time_stretch, formant_shift_range too small.
Solution: Increase the variation parameters, and check that Random_seed changes between runs.
Performance Optimization
- Source duration: Keep under 5 seconds for reasonable processing time.
- Variant count: N_variants = 20–30 is typically sufficient; the distance matrix needs N(N−1)/2 pairwise distances, so cost grows as O(N²) with pool size (435 pairs for N = 30, 4,950 for N = 100).
- MFCC dimension: Script uses 13 coefficients; cannot be changed without modifying code.
- Stereo vs mono: Mono processes ≈2× faster than stereo.
- Motion variance: Turning off Track_motion_variance reduces feature dimension from 26 to 13.