LZ-Inspired Audio Variations — User Guide
Lempel-Ziv inspired audio analysis and variation: Automated segmentation, similarity detection, and algorithmic transformation using vectorization, sort-and-sweep, and matrix optimization.
What this does
This script implements Lempel-Ziv inspired audio variation generation — an algorithmic approach to automatically analyze, segment, and transform audio based on pattern similarity detection. Inspired by LZ data compression algorithms, it identifies repeating patterns in audio and creates variations through transformation rules. The implementation includes three major optimizations: vectorization, sort-and-sweep, and matrix operations for efficient large-scale audio analysis.
Key Features:
- Three Analysis Modes — Pitch, spectral, or intensity-based segmentation
- Optimized Similarity Detection — Vectorized comparison with early breaking
- Six Variation Methods — Pitch shift, time stretch, amplitude modulation, spectral filtering, reversal, granular shuffle
- Smart Dictionary Building — LZ-inspired pattern matching with similarity threshold
- Performance Optimizations — Vectorization, sort-and-sweep, matrix math for speed
- Configurable Output — Precise control over output duration and variation intensity
Technical Implementation:
1. Segmentation: Divide audio into overlapping windows (configurable size/overlap).
2. Feature extraction: Analyze each window using pitch, spectral, or intensity features.
3. Vectorization: Load features into arrays for fast memory access.
4. Sort-and-sweep: Sort features to enable early breaking in comparisons.
5. Matrix operations: For the correlation metric, use matrix multiplication for efficiency.
6. Dictionary construction: Identify similar segments using distance metrics and a threshold.
7. Variation generation: Apply one of six transformation methods to selected segments.
8. Recomposition: Concatenate varied segments to the specified output duration.
Key insight: Sorting and early breaking cut the O(n²) similarity comparison toward O(n log n) when only a few windows fall within the allowed feature difference; the worst case remains O(n²).
Quick start
- In Praat Objects window, select a Sound object to analyze.
- Open script: LZ_audio_variations.praat
- Configure segmentation:
  - Analysis_type: Pitch, Spectrum, or Intensity
  - Window_size: 0.1 seconds (100ms segments)
  - Overlap: 0.5 (50% overlap between windows)
- Set similarity detection:
  - Similarity_threshold: 0.8 (higher = more strict matching)
  - Distance_metric: Euclidean, Correlation, or Cosine
- Choose variation method:
  - Variation_method: 6 options from pitch shift to granular shuffle
  - Variation_amount: 0.5 (moderate transformation)
- Configure output:
  - Output_duration: 10 seconds (final composition length)
  - Randomize_dictionary_order: Yes (shuffle pattern selection)
  - Play_output: Yes (auto-play result)
- Click Run: the script analyzes the sound, builds the dictionary, and creates variations
- Output appears as "originalname_LZ_variation" in Objects window
LZ Theory & Algorithm
Lempel-Ziv Compression Basis
🔤 Original LZ Algorithm (LZ77)
Data compression concept:
- Sliding window: Move through data with fixed look-ahead buffer
- Pattern matching: Find longest match between buffer and dictionary
- Reference encoding: Encode as (offset, length, next_char)
- Dictionary update: The sliding window itself serves as the dictionary, so recently seen data becomes available for matching as the window advances
Adaptation for audio:
Key differences from compression:
- No compression goal — creative variation instead
- Features instead of exact byte matching
- Similarity threshold instead of exact match
- Transformation rules instead of reference encoding
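The differences listed above can be sketched as a greedy dictionary pass in Python. This is an illustration under stated assumptions, not the script's Praat code: `build_dictionary`, `toy_similarity`, and the toy feature tuples are hypothetical names, and the rule "reference the most similar earlier entry, otherwise add a new entry" is one plausible reading of the LZ-inspired matching described here.

```python
from typing import Callable, List, Sequence, Tuple

Feature = Sequence[float]


def build_dictionary(
    features: List[Feature],
    similarity: Callable[[Feature, Feature], float],
    threshold: float,
) -> Tuple[List[int], List[int]]:
    """Greedy LZ-style pass over per-window feature vectors.

    Each window either references the most similar earlier dictionary entry
    (similarity >= threshold) or becomes a new dictionary entry itself.
    Returns (dictionary, references), both holding window indices.
    """
    dictionary: List[int] = []
    references: List[int] = []
    for i, feat in enumerate(features):
        best_entry, best_sim = -1, threshold
        for entry in dictionary:
            sim = similarity(feat, features[entry])
            if sim >= best_sim:
                best_entry, best_sim = entry, sim
        if best_entry < 0:
            dictionary.append(i)  # no match: window i is a new pattern
            best_entry = i
        references.append(best_entry)
    return dictionary, references


def toy_similarity(a: Feature, b: Feature) -> float:
    """Hypothetical similarity in [0, 1] based only on the first feature."""
    return max(0.0, 1.0 - abs(a[0] - b[0]) / 600.0)


# Toy (mean F0, stdev F0) pairs for four windows.
windows = [(100.0, 5.0), (105.0, 4.0), (300.0, 9.0), (102.0, 6.0)]
dictionary, references = build_dictionary(windows, toy_similarity, threshold=0.8)
```

With these toy values, windows 0 and 2 become dictionary entries and windows 1 and 3 reference window 0.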
Segmentation Strategy
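A minimal Python sketch of overlapping-window segmentation follows. It assumes the hop between window starts is `window_size * (1 - overlap)` and that a trailing partial window is dropped; the script's own handling of the last window is not shown in this guide, and `window_bounds` is a hypothetical helper name.

```python
from typing import List, Tuple


def window_bounds(
    duration: float, window_size: float, overlap: float
) -> List[Tuple[float, float]]:
    """Return (start, end) times of overlapping analysis windows.

    Assumption: hop = window_size * (1 - overlap); windows that would run
    past the end of the sound are dropped.
    """
    if window_size <= 0.0:
        raise ValueError("window_size must be positive")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    hop = window_size * (1.0 - overlap)
    bounds: List[Tuple[float, float]] = []
    k = 0
    while True:
        start = k * hop  # multiply instead of accumulating to limit drift
        end = start + window_size
        if end > duration + 1e-9:  # small tolerance for float rounding
            break
        bounds.append((start, end))
        k += 1
    return bounds


# Defaults from the Quick start: 0.1 s windows with 50% overlap.
bounds = window_bounds(duration=1.0, window_size=0.1, overlap=0.5)
```

For a 1-second sound with the default settings this yields 19 windows, which shows how quickly the window count (and the number of pairwise comparisons) grows as `window_size` shrinks or `overlap` rises.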
Feature Extraction Methods
📊 Three Analysis Types
1. Pitch Analysis (analysis_type = 1):
- Features extracted: Mean F0, Standard deviation of F0
- Praat command: To Pitch: 0, 75, 600
- Best for: Vocal audio, monophonic instruments, pitch-based patterns
- Feature range: F0: 75-600 Hz (adjustable via script)
2. Spectral Analysis (analysis_type = 2):
- Features extracted: Spectral centroid (CoG), Spectral standard deviation
- Praat commands: To Spectrum → Get centre of gravity, Get standard deviation
- Best for: Textural sounds, noise, complex timbres, polyphonic music
- Feature range: CoG: 0-5000 Hz typically
3. Intensity Analysis (analysis_type = 3):
- Features extracted: Mean intensity, Maximum intensity
- Praat command: To Intensity: 100, 0, "yes"
- Best for: Percussive sounds, amplitude envelopes, dynamic patterns
- Feature range: Intensity: 0-100 dB typically
Distance Metrics
📐 Similarity Calculation Methods
1. Euclidean Distance (distance_metric = 1):
2. Correlation Distance (distance_metric = 2):
3. Cosine Distance (distance_metric = 3):
When to use each:
- Euclidean: General purpose, magnitude-sensitive
- Correlation: Pattern shape matching, magnitude-invariant
- Cosine: Direction/orientation matching
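The three metrics can be written out using their textbook definitions, as in the Python sketch below. These are standard formulas, not a copy of the script's code, and how the script converts a distance into the 0-1 similarity compared against Similarity_threshold is not shown here. One caveat worth knowing: for vectors with only two elements, Pearson correlation is always +1, -1, or undefined, so the correlation metric is most informative on longer feature vectors.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance; sensitive to feature magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; compares direction, ignores vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # convention chosen here for zero vectors
    return 1.0 - dot / (norm_a * norm_b)


def correlation_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - Pearson correlation: cosine distance of mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_distance(centered_a, centered_b)
```

Correlation distance treats `(1, 2, 3)` and `(11, 12, 13)` as identical in shape (distance 0), while Euclidean distance reports them as far apart.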
Optimization Techniques
Optimization 1: Vectorization
⚡ Array-Based Processing
Problem: Table access in Praat is slow for large datasets
Solution: Load features into numeric arrays once
Performance gain: 10-100× faster for large datasets
Arrays used:
- start_times#: Window start times
- end_times#: Window end times
- feature1#: Primary feature (mean F0, CoG, mean intensity)
- feature2#: Secondary feature (stdev F0, spectral stdev, max intensity)
- original_indices#: Original window indices
Optimization 2: Sort-and-Sweep
🔍 Early Breaking Algorithm
Problem: O(n²) comparisons for n windows (e.g., 3000 windows = 9M comparisons)
Solution: Sort by primary feature, break inner loop when difference too large
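Performance gain: Sorting costs O(n log n), and the sweep approaches that cost when only a few neighbors fall within max_diff of each window. If most windows lie within max_diff of each other, the comparison count stays O(n²)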
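Example with numbers:
- Windows sorted by mean F0: [100, 105, 110, 200, 205, 210] Hz
- similarity_threshold = 0.8, max_diff = 600×(1-0.8) = 120 Hz
- Comparing window 1 (100 Hz) with window 4 (200 Hz): diff = 100 Hz ≤ 120 → compare
- Comparing window 1 with window 5 (205 Hz): diff = 105 Hz ≤ 120 → compare
- Comparing window 1 with window 6 (210 Hz): diff = 110 Hz ≤ 120 → compare, so at this threshold no pair is skipped
- With a stricter similarity_threshold = 0.9, max_diff = 600×(1-0.9) = 60 Hz: window 1 vs window 4 gives diff = 100 Hz > 60 → break, and only the 6 pairs inside the two clusters are compared
- In original algorithm: all 15 comparisons made
- With sort-and-sweep: the inner loop stops as soon as diff > max_diff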
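The early-breaking idea from the example above can be sketched in Python as follows. `sort_and_sweep_pairs` is a hypothetical name, and the `600 × (1 - similarity_threshold)` rule for `max_diff` is taken from the example; the full distance calculation that would run on each surviving pair is left out.

```python
from typing import List, Sequence, Tuple


def sort_and_sweep_pairs(
    values: Sequence[float], max_diff: float
) -> List[Tuple[int, int]]:
    """Return index pairs whose primary feature differs by at most max_diff.

    Windows are visited in sorted order; the inner loop breaks at the first
    window that is too far away, because every later one is farther still.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    pairs: List[Tuple[int, int]] = []
    for a in range(len(order)):
        i = order[a]
        for b in range(a + 1, len(order)):
            j = order[b]
            if values[j] - values[i] > max_diff:
                break  # early exit: remaining windows are even farther
            pairs.append((i, j))
    return pairs


mean_f0 = [100.0, 105.0, 110.0, 200.0, 205.0, 210.0]
pairs_loose = sort_and_sweep_pairs(mean_f0, max_diff=120.0)
pairs_strict = sort_and_sweep_pairs(mean_f0, max_diff=60.0)
```

With `max_diff = 120` all 15 pairs survive, so the sweep saves nothing; with `max_diff = 60` only the 6 within-cluster pairs remain. The benefit therefore depends on how widely the primary feature is spread relative to `max_diff`.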
Optimization 3: Matrix Operations
🧮 Correlation Matrix Calculation
Problem: Correlation calculation O(n² × m) where m = feature dimension
Solution: Use matrix multiplication for batch correlation
Limitation: Only works for correlation metric with single feature column
Performance gain: One O(n²) precomputation, then O(1) lookup per pair, instead of recomputing the correlation every time a pair is compared
Memory tradeoff: Stores n² matrix vs calculating on demand
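The general identity behind batch correlation is that standardizing each vector and multiplying the resulting matrix by its transpose gives every pairwise Pearson correlation at once. The pure-Python sketch below illustrates that identity on short example vectors; it is an assumption about the technique in general, since this guide does not show how the script lays out its matrix for the single feature column, and `standardize` and `correlation_matrix` are hypothetical names.

```python
import math
from typing import List, Sequence


def standardize(row: Sequence[float]) -> List[float]:
    """Center a vector and scale it to unit (population) standard deviation."""
    m = len(row)
    mean = sum(row) / m
    std = math.sqrt(sum((x - mean) ** 2 for x in row) / m)
    if std == 0.0:
        return [0.0] * m  # constant vector: correlation is treated as 0 here
    return [(x - mean) / std for x in row]


def correlation_matrix(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Compute C = Z * Z^T / m, where Z holds the standardized rows.

    C[i][j] is the Pearson correlation between rows i and j. The n x n
    result is stored in full, which is the memory cost noted above.
    """
    z = [standardize(r) for r in rows]
    n, m = len(z), len(rows[0])
    return [
        [sum(z[i][k] * z[j][k] for k in range(m)) / m for j in range(n)]
        for i in range(n)
    ]


rows = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
corr = correlation_matrix(rows)
```

After the matrix is built, looking up `corr[i][j]` is a constant-time operation, at the price of holding n² values in memory.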
Performance Metrics
Parameters Guide
Segmentation Parameters
| Parameter | Default | Range | Description |
|---|---|---|---|
| Analysis_type | Pitch | Pitch/Spectrum/Intensity | Feature extraction method. Pitch for melodic, Spectrum for timbral, Intensity for dynamic patterns |
| Window_size | 0.1 | 0.01-1.0 | Analysis window in seconds. Smaller = more detail but more windows. Speech: 0.05-0.2s, Music: 0.1-0.5s |
| Overlap | 0.5 | 0.0-0.99 | Overlap proportion between windows. 0.5 = 50% overlap. Higher = smoother analysis but more windows |
Similarity Detection Parameters
| Parameter | Default | Range | Description |
|---|---|---|---|
| Similarity_threshold | 0.8 | 0.0-1.0 | Minimum similarity for pattern matching. 0.9 = near-identical, 0.7 = broadly similar, 0.5 = loosely related |
| Distance_metric | Euclidean | Euclidean/Correlation/Cosine | Distance calculation method. Euclidean for magnitude, Correlation for pattern shape, Cosine for direction |
Variation Parameters
| Parameter | Default | Range | Description |
|---|---|---|---|
| Variation_method | Pitch shift | 6 methods | Transformation type. See detailed section below for each method |
| Variation_amount | 0.5 | 0.0-1.0 | Intensity of variation. 0.1 = subtle, 0.5 = moderate, 0.9 = extreme |
Output Parameters
| Parameter | Default | Range | Description |
|---|---|---|---|
| Output_duration | 10 | 0.1-3600 | Final output length in seconds. Independent of input duration. Can be shorter or longer |
| Randomize_dictionary_order | 1 (yes) | 0/1 | Random selection from dictionary vs sequential. Yes for variation, No for predictable patterns |
| Play_output | 1 (yes) | 0/1 | Auto-play result after processing |
Parameter Interactions
- Small windows (0.05s) + high threshold (0.9): Finds exact micro-patterns
- Large windows (0.5s) + low threshold (0.6): Finds broadly similar sections
- Medium windows (0.1-0.2s): Good balance for most audio
| Audio Type | Analysis Type | Window Size | Similarity |
|---|---|---|---|
| Speech | Pitch | 0.05-0.1s | 0.7-0.8 |
| Singing | Pitch | 0.1-0.2s | 0.8-0.9 |
| Percussion | Intensity | 0.02-0.05s | 0.6-0.7 |
| Ambient texture | Spectrum | 0.2-0.5s | 0.5-0.6 |
| Polyphonic music | Spectrum | 0.1-0.3s | 0.7-0.8 |
Variation Methods
Method 1: Pitch Shift
🎵 Pitch-Based Transformation
Algorithm:
Effect: Transposes pitch while preserving timing and formants
Best for: Melodic variation, harmonic exploration
Parameters:
- variation_amount = 0.2: Subtle detuning (±2.4 semitones)
- variation_amount = 0.5: Moderate shifts (±6 semitones)
- variation_amount = 0.8: Extreme transposition (±9.6 semitones)
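The three values listed above are consistent with a maximum shift of 12 × variation_amount semitones. The Python sketch below assumes that mapping and a uniform random draw within the range; both are inferences from the listed values, not confirmed details of the script, and the function names are hypothetical.

```python
import random


def max_shift_semitones(variation_amount: float) -> float:
    """Assumed mapping: one octave (12 semitones) at variation_amount = 1."""
    return 12.0 * variation_amount


def random_pitch_ratio(variation_amount: float, rng: random.Random) -> float:
    """Draw a shift within +/- the limit and convert it to a frequency ratio.

    A shift of s semitones multiplies frequency by 2 ** (s / 12).
    """
    limit = max_shift_semitones(variation_amount)
    semitones = rng.uniform(-limit, limit)
    return 2.0 ** (semitones / 12.0)


rng = random.Random(0)
ratio = random_pitch_ratio(0.5, rng)  # somewhere within +/- 6 semitones
```

At variation_amount = 0.5 the ratio stays between about 0.707 and 1.414, that is, half an octave down or up.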
Method 2: Time Stretch
⏱️ Duration Transformation
Algorithm:
Effect: Changes duration while preserving pitch
Best for: Rhythmic variation, tempo changes
Parameters:
- variation_amount = 0.2: 0.8-1.2× duration (subtle)
- variation_amount = 0.5: 0.5-1.5× duration (moderate)
- variation_amount = 0.8: 0.2-1.8× duration (extreme)
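The listed ranges match a duration factor drawn between 1 - variation_amount and 1 + variation_amount. A short Python sketch under that assumption (the uniform draw and the helper names are hypothetical):

```python
import random
from typing import Tuple


def stretch_range(variation_amount: float) -> Tuple[float, float]:
    """Assumed mapping from variation_amount to (min, max) duration factor."""
    return (1.0 - variation_amount, 1.0 + variation_amount)


def stretched_duration(
    segment_duration: float, variation_amount: float, rng: random.Random
) -> float:
    """Duration of a segment after a randomly drawn stretch factor."""
    low, high = stretch_range(variation_amount)
    return segment_duration * rng.uniform(low, high)


new_duration = stretched_duration(0.1, 0.5, random.Random(0))
```

Under this mapping, values of variation_amount close to 1.0 allow factors close to zero, so very high settings can produce extremely short segments.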
Method 3: Amplitude Modulation
📈 Dynamic Transformation
Algorithm:
Effect: Adds tremolo/amplitude modulation
Best for: Adding movement to static sounds, rhythmic effects
Parameters:
- variation_amount = 0.2: 12 Hz, 20% depth (subtle)
- variation_amount = 0.5: 60 Hz, 50% depth (moderate)
- variation_amount = 0.8: 90 Hz, 80% depth (strong)
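Tremolo can be sketched as multiplying each sample by a slowly varying gain. The Python example below takes the modulation frequency and depth as explicit arguments, because the mapping from variation_amount to frequency is not shown in this guide; the sinusoidal gain formula and the name `apply_tremolo` are assumptions for illustration.

```python
import math
from typing import List, Sequence


def apply_tremolo(
    samples: Sequence[float],
    sample_rate: float,
    mod_freq_hz: float,
    depth: float,
) -> List[float]:
    """Apply sinusoidal amplitude modulation.

    The gain swings between (1 - depth) and 1, so depth = 0 leaves the
    signal unchanged and depth = 1 silences it at each modulation trough.
    """
    out: List[float] = []
    for n, x in enumerate(samples):
        t = n / sample_rate
        lfo = math.sin(2.0 * math.pi * mod_freq_hz * t)
        gain = 1.0 - depth * 0.5 * (1.0 - lfo)
        out.append(x * gain)
    return out


# One second of a constant signal, modulated at 12 Hz with 20% depth.
modulated = apply_tremolo([1.0] * 1000, 1000.0, 12.0, 0.2)
```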
Method 4: Spectral Filter
🎛️ Timbral Transformation
Algorithm:
Effect: Low-pass filtering with random cutoff
Best for: Timbre variation, muffling effects
Parameters:
- variation_amount = 0.2: Cutoff 900-1100 Hz, 20% reduction
- variation_amount = 0.5: Cutoff 750-1250 Hz, 50% reduction
- variation_amount = 0.8: Cutoff 600-1400 Hz, 80% reduction
Method 5: Reverse
↪️ Temporal Transformation
Algorithm:
Effect: Randomly reverses segments
Best for: Glitch effects, surreal textures
Parameters:
- variation_amount = 0.2: 20% reversed (sparse)
- variation_amount = 0.5: 50% reversed (balanced)
- variation_amount = 0.8: 80% reversed (mostly reversed)
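The percentages above suggest that each segment is reversed with probability equal to variation_amount. A minimal Python sketch under that assumption (`maybe_reverse` is a hypothetical name, and segments are modeled as plain lists of samples):

```python
import random
from typing import List, Sequence


def maybe_reverse(
    segment: Sequence[float], variation_amount: float, rng: random.Random
) -> List[float]:
    """Reverse the segment with probability variation_amount."""
    if rng.random() < variation_amount:
        return list(segment[::-1])
    return list(segment)


rng = random.Random(42)
segment = [0.1, 0.2, 0.3]
results = [maybe_reverse(segment, 0.5, rng) for _ in range(10000)]
reversed_fraction = sum(r == [0.3, 0.2, 0.1] for r in results) / len(results)
```

Because the decision is made per segment, the reversed share of a short output can deviate noticeably from the nominal percentage.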
Method 6: Granular Shuffle
🌀 Micro-Segmentation
Algorithm:
Effect: Scrambles micro-structure while preserving macro features
Best for: Textural transformation, granular synthesis effects
Parameters:
- Grain size fixed: 0.02s (20ms) — optimal for granular effects
- variation_amount: Not used for this method
- Effect strength: Controlled by how many segments use this method
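Granular shuffling can be sketched as cutting a segment into fixed-length grains, permuting them, and concatenating the result. The Python example below uses the 0.02 s grain size stated above; treating a segment as a list of samples, keeping a shorter final grain, and the name `granular_shuffle` are assumptions, and no crossfading between grains is applied.

```python
import random
from typing import List, Sequence


def granular_shuffle(
    samples: Sequence[float],
    sample_rate: float,
    rng: random.Random,
    grain_seconds: float = 0.02,
) -> List[float]:
    """Split samples into grains, shuffle the grain order, and rejoin.

    Samples inside each grain keep their order, so local detail survives
    while the larger-scale ordering is scrambled.
    """
    grain_len = max(1, int(round(grain_seconds * sample_rate)))
    grains = [
        list(samples[i:i + grain_len])
        for i in range(0, len(samples), grain_len)
    ]
    rng.shuffle(grains)
    return [x for grain in grains for x in grain]


source = [float(i) for i in range(1000)]  # stand-in for 1 s at 1000 Hz
shuffled = granular_shuffle(source, 1000.0, random.Random(7))
```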
Applications
Algorithmic Composition
Use case: Generate new musical material from existing recordings
Technique: Use pitch analysis with moderate similarity threshold
Example workflow:
- Source: Piano recording
- Analysis: Pitch-based, window_size=0.2s, similarity=0.7
- Variation: Pitch shift (amount=0.3) + Time stretch (amount=0.2)
- Result: New piano piece with similar melodic contours but varied pitches/rhythms
Sound Design
Use case: Create complex textures from simple source material
Technique: Use spectral analysis with granular shuffle
Example workflow:
- Source: Water droplet recording
- Analysis: Spectrum-based, window_size=0.05s, similarity=0.6
- Variation: Granular shuffle (all segments)
- Result: Continuous water texture from discrete droplets
Voice Processing
Use case: Transform speech into musical or textural material
Technique: Pitch analysis with extreme variations
Example workflow:
- Source: Spoken phrase
- Analysis: Pitch-based, window_size=0.1s, similarity=0.8
- Variation: Pitch shift (amount=0.7) + Reverse (amount=0.3)
- Result: Speech transformed into melodic, surreal texture
Audio Restoration
Use case: Fill gaps or damaged sections using similar intact material
Technique: High similarity threshold with time stretching
Example workflow:
- Source: Damaged recording with clicks/pops
- Analysis: Intensity-based, window_size=0.02s, similarity=0.9
- Variation: Find similar clean sections to replace damaged ones
- Result: Cleaned audio using self-similarity
Educational Tool
Use case: Demonstrate pattern recognition in audio
Technique: Vary parameters to show different similarity concepts
Learning objectives:
- Understand window-based analysis
- Explore different distance metrics
- Hear effects of similarity thresholds
- Experience transformation methods
Practical Example Configurations
🎹 Melodic Recomposition
Goal: Create new melody from existing one
Settings:
- Analysis_type: Pitch
- Window_size: 0.15s
- Similarity_threshold: 0.75
- Distance_metric: Euclidean
- Variation_method: Pitch shift
- Variation_amount: 0.4
- Output_duration: 30s
Result: New melodic line with similar contour but different pitches
🌊 Textural Transformation
Goal: Transform discrete sounds into continuous texture
Settings:
- Analysis_type: Spectrum
- Window_size: 0.08s
- Similarity_threshold: 0.65
- Distance_metric: Correlation
- Variation_method: Granular shuffle
- Output_duration: 60s
Result: Smooth, evolving texture from source sounds
🗣️ Speech Deconstruction
Goal: Deconstruct speech into abstract sound
Settings:
- Analysis_type: Pitch
- Window_size: 0.06s
- Similarity_threshold: 0.85
- Distance_metric: Cosine
- Variation_method: Reverse + Spectral filter
- Variation_amount: 0.6
- Output_duration: 45s
Result: Abstract, surreal version of original speech
Complete Workflow
Step-by-Step Process
🔧 Script Execution Flow
Phase 1: Setup & Initialization
Phase 2: Feature Extraction
Phase 3: Vectorization
Phase 4: Sorting for Optimization
Phase 5: Matrix Precomputation (Optional)
Phase 6: Similarity Detection & Dictionary Building
Phase 7: Variation Generation
Phase 8: Recomposition & Output
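The final recomposition phase can be sketched in Python as drawing segments until the requested duration is filled and then trimming the excess. This is a sketch under assumptions: segments are plain sample lists that have already been varied, `recompose` is a hypothetical name, and the `randomize` flag mirrors Randomize_dictionary_order as described in the parameter table.

```python
import random
from typing import List, Sequence


def recompose(
    segments: Sequence[Sequence[float]],
    output_duration: float,
    sample_rate: float,
    randomize: bool,
    rng: random.Random,
) -> List[float]:
    """Concatenate segments until output_duration is reached, then trim."""
    usable = [s for s in segments if len(s) > 0]
    if not usable:
        raise ValueError("need at least one non-empty segment")
    target = int(round(output_duration * sample_rate))
    out: List[float] = []
    index = 0
    while len(out) < target:
        if randomize:
            seg = rng.choice(usable)           # random dictionary order
        else:
            seg = usable[index % len(usable)]  # sequential, wrapping around
            index += 1
        out.extend(seg)
    return out[:target]


segments = [[1.0, 2.0], [3.0]]
sequential = recompose(segments, 0.5, 10.0, False, random.Random(0))
shuffled_out = recompose(segments, 0.5, 10.0, True, random.Random(0))
```

Because segments are reused as often as needed, the output can be longer than the input, which matches the note that Output_duration is independent of input duration.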
Information Window Output
Troubleshooting Common Issues
Problem: Processing is very slow
Causes: Too many windows (small window_size), high overlap, large file
Solutions:
- Increase window_size (0.2s instead of 0.05s)
- Reduce overlap (0.3 instead of 0.7)
- Use shorter input file
- Check speedup factor in output: if < 2x, sort-and-sweep is not helping
Problem: Few or no similar patterns found
Causes: Similarity_threshold too high, distance metric inappropriate
Solutions:
- Lower similarity_threshold (0.6 instead of 0.9)
- Try different distance_metric
- Try different analysis_type
- Check if features are being extracted correctly (undefined values?)
Problem: Output sounds chaotic or incoherent
Causes: Low similarity_threshold, extreme variation_amount
Solutions:
- Increase similarity_threshold for more coherent patterns
- Reduce variation_amount (0.3 instead of 0.8)
- Use less disruptive variation_method (pitch shift instead of granular shuffle)
- Increase window_size for longer, more coherent segments
Problem: Memory errors with large inputs
Causes: Too many windows creating large matrices, memory limits
Solutions:
- Reduce number of windows (increase window_size, reduce overlap)
- Use shorter input file
- Avoid correlation metric with very large window counts
- Increase Praat memory allocation in preferences
Problem: Clicks or discontinuities at segment boundaries
Causes: No crossfade between concatenated segments
Solutions:
- Increase overlap parameter (creates overlapping windows)
- Apply crossfade manually after generation
- Use smaller variation_amount for smoother transitions
- Note that some discontinuity is inherent to the segment concatenation approach