LZ-Inspired Audio Variations — User Guide

Lempel-Ziv inspired audio analysis and variation: Automated segmentation, similarity detection, and algorithmic transformation using vectorization, sort-and-sweep, and matrix optimization.

Author: Shai Cohen
Affiliation: Department of Music, Bar-Ilan University, Israel
Version: 2.0 (2025)
License: MIT License
Repo: https://github.com/ShaiCohen-ops/Praat-plugin_AudioTools

Contents:

What this does
Quick start
LZ Theory & Algorithm
Optimization Techniques
Parameters Guide
Variation Methods
Applications
Complete Workflow

What this does

This script implements Lempel-Ziv inspired audio variation generation — an algorithmic approach to automatically analyze, segment, and transform audio based on pattern similarity detection. Inspired by LZ data compression algorithms, it identifies repeating patterns in audio and creates variations through transformation rules. The implementation includes three major optimizations: vectorization, sort-and-sweep, and matrix operations for efficient large-scale audio analysis.

Key Features:

Three Analysis Modes — Pitch, spectral, or intensity-based segmentation
Optimized Similarity Detection — Vectorized comparison with early breaking
Six Variation Methods — Pitch shift, time stretch, amplitude modulation, spectral filtering, reversal, granular shuffle
Smart Dictionary Building — LZ-inspired pattern matching with similarity threshold
Performance Optimizations — Vectorization, sort-and-sweep, matrix math for speed
Configurable Output — Precise control over output duration and variation intensity

What is LZ-inspired audio processing? The approach borrows from the Lempel-Ziv compression algorithms (LZ77/LZ78):
(1) Dictionary building: identify repeating patterns in the data stream.
(2) Reference encoding: replace repetitions with references to the dictionary.
(3) Adaptation for audio: instead of compressing, use pattern detection for creative variation.
(4) Key concepts: sliding-window analysis, similarity matching, pattern substitution.
This script adapts LZ principles as follows:
(a) Segment the audio into windows.
(b) Extract features per window.
(c) Build a similarity dictionary.
(d) Create variations by transforming similar segments.
(e) Recombine them into a new composition.
Applications: algorithmic composition, sound design, audio texture generation, pattern discovery in recordings.

Technical Implementation:
(1) Segmentation: Divide the audio into overlapping windows (configurable size and overlap).
(2) Feature extraction: Analyze each window using pitch, spectral, or intensity features.
(3) Vectorization: Load features into arrays for fast memory access.
(4) Sort-and-sweep: Sort windows by primary feature so comparisons can break early.
(5) Matrix operations: For the correlation metric, precompute values with one matrix multiplication.
(6) Dictionary construction: Identify similar segments using a distance metric and threshold.
(7) Variation generation: Apply one of six transformation methods to selected segments.
(8) Recomposition: Concatenate the varied segments up to the specified output duration.
Key insight: sorting costs O(n log n), and the sweep then does work roughly proportional to n plus the number of pairs whose primary features fall within the cutoff. When features are well spread out, the total approaches O(n log n); in the worst case (all features clustered together) it remains O(n²).

Quick start

In Praat Objects window, select a Sound object to analyze.
Open script: LZ_audio_variations.praat
Configure segmentation:
- Analysis_type: Pitch, Spectrum, or Intensity
- Window_size: 0.1 seconds (100ms segments)
- Overlap: 0.5 (50% overlap between windows)
Set similarity detection:
- Similarity_threshold: 0.8 (higher = more strict matching)
- Distance_metric: Euclidean, Correlation, or Cosine
Choose variation method:
- Variation_method: 6 options from pitch shift to granular shuffle
- Variation_amount: 0.5 (moderate transformation)
Configure output:
- Output_duration: 10 seconds (final composition length)
- Randomize_dictionary_order: Yes (shuffle pattern selection)
- Play_output: Yes (auto-play result)
Click Run — script analyzes, builds dictionary, creates variations
Output appears as "originalname_LZ_variation" in Objects window

Quick tip:
Window_size: start with 0.1s for speech, 0.05s for fast audio, 0.2s for music.
Overlap: 0.5 for smooth transitions.
Pitch-based analysis of voice: analysis_type = Pitch, similarity_threshold = 0.7-0.8.
Texture generation: analysis_type = Spectrum, variation_method = Granular shuffle.
Check the Info window for performance metrics; the speedup factor shows how effective the optimizations were.
Output_duration sets the final length regardless of input length; set it to 2-3× the input duration for substantial variation.
Variation_amount: 0.2-0.3 for subtle changes, 0.7-0.9 for extreme transformation.

Important:
Computationally intensive: large files or small windows create many segments, and the number of possible pairwise comparisons grows as n(n-1)/2. A 5-minute file with 0.1s windows and no overlap gives 3000 windows and about 4.5M possible comparisons; at the default 0.5 overlap it gives about 6000 windows and about 18M. Sort-and-sweep sharply reduces the comparisons actually made, but memory use can still be high.
Window size affects results: too small gives fragmented patterns, too large gives coarse analysis.
Similarity_threshold is critical: 0.9+ finds near-identical segments, 0.6+ finds broadly similar ones.
Variation_method changes the character drastically: pitch shift preserves timing, granular shuffle destroys continuity.
Randomize_dictionary_order: Yes for variation, No for predictable pattern repetition.
Output may be discontinuous: segments are concatenated without crossfade.

LZ Theory & Algorithm

Lempel-Ziv Compression Basis

🔤 Original LZ Algorithm (LZ77)

Data compression concept:

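Sliding window: Move through the data with a search buffer (already-encoded data) and a look-ahead buffer
Pattern matching: Find the longest match between the start of the look-ahead buffer and the search buffer
Reference encoding: Encode as (offset, length, next_char)
Window update: Slide the window past the encoded symbols so they join the search buffer (LZ78 instead adds each new phrase to an explicit dictionary)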

Adaptation for audio:

ORIGINAL LZ77 (text):
Input: "ABRACADABRA"
Output: (0,0,'A'), (0,0,'B'), (0,0,'R'), (3,1,'C'), (2,1,'D'), (7,3,'A')
(Every 'A' after the first is matched against earlier data; the last token copies "ABR" from 7 symbols back and appends 'A'.)

ADAPTED LZ77 (audio):
Input: Audio signal
Process:
1. Segment into windows [W1, W2, ..., Wn]
2. Extract features [F1, F2, ..., Fn]
3. Find similar windows: similarity(Fi, Fj) ≥ threshold
4. Build dictionary: D = {(i,j): similar(i,j)}
5. Generate variations: Transform windows based on dictionary
6. Recompose: Concatenate transformed windows
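To make the text example concrete, here is a minimal greedy LZ77 encoder and decoder. It is a textbook-style Python sketch, not part of the Praat script; the function names and the 4096-symbol window are illustrative assumptions. The encoder always keeps one literal for next_char, which is why the final token is (7,3,'A') rather than a 4-symbol match.

```python
def lz77_encode(data, window=4096):
    """Greedy LZ77: emit (offset, length, next_char) triples."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_off = 0, 0
        # scan the search buffer from the nearest position to the farthest
        for j in range(i - 1, max(0, i - window) - 1, -1):
            length = 0
            # stop one symbol early so a literal next_char always remains
            while i + length < len(data) - 1 and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        tokens.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return tokens


def lz77_decode(tokens):
    """Copy `length` symbols from `offset` back, then append the literal."""
    out = []
    for offset, length, next_char in tokens:
        start = len(out) - offset
        for k in range(length):
            out.append(out[start + k])
        out.append(next_char)
    return "".join(out)


tokens = lz77_encode("ABRACADABRA")
print(tokens)
print(lz77_decode(tokens))
```

Decoding the tokens reproduces "ABRACADABRA", which is a quick way to check any hand-worked LZ77 example.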

Key differences from compression:

No compression goal — creative variation instead
Features instead of exact byte matching
Similarity threshold instead of exact match
Transformation rules instead of reference encoding

Segmentation Strategy

WINDOW CALCULATIONS:
Given:
  total_duration = length of input sound (seconds)
  window_size = analysis window size (seconds)
  overlap = overlap proportion (0-0.99)
Calculations:
  hop_size = window_size × (1 - overlap)
  num_windows = floor((total_duration - window_size) / hop_size) + 1
Window i properties:
  start_time[i] = (i - 1) × hop_size
  end_time[i] = start_time[i] + window_size
  center_time[i] = start_time[i] + (window_size / 2)
Example: 10s audio, window_size = 0.1s, overlap = 0.5
  hop_size = 0.1 × (1 - 0.5) = 0.05s
  num_windows = floor((10 - 0.1) / 0.05) + 1 = 199 windows
  Window 1: 0.0-0.1s
  Window 2: 0.05-0.15s (50% overlap)
  Window 199: 9.9-10.0s
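The window formulas can be checked with a short Python sketch (the name window_grid is illustrative, not from the script). One deviation from the formulas is labeled in the code: a tiny epsilon before floor(), because in floating point (300 - 0.1) / 0.1 can land just below 2999 and silently drop a window.

```python
import math


def window_grid(total_duration, window_size, overlap):
    """Return the hop size and (start, end, center) for every analysis window."""
    hop_size = window_size * (1 - overlap)
    # epsilon is an added guard: float error must not push floor() down by one
    num_windows = math.floor((total_duration - window_size) / hop_size + 1e-9) + 1
    windows = []
    for i in range(1, num_windows + 1):  # 1-based i, as in the formulas above
        start = (i - 1) * hop_size
        windows.append((start, start + window_size, start + window_size / 2))
    return hop_size, windows


hop, windows = window_grid(10.0, 0.1, 0.5)
print(hop, len(windows), windows[0], windows[-1])
```

The same function reproduces the counts in the Important note: a 300s file gives 3000 windows with no overlap and 5999 at 0.5 overlap.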

Feature Extraction Methods

📊 Three Analysis Types

1. Pitch Analysis (analysis_type = 1):

Features extracted: Mean F0, Standard deviation of F0
Praat command: To Pitch: 0, 75, 600
Best for: Vocal audio, monophonic instruments, pitch-based patterns
Feature range: F0: 75-600 Hz (adjustable via script)

2. Spectral Analysis (analysis_type = 2):

Features extracted: Spectral centroid (CoG), Spectral standard deviation
Praat commands: To Spectrum → Get centre of gravity, Get standard deviation
Best for: Textural sounds, noise, complex timbres, polyphonic music
Feature range: CoG: 0-5000 Hz typically

3. Intensity Analysis (analysis_type = 3):

Features extracted: Mean intensity, Maximum intensity
Praat command: To Intensity: 100, 0, "yes"
Best for: Percussive sounds, amplitude envelopes, dynamic patterns
Feature range: Intensity: 0-100 dB typically
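For the spectral features, "centre of gravity" is a power-weighted mean frequency and the spectral standard deviation is the weighted spread around it. The Python sketch below computes both over a discrete magnitude spectrum; the function name and the choice of power = 2 as the weighting exponent are assumptions for illustration, not a reproduction of Praat's internal code.

```python
import math


def spectral_cog_and_sd(freqs, mags, power=2.0):
    """Centroid and spread of a discrete spectrum, weighting each bin by |X|**power."""
    weights = [abs(m) ** power for m in mags]
    total = sum(weights)
    if total == 0:
        return float("nan"), float("nan")  # silent window: feature undefined
    cog = sum(f * w for f, w in zip(freqs, weights)) / total
    var = sum((f - cog) ** 2 * w for f, w in zip(freqs, weights)) / total
    return cog, math.sqrt(var)


# two equal-magnitude bins at 100 Hz and 300 Hz
print(spectral_cog_and_sd([100.0, 300.0], [1.0, 1.0]))  # (200.0, 100.0)
```

A silent window has no defined centroid, which is one way undefined feature values can appear.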

Distance Metrics

📐 Similarity Calculation Methods

1. Euclidean Distance (distance_metric = 1):

Given two windows i and j with features (f1, f2):
  distance = √[(f1_i - f1_j)² + (f2_i - f2_j)²]
Normalized similarity:
  similarity = 1 - (distance / max_possible_distance)
max_possible_distance depends on the feature type:
  • Pitch: ~600 Hz (range 75-600)
  • Spectrum: ~5000 Hz
  • Intensity: ~100 dB
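The Euclidean similarity above translates directly into Python. This sketch uses the approximate normalization constants listed in the text; clamping the result to [0, 1] is an added assumption, since the text does not say what happens when the distance exceeds max_possible_distance.

```python
import math

# approximate max_possible_distance per analysis type, from the text above
MAX_DISTANCE = {"pitch": 600.0, "spectrum": 5000.0, "intensity": 100.0}


def euclidean_similarity(a, b, analysis_type):
    """a, b: (primary, secondary) feature pairs of two windows."""
    distance = math.hypot(a[0] - b[0], a[1] - b[1])
    similarity = 1 - distance / MAX_DISTANCE[analysis_type]
    # assumption: clamp so far-apart windows score 0 rather than a negative value
    return max(0.0, min(1.0, similarity))


# mean F0 differs by 30 Hz and F0 stdev by 40 Hz -> distance 50 Hz
print(euclidean_similarity((220.0, 10.0), (250.0, 50.0), "pitch"))
```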

2. Correlation Distance (distance_metric = 2):

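Pearson correlation between feature vectors:
  r = cov(features_i, features_j) / (σ_i × σ_j)
  distance = 1 - r
Note: with only two features per window, Pearson r between two windows' feature vectors can only be +1, -1, or undefined.
Matrix optimization:
  Feature matrix M (n_windows × 1, the primary feature column)
  Correlation = (M × Mᵀ) / normalization
  Each entry (i, j) of M × Mᵀ is the product f1_i × f1_j; the normalization step turns these products into a similarity score
  Computed via matrix multiplication for speed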

3. Cosine Distance (distance_metric = 3):

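Cosine similarity between feature vectors:
  cos_sim = (f1_i·f1_j + f2_i·f2_j) / (||f_i|| × ||f_j||)
  distance = 1 - cos_sim
Simplified in script (uses only the primary feature, for speed):
  distance = 1 - [min(|f1_i|,|f1_j|) / max(|f1_i|,|f1_j|)]
For a single positive feature, true cosine similarity is always 1, so this simplified form is a magnitude ratio rather than an angle measure.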
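The difference between the two cosine forms shows up when one feature vector is a scaled copy of the other. In this Python sketch (function names are illustrative), the full two-feature cosine distance is 0 because the vectors point the same way, while the simplified ratio form reports 0.5 because the primary feature doubled. Both functions assume nonzero features; a zero would divide by zero.

```python
import math


def cosine_distance_full(a, b):
    """1 - cosine similarity of the two-feature vectors (f1, f2)."""
    dot = a[0] * b[0] + a[1] * b[1]
    return 1 - dot / (math.hypot(*a) * math.hypot(*b))


def cosine_distance_simplified(a, b):
    """Ratio shortcut described above: primary feature only."""
    lo, hi = sorted((abs(a[0]), abs(b[0])))
    return 1 - lo / hi


a, b = (200.0, 20.0), (400.0, 40.0)  # b is exactly 2 × a
print(cosine_distance_full(a, b), cosine_distance_simplified(a, b))
```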

When to use each:

Euclidean: General purpose, magnitude-sensitive
Correlation: Pattern shape matching, magnitude-invariant
Cosine: Direction/orientation matching (the script's simplified form compares magnitude ratios of the primary feature)

Optimization Techniques

Optimization 1: Vectorization

⚡ Array-Based Processing

Problem: Table access in Praat is slow for large datasets

Solution: Load features into numeric arrays once

BEFORE (slow: table access on every comparison):
    # features_table holds the ID of the features Table (variable name illustrative)
    selectObject: features_table
    for i to num_windows
        for j to num_windows
            # Table access for both values
            f1_i = Get value: i, "mean_f0"
            f1_j = Get value: j, "mean_f0"
            # Compare...
        endfor
    endfor

AFTER (fast: array access):
    # Load once at start
    selectObject: features_table
    feature1# = zero# (num_windows)
    for i to num_windows
        feature1# [i] = Get value: i, "mean_f0"
    endfor
    # Fast comparisons: array access only
    for i to num_windows
        for j to num_windows
            f1_i = feature1# [i]
            f1_j = feature1# [j]
            # Compare...
        endfor
    endfor

Performance gain: 10-100× faster for large datasets

Arrays used:

start_times# — Window start times
end_times# — Window end times
feature1# — Primary feature (mean F0, CoG, mean intensity)
feature2# — Secondary feature (stdev F0, spectral stdev, max intensity)
original_indices# — Original window indices

Optimization 2: Sort-and-Sweep

🔍 Early Breaking Algorithm

Problem: O(n²) comparisons for n windows (e.g., 3000 windows = 3000 × 2999 / 2 ≈ 4.5M unique pairs)

Solution: Sort by primary feature, break inner loop when difference too large

ALGORITHM:
1. Sort windows by primary feature (mean F0, CoG, or mean intensity)
2. For each window i:
     f1_i = primary feature of window i
     For each window j > i:
         f1_j = primary feature of window j
         diff = |f1_j - f1_i|
         # Since sorted, if diff > max_acceptable_diff, every later j also exceeds it
         if diff > max_acceptable_diff:
             break  # Skip remaining comparisons for this i
         else:
             # Calculate full distance (including secondary feature)
             compute similarity...
max_acceptable_diff calculation:
  For pitch: 600 × (1 - similarity_threshold)
  For spectrum: 5000 × (1 - similarity_threshold)
  For intensity: 100 × (1 - similarity_threshold)

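Performance gain: Sorting costs O(n log n); the sweep then costs roughly O(n + k), where k is the number of pairs within max_acceptable_diff. This approaches O(n log n) when features are spread out, but stays O(n²) in the worst case (all windows within the cutoff of each other).

The early break is exact for the Euclidean metric, because the Euclidean distance is never smaller than |f1_j - f1_i|. For Correlation and Cosine it is a heuristic cutoff and may skip pairs those metrics would have accepted.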

Example with numbers:

Windows sorted by mean F0: [100, 105, 110, 200, 205, 210] Hz
similarity_threshold = 0.8, max_diff = 600×(1-0.8) = 120 Hz
Comparing window 1 (100 Hz) with window 4 (200 Hz): diff = 100 Hz ≤ 120 → compare
Comparing window 1 with window 5 (205 Hz): diff = 105 Hz ≤ 120 → compare
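In the original algorithm: all 15 pairwise comparisons are made
With sort-and-sweep at threshold 0.8: every pair here differs by at most 110 Hz, so no break occurs and all 15 comparisons are still made
At threshold 0.9 (max_diff = 600×(1-0.9) = 60 Hz): windows 1, 2, and 3 each break at window 4 (diffs of 100, 95, and 90 Hz), so only 6 comparisons are made and 9 are skipped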
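The sweep and its counters can be reproduced in a few lines of Python. This is a sketch of the technique, not the script's code: sort_and_sweep is an illustrative name, the full two-feature distance is left out, and "skipped" counts every pair dropped by a break, which matches how the script's report adds made and skipped up to the brute-force total.

```python
def sort_and_sweep(features, max_diff):
    """features: list of (f1, f2) tuples, one per window.

    Returns the 1-based original-index pairs that pass the primary-feature
    cutoff, plus counts of comparisons made and skipped.
    """
    # sort window indices by primary feature, remembering original positions
    order = sorted(range(len(features)), key=lambda k: features[k][0])
    n = len(order)
    made, skipped, compared = 0, 0, []
    for a in range(n - 1):
        f1_i = features[order[a]][0]
        for b in range(a + 1, n):
            # sorted ascending, so this difference is never negative
            if features[order[b]][0] - f1_i > max_diff:
                skipped += n - b  # every later window differs by even more
                break
            made += 1  # the full (two-feature) distance would be computed here
            compared.append((order[a] + 1, order[b] + 1))
    return compared, made, skipped


f0 = [100, 105, 110, 200, 205, 210]
features = [(f, 0.0) for f in f0]
for threshold in (0.8, 0.9):
    _, made, skipped = sort_and_sweep(features, 600 * (1 - threshold))
    print(threshold, made, skipped)
```

Running it on the example gives 15 made and 0 skipped at threshold 0.8, and 6 made and 9 skipped at 0.9.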

Optimization 3: Matrix Operations

🧮 Correlation Matrix Calculation

Problem: Correlation calculation O(n² × m) where m = feature dimension

Solution: Use matrix multiplication for batch correlation

STANDARD APPROACH (slow):
    for i to n_windows
        for j to n_windows
            # O(n²) correlation calculations
            corr = correlation(features_i, features_j)
        endfor
    endfor

MATRIX APPROACH (fast):
    # Create feature matrix M (n_windows × 1)
    # Transpose: Mᵀ (1 × n_windows)
    # Correlation matrix = M × Mᵀ (n_windows × n_windows)
    # Praat implementation:
    matrix = To Matrix (column): "feature_column"
    transposed = Transpose
    correlation_matrix = Multiply: matrix, transposed
    # Access precomputed values: O(1) per lookup after O(n²) setup
    selectObject: correlation_matrix
    for i to n_windows
        for j to n_windows
            corr_value = Get value in cell: i, j
        endfor
    endfor

Limitation: Only works for correlation metric with single feature column

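Performance gain: Total work is still O(n²), one value per pair, but the products come from a single built-in matrix operation instead of interpreted script loops, and each later lookup is O(1)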

Memory tradeoff: Stores n² matrix vs calculating on demand
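The shape of the precomputation can be sketched in Python: one n × n table of products f_i × f_j, built once and then read in constant time. This is an illustration only (outer_product_table is a hypothetical name); it shows the memory layout and lookup pattern, not Praat's speed, and it leaves out the normalization step because the text does not specify it.

```python
def outer_product_table(column):
    """Precompute M × Mᵀ for a single feature column M (n × 1): entry [i][j] = f_i × f_j."""
    return [[fi * fj for fj in column] for fi in column]


features = [100.0, 200.0, 400.0]
table = outer_product_table(features)  # n × n values stored once
print(table[0][2])                     # O(1) lookup afterwards
```

For 3000 windows the table already holds 9 million entries, which is the memory tradeoff noted above.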

Performance Metrics

SCRIPT REPORTS OPTIMIZATION EFFECTIVENESS:
  Total possible comparisons (brute force): total = n × (n - 1) / 2
  Comparisons actually made: made = count of distance calculations
  Comparisons skipped: skipped = early-break savings
  Speedup factor: speedup = total / made
Example from script output:
  Number of windows: 199
  Total possible comparisons: 199 × 198 / 2 = 19,701
  Comparisons made: 1,542
  Comparisons skipped: 18,159
  Speedup factor: 12.78x
Interpretation:
  • Sort-and-sweep skipped 92% of comparisons
  • About 12.8× fewer distance calculations than brute force (total run time also includes feature extraction and sorting)
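The report's arithmetic can be checked with a small Python helper (comparison_report is an illustrative name). Only the window count and the number of comparisons made are inputs; the rest follows from the formulas above.

```python
def comparison_report(num_windows, made):
    """Return (total, skipped, speedup, skipped_share) as defined above."""
    total = num_windows * (num_windows - 1) // 2  # brute-force unique pairs
    skipped = total - made
    speedup = total / made if made else float("inf")
    return total, skipped, speedup, skipped / total


total, skipped, speedup, share = comparison_report(199, 1542)
print(total, skipped, round(speedup, 2), round(100 * share))
```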

Parameters Guide

Segmentation Parameters

Parameter	Default	Range	Description
Analysis_type	Pitch	Pitch/Spectrum/Intensity	Feature extraction method. Pitch for melodic, Spectrum for timbral, Intensity for dynamic patterns
Window_size	0.1	0.01-1.0	Analysis window in seconds. Smaller = more detail but more windows. Speech: 0.05-0.2s, Music: 0.1-0.5s
Overlap	0.5	0.0-0.99	Overlap proportion between windows. 0.5 = 50% overlap. Higher = smoother analysis but more windows

Similarity Detection Parameters

Parameter	Default	Range	Description
Similarity_threshold	0.8	0.0-1.0	Minimum similarity for pattern matching. 0.9 = near-identical, 0.7 = broadly similar, 0.5 = loosely related
Distance_metric	Euclidean	Euclidean/Correlation/Cosine	Distance calculation method. Euclidean for magnitude, Correlation for pattern shape, Cosine for direction

Variation Parameters

Parameter	Default	Range	Description
Variation_method	Pitch shift	6 methods	Transformation type. See detailed section below for each method
Variation_amount	0.5	0.0-1.0	Intensity of variation. 0.1 = subtle, 0.5 = moderate, 0.9 = extreme

Output Parameters

Parameter	Default	Range	Description
Output_duration	10	0.1-3600	Final output length in seconds. Independent of input duration. Can be shorter or longer
Randomize_dictionary_order	1 (yes)	0/1	Random selection from dictionary vs sequential. Yes for variation, No for predictable patterns
Play_output	1 (yes)	0/1	Auto-play result after processing

Parameter Interactions

Window Size vs Similarity Threshold:

Small windows (0.05s) + high threshold (0.9): Finds exact micro-patterns
Large windows (0.5s) + low threshold (0.6): Finds broadly similar sections
Medium windows (0.1-0.2s): Good balance for most audio

Analysis Type Recommendations:

Audio Type	Analysis Type	Window Size	Similarity
Speech	Pitch	0.05-0.1s	0.7-0.8
Singing	Pitch	0.1-0.2s	0.8-0.9
Percussion	Intensity	0.02-0.05s	0.6-0.7
Ambient texture	Spectrum	0.2-0.5s	0.5-0.6
Polyphonic music	Spectrum	0.1-0.3s	0.7-0.8

Variation Methods

Granular Shuffle

Grain size: fixed at 0.02s (20ms), a typical grain length for granular effects
variation_amount: Not used for this method
Effect strength: Controlled by how many segments use this method
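A granular shuffle can be sketched as: cut the segment into fixed 20ms grains and play them back in random order. The Python below assumes the segment is a plain list of samples and uses no per-grain windowing or crossfade; these are simplifications, and granular_shuffle is a hypothetical name rather than the script's routine.

```python
import random


def granular_shuffle(samples, sample_rate, grain_size=0.02, seed=None):
    """Cut a segment into fixed-length grains and return them in random order."""
    grain_len = max(1, int(round(grain_size * sample_rate)))  # 882 samples at 44.1 kHz
    grains = [samples[k:k + grain_len] for k in range(0, len(samples), grain_len)]
    random.Random(seed).shuffle(grains)  # seeded for reproducible output
    return [x for grain in grains for x in grain]
```

Every sample survives and each grain stays intact internally; only the order of grains changes, which is why this method destroys continuity but keeps local timbre.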

Applications

Algorithmic Composition

Use case: Generate new musical material from existing recordings

Technique: Use pitch analysis with moderate similarity threshold

Example workflow:

Source: Piano recording
Analysis: Pitch-based, window_size=0.2s, similarity=0.7
Variation: Pitch shift (amount=0.3) + Time stretch (amount=0.2), applied in two successive runs (Variation_method selects one method per run)
Result: New piano piece with similar melodic contours but varied pitches/rhythms

Sound Design

Use case: Create complex textures from simple source material

Technique: Use spectral analysis with granular shuffle

Example workflow:

Source: Water droplet recording
Analysis: Spectrum-based, window_size=0.05s, similarity=0.6
Variation: Granular shuffle (all segments)
Result: Continuous water texture from discrete droplets

Voice Processing

Use case: Transform speech into musical or textural material

Technique: Pitch analysis with extreme variations

Example workflow:

Source: Spoken phrase
Analysis: Pitch-based, window_size=0.1s, similarity=0.8
Variation: Pitch shift (amount=0.7) + Reverse (amount=0.3), applied in two successive runs (Variation_method selects one method per run)
Result: Speech transformed into melodic, surreal texture

Audio Restoration

Use case: Fill gaps or damaged sections using similar intact material

Technique: High similarity threshold with time stretching

Example workflow:

Source: Damaged recording with clicks/pops
Analysis: Intensity-based, window_size=0.02s, similarity=0.9
Variation: Find similar clean sections to replace damaged ones
Result: Cleaned audio using self-similarity

Educational Tool

Use case: Demonstrate pattern recognition in audio

Technique: Vary parameters to show different similarity concepts

Learning objectives:

Understand window-based analysis
Explore different distance metrics
Hear effects of similarity thresholds
Experience transformation methods

Practical Example Configurations

🎹 Melodic Recomposition

Goal: Create new melody from existing one

Settings:

Analysis_type: Pitch
Window_size: 0.15s
Similarity_threshold: 0.75
Distance_metric: Euclidean
Variation_method: Pitch shift
Variation_amount: 0.4
Output_duration: 30s

Result: New melodic line with similar contour but different pitches

🌊 Textural Transformation

Goal: Transform discrete sounds into continuous texture

Settings:

Analysis_type: Spectrum
Window_size: 0.08s
Similarity_threshold: 0.65
Distance_metric: Correlation
Variation_method: Granular shuffle
Output_duration: 60s

Result: Smooth, evolving texture from source sounds

🗣️ Speech Deconstruction

Goal: Deconstruct speech into abstract sound

Settings:

Analysis_type: Pitch
Window_size: 0.06s
Similarity_threshold: 0.85
Distance_metric: Cosine
Variation_method: Reverse + Spectral filter (two successive runs, one method per run)
Variation_amount: 0.6
Output_duration: 45s

Result: Abstract, surreal version of original speech

Complete Workflow

Step-by-Step Process

🔧 Script Execution Flow

Phase 1: Setup & Initialization

1. Validate input: single Sound object selected
2. Get sound properties: name, sample_rate, duration
3. Calculate segmentation parameters:
   • hop_size = window_size × (1 - overlap)
   • num_windows = floor((duration - window_size)/hop_size) + 1
4. Create features table with window metadata

Phase 2: Feature Extraction

For each window i (1 to num_windows):
1. Calculate start_time, end_time
2. Based on analysis_type:
   • Pitch: Extract mean F0 and stdev F0
   • Spectrum: Extract spectral centroid and stdev
   • Intensity: Extract mean and max intensity
3. Store in features table
Optimization: Batch processing where possible (e.g., the entire Pitch object is created once for pitch analysis)

Phase 3: Vectorization

1. Create numeric arrays:
   • start_times#, end_times#
   • feature1#, feature2# (primary/secondary features)
   • original_indices#
2. Load data from table into arrays (single pass, O(n) operation)
3. Memory benefit: Array access is faster than table access for subsequent comparisons

Phase 4: Sorting for Optimization

1. Sort features table by primary feature:
   • Pitch: mean_f0
   • Spectrum: spectral_cog
   • Intensity: mean_intensity
2. Reload arrays in sorted order (maintains the connection between features and original indices)
3. Enables sort-and-sweep optimization: comparisons can break early when the feature difference is too large

Phase 5: Matrix Precomputation (Optional)

If distance_metric = Correlation:
1. Create matrix from primary feature column
2. Transpose matrix
3. Multiply: correlation_matrix = matrix × transposed
4. Result: Precomputed correlation values for all pairs
Tradeoff: O(n²) memory for O(1) access during comparison

Phase 6: Similarity Detection & Dictionary Building

For i = 1 to num_windows-1:
  For j = i+1 to num_windows:
    1. Get features from arrays: f1_i, f2_i, f1_j, f2_j
    2. Early break if |f1_j - f1_i| > max_acceptable_diff
    3. Calculate distance based on metric:
       • Euclidean: √[(f1_i-f1_j)² + (f2_i-f2_j)²]
       • Correlation: 1 - precomputed_value(i,j)
       • Cosine: 1 - min(|f1_i|,|f1_j|)/max(|f1_i|,|f1_j|)
    4. Normalize to similarity: 1 - (distance/max_distance)
    5. If similarity ≥ threshold: add to dictionary: (window_i, window_j, distance)
Report statistics:
  • Comparisons made vs skipped
  • Speedup factor
  • Dictionary size

Phase 7: Variation Generation

num_output_windows = floor(output_duration / window_size)
For out_i = 1 to num_output_windows:
  1. Select source window:
     • If dictionary not empty and randomize_order: random pair from dictionary
     • Else: sequential from original
  2. Extract audio segment at the window's time range
  3. Apply variation_method: methods 1-6 as described in the Variation Methods section
  4. Store varied segment ID
Progress reporting every 10 windows
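A hypothetical Python sketch of the Phase 7 selection step. Two details are not specified above and are labeled as assumptions in the code: which window of a chosen pair is used (here either one at random) and what "sequential" does once the input windows run out (here it wraps around). The name plan_output is illustrative.

```python
import math
import random


def plan_output(output_duration, window_size, num_windows, dictionary,
                randomize, seed=None):
    """Return the 1-based source-window index for each output slot.

    dictionary: list of (window_i, window_j) similar pairs.
    """
    rng = random.Random(seed)
    # epsilon guards against float error, e.g. 10 / 0.1 landing just below 100
    num_output_windows = math.floor(output_duration / window_size + 1e-9)
    plan = []
    for out_i in range(num_output_windows):
        if dictionary and randomize:
            pair = rng.choice(dictionary)
            plan.append(rng.choice(pair))  # assumption: either member of the pair
        else:
            plan.append(out_i % num_windows + 1)  # assumption: wrap past the last window
    return plan


print(len(plan_output(10.0, 0.1, 199, [], False)))  # 100 slots, as in the log below
```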

Phase 8: Recomposition & Output

1. Concatenate all varied segments
2. Rename: originalname + "_LZ_variation"
3. Trim to exact output_duration if longer
4. Scale peak to 0.99 (normalize)
5. Display final statistics:
   • Output duration
   • Dictionary size
   • Performance metrics
6. Auto-play if play_output enabled
7. Clean up temporary objects
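Steps 1, 3, and 4 of Phase 8 amount to simple sample arithmetic. This Python sketch assumes segments are plain lists of samples at one sample rate; recompose is an illustrative name, and renaming, playback, and cleanup are Praat object operations that are not modeled here.

```python
def recompose(segments, sample_rate, output_duration, peak=0.99):
    """Concatenate segments, trim to output_duration, scale so max |sample| == peak."""
    out = [x for seg in segments for x in seg]
    out = out[:int(round(output_duration * sample_rate))]  # trims only if longer
    max_abs = max((abs(x) for x in out), default=0.0)
    if max_abs > 0:  # leave an all-silent result untouched
        out = [x * peak / max_abs for x in out]
    return out


print(recompose([[0.5, -1.0], [2.0, 0.0]], 1, 3.0))
```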

Information Window Output

TYPICAL OUTPUT:
    === LZ-Inspired Audio Analysis (OPTIMIZED) ===
    Analysis type: Pitch
    Window size: 0.1 s
    Overlap: 50%
    Number of windows: 199
    Analyzing...
    Extracting pitch contours...
    Loading features into memory...
    Sorting features for efficient comparison...
    Building similarity dictionary...
    Found 542 similar pattern pairs
    Comparisons made: 1,542
    Comparisons skipped: 18,159
    Speedup factor: 12.78x
    Creating variations...
    Processing window 10/100
    Processing window 20/100
    ...
    Processing window 100/100
    Concatenating segments...
    === Complete ===
    Output duration: 10.0 seconds
    Dictionary size: 542 similar pairs
    Output sound created: originalname_LZ_variation
    Playing output...

Troubleshooting Common Issues

Problem: Script runs very slowly
Causes: Too many windows (small window_size), high overlap, large file
Solutions: • Increase window_size (0.2s instead of 0.05s)
• Reduce overlap (0.3 instead of 0.7)
• Use shorter input file
• Check the speedup factor in the output: if it is below 2x, sort-and-sweep is not helping much (features are tightly clustered)

Problem: No patterns found (dictionary size = 0)
Causes: Similarity_threshold too high, distance metric inappropriate
Solutions: • Lower similarity_threshold (0.6 instead of 0.9)
• Try different distance_metric
• Try different analysis_type
• Check if features are being extracted correctly (undefined values?)

Problem: Output is chaotic/disjointed
Causes: Low similarity_threshold, extreme variation_amount
Solutions: • Increase similarity_threshold for more coherent patterns
• Reduce variation_amount (0.3 instead of 0.8)
• Use less disruptive variation_method (pitch shift instead of granular shuffle)
• Increase window_size for longer, more coherent segments

Problem: Memory error or Praat crash
Causes: Too many windows creating large matrices, memory limits
Solutions: • Reduce number of windows (increase window_size, reduce overlap)
• Use shorter input file
• Avoid correlation metric with very large window counts
• Increase Praat memory allocation in preferences

Problem: Output has clicks/pops between segments
Causes: No crossfade between concatenated segments
Solutions: • Increase overlap parameter (gives more, closer-spaced candidate windows; overlap affects analysis only and does not crossfade the output segments)
• Apply crossfade manually after generation
• Use smaller variation_amount for smoother transitions
• This is inherent to segment concatenation approach
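For the manual crossfade, the arithmetic is a weighted blend over the seam. This Python sketch applies a linear crossfade between two sample lists (crossfade_concat is an illustrative name); an equal-power curve is a common alternative, and the fade length is a free choice, not a value from the script.

```python
def crossfade_concat(a, b, fade_len):
    """Join two sample lists with a linear crossfade over fade_len samples."""
    n = min(fade_len, len(a), len(b))
    if n == 0:
        return list(a) + list(b)
    mixed = []
    for k in range(n):
        w = (k + 1) / (n + 1)  # weight of b rises from just above 0 to just below 1
        mixed.append(a[len(a) - n + k] * (1 - w) + b[k] * w)
    return list(a[:len(a) - n]) + mixed + list(b[n:])


print(crossfade_concat([1.0] * 4, [0.0] * 4, 4))
```

To join many segments, apply it pairwise (e.g. with functools.reduce); each seam shortens the result by fade_len samples, so extend output_duration slightly if the exact length matters.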