Temporal Elasticity — Latent Time Warping Engine — User Guide

Segments a Sound into events, extracts acoustic features, and calls a Python engine that learns a latent space and builds a temporal field to warp event durations. Reconstruction is via PSOLA, resampling, or placement.

Author: Shai Cohen
Affiliation: Department of Music, Bar-Ilan University, Israel
Version: 1.0 (2025)
License: MIT License
Citation: Cohen, S. (2025). Praat AudioTools
Repo: https://github.com/ShaiCohen-ops/Praat-plugin_AudioTools

What this does

This script implements a Temporal Elasticity engine — a latent time warping system that segments audio into events, extracts acoustic features, learns a latent space, builds a temporal field, and warps event durations based on their position in that space. Reconstruction is performed via PSOLA, resampling, or simple placement.

⏱️ What is Temporal Elasticity?

This approach treats time as an elastic medium that can stretch or compress depending on acoustic properties:

  • Events are segmented from the source (silence-based)
  • Feature extraction (24-dimensional): MFCCs, spectral centroid, flatness, RMS, ZCR, duration, delta-MFCCs
  • Latent space learned via autoencoder or PCA
  • Temporal field built over latent space — 5 modes: gravitational, inversion, turbulence, gradient, relativistic
  • Duration rules apply additional transformations based on event characteristics
  • Reconstruction via PSOLA (pitch-preserving), resampling, or placement

Key Features:

Technical Implementation:

  1. Event Segmentation: Silence-based detection.
  2. Feature Extraction: 24-dim vector per event (MFCC, centroid, flatness, RMS, ZCR, duration, delta-MFCCs).
  3. Latent Learning: Autoencoder (numpy) or PCA.
  4. Temporal Field: Maps latent position → time scale factor.
  5. Duration Rules: Apply additional modifications.
  6. Reconstruction: PSOLA, resampling, or placement.

Quick start

  1. In Praat, select exactly one Sound object (any duration, any content).
  2. Run script… → select TemporalElasticity.praat.
  3. Choose Preset (2-9 for specific strategies, 1 for custom).
  4. Set segmentation parameters (silence threshold, min event duration).
  5. Set latent space parameters (dimensions, iterations, method, seed).
  6. Choose temporal field mode and adjust amplitude, sigma, clusters.
  7. Select extra duration rules and reconstruction method.
  8. Enable Draw_visualization for analysis display.
  9. Click OK — engine segments, extracts features, learns latent space, builds field, reconstructs.
Quick tip: Start with Gentle Warp preset on a 10-20 second recording with varied texture. Enable visualization — you'll see the original and warped waveforms, spectrograms, and a bar chart showing the time scale factor per event (blue=compress, red=stretch). Listen to how events stretch or compress based on their latent position. The output appears as "source_te" in the Objects window.
Important:

  • Python dependencies: the engine requires numpy and soundfile.
  • PSOLA reconstruction works best on pitched material; resampling changes pitch; placement only may create gaps.
  • Latent dimensions are limited to 2-8 for stability.
  • Temporal field modes produce different behaviors; experiment to find what works for your material.
  • Extra rules can amplify or counteract the field effects.
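Before running the script, it can help to confirm that the two required packages are importable from the Python interpreter that Praat will call. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of the script.

```python
import importlib.util

def missing_packages(names=("numpy", "soundfile")):
    """Return the subset of `names` that cannot be found on the import path."""
    return [n for n in names if importlib.util.find_spec(n) is None]

missing = missing_packages()
if missing:
    # The guide's own fix: pip install numpy soundfile
    print("Missing packages:", ", ".join(missing))
else:
    print("numpy and soundfile are available")
```

Run it with the same `python` executable the script uses; a different interpreter may have a different set of installed packages.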

Temporal Elasticity Theory

Feature Extraction (24 dimensions)

Feature vector per event:

  • [1-13] MFCC means (13 coefficients)
  • [14] Spectral centroid (normalised)
  • [15] Spectral flatness (Wiener entropy)
  • [16] RMS energy
  • [17] Zero-crossing rate
  • [18] Log duration (samples)
  • [19-24] Delta-MFCCs (first 6, mean absolute difference)

All features are normalised before latent learning.
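To make the non-MFCC descriptors concrete, here is a rough numpy sketch of features 14-18 and of one possible normalisation. It is an illustration only: the function names are hypothetical, and the windowing, the Nyquist normalisation of the centroid, and the z-score normalisation are assumptions rather than the script's confirmed choices.

```python
import numpy as np

def basic_descriptors(x, sr):
    """Return [centroid, flatness, rms, zcr, log_duration] for one event."""
    x = np.asarray(x, dtype=float)
    eps = 1e-12
    # Magnitude spectrum of the whole event (Hann window is an assumption)
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    power = mag ** 2 + eps
    # Spectral centroid, divided by Nyquist so it lies in 0..1 (assumption)
    centroid = np.sum(freqs * mag) / (np.sum(mag) + eps) / (sr / 2.0)
    # Spectral flatness: geometric mean / arithmetic mean of the power spectrum
    flatness = np.exp(np.mean(np.log(power))) / np.mean(power)
    rms = np.sqrt(np.mean(x ** 2))
    # Zero-crossing rate: fraction of adjacent sample pairs that change sign
    zcr = np.mean(np.signbit(x[:-1]) != np.signbit(x[1:]))
    log_dur = np.log(len(x))  # log duration in samples
    return np.array([centroid, flatness, rms, zcr, log_dur])

def normalise(features):
    """Z-score each column of an (events x features) matrix (assumed scheme)."""
    F = np.asarray(features, dtype=float)
    std = F.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (F - F.mean(axis=0)) / std
```

A pure tone yields a flatness near 0 and white noise a much higher value, which is the distinction the harmonic_compress and noisy_dilate rules rely on.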

Latent Learning Methods

🧠 Autoencoder

Symmetric autoencoder: input (24) → hidden (h) → latent (z) → hidden (h) → output (24)

h = max(z×2, min(128, √(24×z)))

Training: 20-400 iterations, Adam optimiser, MSE loss
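The hidden-layer width formula above can be evaluated directly. This short sketch only reproduces the arithmetic; whether and how the script rounds the result to an integer is not stated, so the value is returned unrounded, and the function name `hidden_size` is hypothetical.

```python
import math

def hidden_size(z, n_features=24):
    """h = max(z*2, min(128, sqrt(n_features * z))) for latent size z."""
    return max(z * 2, min(128, math.sqrt(n_features * z)))

for z in (2, 4, 8):
    print(z, round(hidden_size(z), 2))
```

For small latent sizes the square-root term dominates (z = 4 gives √96 ≈ 9.8), while for z = 8 the z×2 term wins and h = 16.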

📊 PCA

Principal Component Analysis via SVD. Faster but linear.
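As an illustration of PCA via SVD, the sketch below projects a normalised feature matrix onto its leading principal components. It is a generic textbook version under the assumption that rows are events and columns are features; the function name `pca_latent` is hypothetical.

```python
import numpy as np

def pca_latent(features, n_dims):
    """Project an (events x features) matrix onto its first n_dims components."""
    X = np.asarray(features, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre each feature
    # Rows of Vt are the principal directions, ordered by singular value
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_dims]
    latent = Xc @ components.T                   # latent coordinates per event
    return latent, components

rng = np.random.default_rng(42)
demo = rng.standard_normal((30, 24))             # 30 synthetic events
Z, comps = pca_latent(demo, 4)
print(Z.shape)                                   # (30, 4)
```

Because the projection is linear, it cannot capture curved structure in the feature space, which is the trade-off against the autoencoder.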

Temporal Field Modes

For each event with latent position z:

Gravitational:
  scale = 1 + A × Σ exp(-||z - c_k||² / (2σ²))
  • Stretch near cluster centers

Inversion:
  density = mean distance to k=3 nearest neighbours
  scale = median(density) / density^A
  • Dense regions compress (scale < 1), sparse regions stretch (scale > 1)

Turbulence:
  scale = 1 + Σ a_i × exp(-||z - c_i||² / (2σ²))
  • Deterministic noise field with seeded random centres/amplitudes

Gradient:
  proj = z · pc1 (first principal component)
  t = (proj - min) / (max - min)
  scale = 1/(1+A) + t × (2A)

Relativistic:
  velocity = ||z_i - z_{i-1}||
  c = 95th percentile velocity
  γ = 1 / √(1 - (v/c)²) (Lorentz factor)
  scale = 1 + A × (γ / median(γ) - 1)
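Two of these modes, gravitational and relativistic, can be sketched in numpy as follows. This is an illustration of the formulas, not the script's code: the function names are hypothetical, the cluster centres are passed in rather than computed, the first event is given zero velocity, and v/c is clipped below 1 so the Lorentz factor stays finite for the events above the 95th percentile. All of these are assumptions.

```python
import numpy as np

def gravitational_scale(Z, centres, amplitude, sigma):
    """scale = 1 + A * sum_k exp(-||z - c_k||^2 / (2 sigma^2)) per event."""
    Z = np.asarray(Z, dtype=float)
    centres = np.asarray(centres, dtype=float)
    # Squared distance from every event to every centre: shape (events, centres)
    d2 = ((Z[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return 1.0 + amplitude * np.exp(-d2 / (2.0 * sigma ** 2)).sum(axis=1)

def relativistic_scale(Z, amplitude, max_beta=0.99):
    """scale = 1 + A * (gamma / median(gamma) - 1), gamma from latent velocity."""
    Z = np.asarray(Z, dtype=float)
    v = np.zeros(len(Z))
    v[1:] = np.linalg.norm(np.diff(Z, axis=0), axis=1)  # ||z_i - z_{i-1}||
    c = np.percentile(v, 95)
    # Assumption: clip v/c so events faster than c do not produce NaN
    beta = np.clip(v / max(c, 1e-12), 0.0, max_beta)
    gamma = 1.0 / np.sqrt(1.0 - beta ** 2)
    return 1.0 + amplitude * (gamma / np.median(gamma) - 1.0)
```

An event sitting exactly on a single cluster centre receives scale 1 + A in the gravitational mode, and events far from every centre stay near 1.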

Extra Duration Rules

📏 Additional Modifications

Rule | Formula | Effect
short_stretch | s × (1 + (0.1 - dur)/0.1) for dur < 0.1 s | Short events get extra stretch
long_compress | s × max(0.5, 1 - 0.3 × (dur-0.5)/0.5) for dur > 0.5 s | Long events get extra compression
harmonic_compress | s × (1 - 0.4 × (1 - flatness)) | Tonal events (low flatness) compress
noisy_dilate | s × (1 + 0.6 × flatness) | Noisy events (high flatness) stretch
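The four rule formulas can be written as one small function. This sketch transcribes the table and is not the script's apply_duration_rules(); the name `apply_rule_sketch` and its argument order are hypothetical. Here `scale` is the field's scale factor s, `dur` is the event duration in seconds, and `flatness` is the spectral flatness in 0..1.

```python
def apply_rule_sketch(scale, rule, dur, flatness):
    """Return the scale factor after applying one extra duration rule."""
    if rule == "short_stretch" and dur < 0.1:
        return scale * (1 + (0.1 - dur) / 0.1)
    if rule == "long_compress" and dur > 0.5:
        # Compression is capped so the factor never drops below 0.5
        return scale * max(0.5, 1 - 0.3 * (dur - 0.5) / 0.5)
    if rule == "harmonic_compress":
        return scale * (1 - 0.4 * (1 - flatness))
    if rule == "noisy_dilate":
        return scale * (1 + 0.6 * flatness)
    return scale  # "none", or a rule whose duration condition is not met

# A 50 ms event under short_stretch is stretched by a further factor of 1.5
print(apply_rule_sketch(1.0, "short_stretch", dur=0.05, flatness=0.2))
```

Because each rule multiplies the field's scale, a rule can reinforce the field (stretching an event that is already stretched) or partly cancel it.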

Reconstruction Methods

  • PSOLA: Pitch-synchronous overlap-add. Preserves pitch while changing duration. Best for pitched material (voice, instruments).
  • Resampling: Change sample rate to achieve new duration, then resample back. Changes pitch proportionally. Simple and fast.
  • Placement only: Events placed at new durations; gaps filled with silence. No pitch/spectral modification.
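To illustrate why resampling changes pitch, the following sketch stretches or compresses one event by linear interpolation. It stands in for the resampling method in spirit only: the script's actual interpolation is not documented here, and the function name `resample_event` is hypothetical.

```python
import numpy as np

def resample_event(x, scale):
    """Return x stretched (scale > 1) or compressed (scale < 1) in time."""
    x = np.asarray(x, dtype=float)
    n_out = max(1, int(round(len(x) * scale)))
    # Positions in the source signal that each output sample reads from
    src_pos = np.linspace(0.0, len(x) - 1, n_out)
    return np.interp(src_pos, np.arange(len(x)), x)

event = np.sin(2 * np.pi * 5 * np.arange(100) / 100.0)
stretched = resample_event(event, 1.5)
print(len(event), len(stretched))  # 100 150
```

Played back at the original sample rate, the stretched event lasts 1.5 times as long and every frequency in it is multiplied by 1/1.5, which is the proportional pitch change noted above.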

Preset Strategies

Preset 2: Gentle Warp

🌊 Subtle Time Warping

Latent: 4 | Method: AE | Mode: gravitational

Amplitude: 0.3 | Sigma: 1.5 | Clusters: 3

Rules: none | Recon: PSOLA

Character: Gentle gravitational pull, subtle duration changes

Use on: Ambient, gradual transformations

Preset 3: Rhythmic Stretch

🥁 Pulse-Oriented

Latent: 4 | Method: PCA | Mode: gradient

Amplitude: 0.6 | Rules: short_stretch | Recon: resampling

Character: Linear gradient along principal axis, short events stretched

Use on: Rhythmic material, loops

Preset 4: Gravitational Pull

🪐 Strong Attraction

Latent: 6 | Method: AE | Mode: gravitational

Amplitude: 1.2 | Sigma: 0.8 | Clusters: 4

Rules: none | Recon: PSOLA

Character: Strong gravitational wells, significant stretch near cluster centers

Use on: Transformative material, creating emphasis

Preset 5: Turbulent Scatter

🌪️ Chaotic Fluctuations

Latent: 8 | Method: AE | Mode: turbulence

Amplitude: 1.0 | Sigma: 0.6 | Clusters: 5

Rules: noisy_dilate | Recon: placement

Character: Stochastic field, noisy events dilated

Use on: Glitch, chaotic textures

Preset 6: Time Inversion

🔄 Density Inversion

Latent: 4 | Method: AE | Mode: inversion

Amplitude: 1.0 | Rules: long_compress | Recon: PSOLA

Character: Dense regions compress, sparse stretch — inverts density

Use on: Dramatic temporal restructuring

Preset 7: Spectral Drift

🎨 Spectral-Based

Latent: 6 | Method: PCA | Mode: gradient

Amplitude: 0.7 | Sigma: 1.2 | Rules: harmonic_compress

Recon: resampling

Character: Gradient along spectral axis, harmonic events compress

Use on: Spectral transformations

Preset 8: Deep Mutation

🧬 Strong Transformation

Latent: 12 | Method: AE | Mode: gravitational

Amplitude: 1.8 | Sigma: 0.5 | Clusters: 6

Rules: none | Recon: PSOLA

Character: Large latent, high amplitude, strong transformation

Use on: Radical time warping

Preset 9: Relativistic

⚡ Velocity-Based

Latent: 6 | Method: AE | Mode: relativistic

Amplitude: 1.2 | Rules: none | Recon: PSOLA

Character: Lorentz time dilation based on latent velocity

Use on: Material where rapid changes in sonic identity should slow time

Parameters & Controls

Segmentation Parameters

Parameter | Default | Description
Silence_threshold (dB) | 25.0 | Level (dB) below which audio is treated as silence
Min_event_duration (s) | 0.05 | Minimum event length
Min_silence_duration (s) | 0.03 | Minimum silence length for segmentation
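The interaction of these three parameters can be seen in a simplified silence-based segmenter. This is a sketch under stated assumptions, not the script's algorithm: it measures frame RMS in 10 ms frames, interprets the threshold as dB below the loudest frame, and uses the hypothetical name `segment_events`.

```python
import numpy as np

def segment_events(x, sr, threshold_db=25.0, min_event=0.05,
                   min_silence=0.03, frame_s=0.01):
    """Return a list of (start_s, end_s) for regions louder than the threshold."""
    x = np.asarray(x, dtype=float)
    hop = max(1, int(round(frame_s * sr)))
    n_frames = len(x) // hop
    if n_frames == 0:
        return []
    frames = x[:n_frames * hop].reshape(n_frames, hop)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # Level relative to the loudest frame (assumed reference), in dB
    level_db = 20.0 * np.log10(rms / (rms.max() + 1e-12) + 1e-12)
    active = level_db > -threshold_db
    dt = hop / sr
    min_sil = max(1, int(round(min_silence / dt)))
    events, start, quiet = [], None, 0
    for i, is_active in enumerate(active):
        if is_active:
            if start is None:
                start = i
            quiet = 0          # a short dip does not end the event
        elif start is not None:
            quiet += 1
            if quiet >= min_sil:   # silence long enough: close the event
                end = i - quiet + 1
                if (end - start) * dt >= min_event:
                    events.append((start * dt, end * dt))
                start, quiet = None, 0
    if start is not None:          # event still open at the end of the sound
        end = n_frames - quiet
        if (end - start) * dt >= min_event:
            events.append((start * dt, end * dt))
    return events
```

In this sketch, raising Min_silence_duration merges events separated by brief pauses, and raising Min_event_duration discards short clicks.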

Latent Space Parameters

Parameter | Default | Description
Latent_dimensions | 4 | Latent dimensions (2–8)
Training_iterations | 120 | AE training iterations (20–400)
Latent_method | ae | ae (autoencoder) or pca
Random_seed | 42 | Seed for reproducibility

Temporal Field Parameters

Parameter | Default | Description
Field_mode | gravitational | gravitational, inversion, turbulence, gradient, relativistic
Amplitude | 0.8 | Field strength (0.1–3.0)
Sigma | 1.0 | Well width for gravitational/turbulence (0.1–5.0)
Clusters | 3 | Number of clusters for gravitational mode (1–8)

Duration Rules

Parameter | Default | Description
Extra_rules | none | none, short_stretch, long_compress, harmonic_compress, noisy_dilate

Reconstruction

Parameter | Default | Description
Reconstruction_method | PSOLA | PSOLA, resampling, placement_only

Output

Parameter | Default | Description
Draw_visualization | 1 | Generate 5-panel analysis display
Play_result | 1 | Audition after processing

Visualization & Analysis

5-Panel Display

Temporal Elasticity Visualization:

Panel 1: TITLE
  • Script name, source name, preset, field mode, latent method, dimensions

Panel 2: ORIGINAL WAVEFORM
  • Gray waveform
  • Title: "Original"
  • Duration displayed

Panel 3: WARPED WAVEFORM
  • Green waveform
  • Title: "Warped"
  • X-axis: Time (s)
  • New duration and compression ratio displayed

Panel 4: ORIGINAL SPECTROGRAM
  • 0-5000 Hz spectrogram of original
  • Title: "Original spectrogram"

Panel 5: WARPED SPECTROGRAM
  • 0-5000 Hz spectrogram of warped output
  • Title: "Warped spectrogram"

Panel 6: SCALE PER EVENT BAR CHART
  • X-axis: Event index (1 to N)
  • Y-axis: Time scale factor
  • Colored bars: blue → compress (scale < 1), red → stretch (scale > 1)
  • Red line at scale = 1.0
  • Title: "Scale per event (blue=compress · red=stretch · line=1.0)"

Panel 7: SUMMARY PANEL
  • Events, latent method/dimensions, loss
  • Field mode, rules, scale range/mean
  • Duration in/out, compression ratio, entropy

Reading the Scale Bar Chart

What the colors mean:
  • Blue bars (scale < 1): Event is compressed — shorter than original
  • Red bars (scale > 1): Event is stretched — longer than original
  • Red line at 1.0: No change
  • The height shows the exact scale factor (e.g., 1.5 = 50% longer)
  • Distribution reveals the temporal field's effect

Interpreting Metrics

What the numbers mean:
  • Scale entropy: Higher = more varied stretching; lower = uniform
  • Compression ratio: Total output duration / input duration
  • Scale mean: Average stretch factor (should be near 1 if total duration preserved)
  • Latent dispersion: How spread out events are in latent space
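As a rough guide to how such numbers could be computed, the sketch below derives a compression ratio, scale statistics, and a scale entropy from per-event durations and scale factors. The definitions are assumptions: the ratio here covers event durations only (silences between events are ignored), the entropy is a Shannon entropy over a 10-bin histogram of the scales, and the name `scale_metrics` is hypothetical. The script may define these quantities differently.

```python
import numpy as np

def scale_metrics(durations, scales, n_bins=10):
    """Summarise per-event scale factors (assumed metric definitions)."""
    durations = np.asarray(durations, dtype=float)
    scales = np.asarray(scales, dtype=float)
    # Total warped event time divided by total original event time
    ratio = (durations * scales).sum() / durations.sum()
    # Shannon entropy (bits) of the histogram of scale factors
    hist, _ = np.histogram(scales, bins=n_bins)
    p = hist[hist > 0] / hist.sum()
    entropy = -np.sum(p * np.log2(p))
    return {
        "compression_ratio": ratio,
        "scale_mean": scales.mean(),
        "scale_min": scales.min(),
        "scale_max": scales.max(),
        "scale_entropy": entropy,
    }

print(scale_metrics([0.2, 0.4, 0.1], [1.5, 0.8, 1.0]))
```

Note that the plain mean of the scales and the duration-weighted ratio can differ: a long event that is compressed outweighs several short events that are stretched.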

Applications

Electroacoustic Composition

Use case: Creating time-warped textures where duration reflects acoustic properties

Technique: Gravitational or Inversion presets on varied source material

Workflow:

Rhythmic Manipulation

Use case: Creating rhythmic variations where duration depends on spectral characteristics

Technique: Rhythmic Stretch or Turbulent Scatter on percussive material

Applications:

Sound Design for Media

Use case: Creating evolving textures, risers, transformations

Technique: Deep Mutation or Relativistic presets

Examples:

Research & Education

Use case: Studying relationship between acoustic features and perceived duration

Technique: Compare field modes on same source, examine scale distributions

Learning outcomes:

Practical Workflow Examples

🎬 Film Scene: Time Dilation

Goal: Create 60-second cue where time slows during moments of change

Settings:

  • Source: 30-second ambient with events
  • Preset: Relativistic
  • Amplitude=1.5, PSOLA reconstruction

Result: Events with high latent velocity (rapid change) are stretched — time slows

🎚️ Electronic Music: Glitch Drums

Goal: Create glitchy drum loop

Settings:

  • Source: 8-second drum loop
  • Preset: Turbulent Scatter
  • Rules: noisy_dilate, placement reconstruction

Result: Noisy hits dilated, placed with gaps — glitchy texture

🎙️ Voice Processing: Expressive Speech

Goal: Stretch expressive moments in speech

Settings:

  • Source: 10-second spoken phrase
  • Preset: Gravitational Pull
  • Rules: harmonic_compress (tonal parts compress)

Result: Tonal vowels compress, expressive consonants stretch

Troubleshooting Common Issues

Problem: Python not found or missing packages
Cause: Python not installed, or packages missing
Solution: Install Python and required packages: pip install numpy soundfile

Problem: Too few events detected
Cause: Silence threshold too high/low, or source has no clear events
Solution: Adjust silence_threshold and min_event_duration. If too few events are found, the script falls back to a fixed 0.25 s grid.

Problem: PSOLA produces artifacts (metallic sounds)
Cause: Extreme time stretching/compression on noisy material
Solution: Use resampling or placement instead, or reduce amplitude

Problem: Output has clicks
Cause: PSOLA synthesis artifacts or poor concatenation
Solution: Use placement method (adds silence gaps) or adjust PSOLA parameters

Problem: Scale factors all near 1
Cause: Amplitude too low, or field mode not creating variation
Solution: Increase amplitude, try different field mode

Advanced Techniques

Custom field modes:

In build_temporal_field(), add new mode definitions to create custom temporal fields.

Feature engineering:

In extract_features_from_patch(), modify the feature vector to include different descriptors (e.g., MFCC variance, spectral rolloff).

Duration rule chaining:

Modify apply_duration_rules() to apply multiple rules in sequence.

Multi-channel input:

Script converts to mono for analysis; output preserves original channel count. For multichannel, modify reconstruction to handle each channel separately.