Audio Descriptions Analysis — User Guide

Comprehensive batch audio feature extraction: pitch, intensity, spectral characteristics, voice quality metrics, and perceptual descriptors for multiple sounds simultaneously.

Author: Shai Cohen
Affiliation: Department of Music, Bar-Ilan University, Israel
Version: 0.1 (2025)
License: MIT License
Repo: https://github.com/ShaiCohen-ops/Praat-plugin_AudioTools

What this does

This script extracts audio features from multiple sound files in a single run. It computes 19 quantitative descriptors covering duration, pitch statistics, intensity dynamics, voice quality metrics (jitter, shimmer, harmonicity), and spectral characteristics (SPR, centroid, spread, rolloff, flatness, roughness).

Output: a structured table with one row per sound file, ready for statistical analysis, machine learning, or comparative studies.

Suited to: corpus analysis, instrument classification, voice quality assessment, sound design comparison, experimental composition research.

Batch processing is automated, so hundreds of files can be analyzed in minutes without manual measurement.

Key Features:

What kinds of analysis? This script extracts descriptive features: quantitative measurements that characterize audio content. Categories:

  1. Temporal: duration.
  2. Pitch/F0: fundamental frequency statistics (mean, range, variability).
  3. Intensity/Loudness: amplitude envelope characteristics.
  4. Voice quality: jitter (pitch perturbation), shimmer (amplitude perturbation), harmonicity (harmonics-to-noise ratio).
  5. Spectral shape: centroid (brightness), spread (bandwidth), rolloff (energy concentration).
  6. Spectral texture: flatness (tonal vs. noisy), roughness (irregularity).
  7. Singing-specific: SPR (singer's formant).

Applications: instrument recognition, emotion detection, speaker identification, voice pathology, music information retrieval, sound design categorization.

Technical Implementation:

  1. Auto-select all Sound objects in the workspace.
  2. Create an empty table with 19 column headers.
  3. For each sound:
     - extract pitch (autocorrelation, 75-600 Hz) and calculate pitch statistics (mean, min, max, median, stdev)
     - extract intensity (100 Hz smoothing) and calculate intensity statistics (max, min, median, variance)
     - create a point process for periodicity detection
     - calculate jitter and shimmer (local perturbation)
     - calculate harmonicity (cepstral method)
     - perform spectral analysis (FFT)
     - calculate SPR (singing power ratio, 50-2000 Hz vs 2000-4000 Hz)
     - calculate spectral centroid and spread (weighted frequency distribution)
     - calculate spectral rolloff (85% energy threshold)
     - calculate spectral flatness (geometric/arithmetic mean ratio)
     - calculate spectral roughness (local irregularity)
  4. Fill the table row with all metrics.
  5. Export to CSV for external analysis.

Robustness: all undefined values are set to zero, unvoiced sounds are handled, and the script checks for sufficient data before each calculation.

Quick start

  1. Load multiple Sound files into Praat Objects window (Open → Read from file...).
  2. Run script…descriptions.praat.
  3. No dialog — script automatically analyzes all sounds.
  4. Processing displays progress in Info window.
  5. Result: Table Results appears in Objects window with all metrics.
  6. Export: Select table → Save as comma-separated file... → save as .csv.

Quick tip: Works on any sound type (voice, instruments, noise, synthesis). For batch mode, load 50+ files at once, run the script, and get a complete analysis table. Processing time is roughly 2-5 seconds per file, depending on duration. The Info window shows progress ("Processing: filename"), and the table is selected automatically when processing completes. Save it as CSV immediately to preserve the results. Metrics are robust to different sample rates, bit depths, and channel counts (stereo is converted to mono for analysis).

Important: The script analyzes every Sound object in the Objects window, so remove any sounds you don't want analyzed before running. There are no parameters to adjust (the script uses research-standard defaults). Some metrics return 0 for signals they do not apply to: pitch metrics for unpitched sounds (noise, percussion), jitter/shimmer for unvoiced speech, harmonicity for inharmonic sounds. This is normal: a zero indicates that the feature is not applicable. When studying voiced or pitched material only, filter out zeros during analysis. Very short sounds (<0.1 s) may produce unreliable pitch and intensity statistics.
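
As an illustration of filtering out zeros after export, here is a minimal Python sketch. It assumes the saved CSV has a header row using the metric names from this guide; the `Filename` column and the two sample rows are hypothetical and only stand in for a real export.

```python
import csv
import io

def voiced_rows(csv_text, column="Pitch_mean_Hz"):
    """Return rows whose value in `column` is non-zero (zero = not applicable)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if float(row[column]) != 0.0]

# Hypothetical two-row export; a real file has all 19 metric columns.
sample = "Filename,Pitch_mean_Hz,Jitter_local\nvoice,182.4,0.008\nsnare,0,0\n"

kept = voiced_rows(sample)
print([row["Filename"] for row in kept])  # ['voice']
```

For a file on disk, the same function can be fed the text returned by `open(path, newline="").read()`.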

Feature Descriptions

Complete Feature Set (19 Metrics)

📊 Extracted Features

Temporal: 1 feature

Pitch: 5 features

Intensity: 4 features

Voice Quality: 3 features

Spectral: 6 features

1. Temporal Features

Duration_s

Unit: Seconds

Description: Total duration of sound file

Calculation: End time - start time

Interpretation:

2. Pitch Features (F0 Statistics)

Pitch_mean_Hz

Unit: Hertz (Hz)

Description: Average fundamental frequency across voiced portions

Method: Autocorrelation (raw cc), 75-600 Hz range

Interpretation:

Pitch_min_Hz & Pitch_max_Hz

Unit: Hertz (Hz)

Description: Lowest and highest detected fundamental frequencies

Interpretation:

Pitch_median_Hz

Unit: Hertz (Hz)

Description: 50th percentile (middle value) of pitch distribution

Advantage over mean: Robust to outliers (octave errors, glitches)
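
A small Python sketch of why the median resists outliers. The F0 values are made up for illustration: a steady voice near 200 Hz with a single octave error.

```python
from statistics import mean, median

# Hypothetical F0 track in Hz: four frames near 200 Hz plus one octave error.
f0_track = [198.0, 200.0, 202.0, 200.0, 400.0]

print(mean(f0_track))    # 240.0 (pulled upward by the single 400 Hz frame)
print(median(f0_track))  # 200.0 (unaffected by the outlier)
```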

Interpretation:

Pitch_stdev_Hz

Unit: Hertz (Hz)

Description: Standard deviation of fundamental frequency

Interpretation:

3. Intensity Features (Loudness Dynamics)

Intensity_max_dB & Intensity_min_dB

Unit: Decibels (dB SPL)

Description: Peak and minimum intensity values

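Smoothing: 100 Hz setting (in Praat's intensity analysis this value is the minimum pitch, which determines the length of the smoothing window; it removes sample-level fluctuations)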

Interpretation:

Intensity_median_dB

Unit: Decibels (dB SPL)

Description: Median intensity (50th percentile)

Interpretation:

Intensity_variance_dB

Unit: dB² (squared decibels)

Description: Variance of intensity (stdev²)

Interpretation:

4. Voice Quality Features (Perturbation Metrics)

Jitter_local

Unit: Ratio (dimensionless)

Description: Average absolute difference between consecutive pitch periods, divided by the average period

Formula: Jitter = [ (1/(N-1)) Σ |T(i) - T(i-1)| ] / mean(T), summed over i = 2..N for N periods
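
The jitter ratio can be sketched in a few lines of Python. This is an illustration of the formula, not the script's own code: the function name `local_perturbation` and the sample periods are hypothetical, and the sum of absolute differences is divided by the number of differences (one fewer than the number of periods).

```python
def local_perturbation(values):
    """Mean absolute difference of consecutive values, divided by the mean value."""
    if len(values) < 2:
        return 0.0  # follows the guide's convention: undefined becomes zero
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))

# Hypothetical pitch periods in seconds (about 200 Hz with slight irregularity).
periods = [0.0050, 0.0051, 0.0049, 0.0050]
jitter = local_perturbation(periods)
print(round(jitter, 4))
```

Passing a list of peak amplitudes instead of periods gives the analogous shimmer ratio.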

Interpretation:

Shimmer_local

Unit: Ratio (dimensionless)

Description: Average absolute difference between consecutive peak amplitudes, divided by the average amplitude

Formula: Shimmer = [ (1/(N-1)) Σ |A(i) - A(i-1)| ] / mean(A), summed over i = 2..N for N periods

Interpretation:

Harmonicity_dB (HNR)

Unit: Decibels (dB)

Description: Harmonics-to-Noise Ratio — signal periodicity measure

Method: Cepstral analysis

Interpretation:

5. Spectral Features (Frequency Domain)

SPR_dB (Singing Power Ratio)

Unit: Decibels (dB)

Description: Difference between low band (50-2000 Hz) and high band (2000-4000 Hz) maxima

Formula: SPR = max(50-2000Hz) - max(2000-4000Hz)
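
A minimal Python sketch of the SPR formula as stated here (low-band peak minus high-band peak). The spectrum values are invented, and assigning the 2000 Hz boundary bin to the high band is an assumption rather than documented behavior.

```python
def band_peak_db(freqs_hz, levels_db, lo_hz, hi_hz):
    """Highest level (dB) among bins with lo_hz <= f < hi_hz."""
    return max(db for f, db in zip(freqs_hz, levels_db) if lo_hz <= f < hi_hz)

def spr_db(freqs_hz, levels_db):
    """Low-band peak (50-2000 Hz) minus high-band peak (2000-4000 Hz), in dB."""
    low = band_peak_db(freqs_hz, levels_db, 50, 2000)
    high = band_peak_db(freqs_hz, levels_db, 2000, 4000)
    return low - high

# Hypothetical spectrum: bin frequencies in Hz and their levels in dB.
freqs = [100, 500, 1500, 2500, 3000, 3500]
levels = [40.0, 62.0, 55.0, 38.0, 47.0, 30.0]
print(spr_db(freqs, levels))  # 15.0
```

With this sign convention, a stronger peak in the 2000-4000 Hz band lowers the result. `max` raises `ValueError` if a band contains no bins, so a real implementation would need to guard against that.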

Interpretation:

Spectral_centroid_Hz

Unit: Hertz (Hz)

Description: Center of gravity of spectrum — frequency-weighted mean

Formula: Centroid = Σ(f × magnitude) / Σ(magnitude)

Interpretation:

Spectral_spread_Hz

Unit: Hertz (Hz)

Description: Standard deviation of spectrum around centroid — bandwidth measure

Formula: Spread = √[Σ((f - centroid)² × magnitude) / Σ(magnitude)]
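
The centroid and spread formulas translate directly into code. The following Python sketch assumes the spectrum is given as parallel lists of bin frequencies and magnitudes; the three-bin spectrum is a made-up example.

```python
import math

def centroid_and_spread(freqs_hz, magnitudes):
    """Magnitude-weighted mean frequency and standard deviation around it."""
    total = sum(magnitudes)
    if total == 0:
        return 0.0, 0.0  # silent input: report zeros instead of dividing by zero
    centroid = sum(f * m for f, m in zip(freqs_hz, magnitudes)) / total
    variance = sum((f - centroid) ** 2 * m
                   for f, m in zip(freqs_hz, magnitudes)) / total
    return centroid, math.sqrt(variance)

# Hypothetical symmetric spectrum centred on 1000 Hz.
freqs = [500.0, 1000.0, 1500.0]
mags = [1.0, 2.0, 1.0]
centroid, spread = centroid_and_spread(freqs, mags)
print(centroid, round(spread, 1))
```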

Interpretation:

Spectral_rolloff_Hz

Unit: Hertz (Hz)

Description: Frequency below which 85% of spectral energy is contained

Calculation: Cumulative energy threshold
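
A cumulative energy threshold can be sketched as follows in Python. The function and sample spectrum are illustrative; the sketch assumes bins are ordered by ascending frequency and that each value is the power in that bin.

```python
def spectral_rolloff(freqs_hz, powers, fraction=0.85):
    """Lowest bin frequency at which cumulative power reaches `fraction` of the total."""
    total = sum(powers)
    if total == 0:
        return 0.0  # no energy: nothing to accumulate
    threshold = fraction * total
    cumulative = 0.0
    for f, p in zip(freqs_hz, powers):
        cumulative += p
        if cumulative >= threshold:
            return f
    return freqs_hz[-1]  # fallback in case rounding keeps the sum below the threshold

# Hypothetical bass-heavy spectrum: most power sits in the lowest bins.
freqs = [100.0, 200.0, 300.0, 400.0, 500.0]
powers = [4.0, 3.0, 2.0, 0.5, 0.5]
rolloff = spectral_rolloff(freqs, powers)
print(rolloff)
```

Here the cumulative power is 4, 7, then 9 out of 10, so the 85% threshold is first crossed at the 300 Hz bin.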

Interpretation:

Spectral_flatness

Unit: Ratio 0-1 (dimensionless)

Description: Ratio of geometric mean to arithmetic mean of spectrum (80-5000 Hz)

Formula: Flatness = exp(mean(ln(power))) / mean(power)
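
The flatness formula can be illustrated with a short Python sketch. It assumes the caller has already restricted the power values to the analysis band, and it returns 0 when any bin is not positive (a zero bin makes the geometric mean zero); both choices are assumptions of this sketch.

```python
import math

def spectral_flatness(powers):
    """Geometric mean divided by arithmetic mean of the power values."""
    if not powers or any(p <= 0 for p in powers):
        return 0.0  # log is undefined for non-positive power
    geometric = math.exp(sum(math.log(p) for p in powers) / len(powers))
    arithmetic = sum(powers) / len(powers)
    return geometric / arithmetic

flat = spectral_flatness([1.0, 1.0, 1.0, 1.0])         # equal power in every bin
peaky = spectral_flatness([100.0, 0.01, 0.01, 0.01])   # one dominant partial
print(flat, round(peaky, 4))
```

Equal power in every bin gives a flatness of 1 (noise-like), while a single dominant bin pushes the value toward 0 (tonal).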

Interpretation:

Spectral_roughness

Unit: Arbitrary (relative measure)

Description: Average absolute deviation of each spectral bin from the mean of its two neighboring bins, a measure of local irregularity

Formula: Roughness = mean(|magnitude(f) - mean(magnitude(f±1))|)
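
A Python sketch of the roughness formula, reading f±1 as the two neighboring bins. Skipping the first and last bin (which have only one neighbor) is an assumption of this sketch, and the example spectra are invented.

```python
def spectral_roughness(magnitudes):
    """Mean absolute deviation of each interior bin from the mean of its two neighbours."""
    if len(magnitudes) < 3:
        return 0.0  # no interior bins to compare
    deviations = [
        abs(magnitudes[i] - (magnitudes[i - 1] + magnitudes[i + 1]) / 2)
        for i in range(1, len(magnitudes) - 1)
    ]
    return sum(deviations) / len(deviations)

smooth = spectral_roughness([1.0, 2.0, 3.0, 4.0])  # linear slope
jagged = spectral_roughness([1.0, 5.0, 1.0, 5.0])  # alternating peaks and dips
print(smooth, jagged)  # 0.0 4.0
```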

Interpretation:

Feature Summary Table

Feature            | Unit     | Low Values          | High Values
Duration_s         | seconds  | Short sounds        | Long sounds
Pitch_mean_Hz      | Hz       | Low pitch (bass)    | High pitch (treble)
Pitch_stdev_Hz     | Hz       | Monotone            | Melodic variation
Intensity range    | dB       | Compressed dynamics | High dynamics
Jitter             | ratio    | Stable pitch        | Irregular/rough
Shimmer            | ratio    | Stable amplitude    | Breathy/irregular
Harmonicity        | dB       | Noisy               | Clear/periodic
SPR                | dB       | Bright/high energy  | Dark/low energy
Spectral_centroid  | Hz       | Dark timbre         | Bright timbre
Spectral_spread    | Hz       | Narrow bandwidth    | Wide bandwidth
Spectral_rolloff   | Hz       | Bass-heavy          | Treble-rich
Spectral_flatness  | 0-1      | Tonal               | Noisy
Spectral_roughness | relative | Smooth spectrum     | Irregular spectrum

Interpreting Results

Reading the Results Table

Table structure:

Zero Values: What They Mean

Common reasons for zero values:

Pitch features = 0:
  → Sound is unpitched (noise, percussion, silence)
  → No fundamental frequency detected
  → Normal for: drums, cymbals, breath, room tone

Jitter/Shimmer = 0:
  → Not enough periodic cycles detected
  → Fewer than 2 pitch periods in sound
  → Normal for: very short sounds, unvoiced speech

Harmonicity = 0 or negative:
  → Undefined value handled as zero
  → Or genuinely more noise than signal
  → Check if sound is noisy/distorted

Spectral features ≠ 0:
  → All sounds have spectral content
  → If zero, indicates calculation error (rare)

Typical Value Ranges by Sound Type

Male Speech

Pitch_mean_Hz: 100-150
Pitch_stdev_Hz: 10-30
Intensity_max_dB: 60-80
Jitter: 0.005-0.015
Shimmer: 0.02-0.05
Harmonicity_dB: 10-20
Spectral_centroid_Hz: 800-1500
Spectral_flatness: 0.1-0.3

Female Speech

Pitch_mean_Hz: 180-250
Pitch_stdev_Hz: 15-40
Intensity_max_dB: 60-80
Jitter: 0.005-0.015
Shimmer: 0.02-0.05
Harmonicity_dB: 10-20
Spectral_centroid_Hz: 1000-2000
Spectral_flatness: 0.1-0.3

Singing Voice (Trained)

Pitch_mean_Hz: 200-600 (varies by voice type)
Pitch_stdev_Hz: 30-100 (wide melodic range)
Jitter: 0.003-0.01 (lower than speech)
Shimmer: 0.01-0.03 (controlled)
Harmonicity_dB: 15-25 (clearer than speech)
SPR_dB: Lower (the singer's formant raises the 2000-4000 Hz peak, which shrinks the low-minus-high difference)
Spectral_centroid_Hz: 1500-3000

Acoustic Guitar