Audio Descriptions Analysis — User Guide

Comprehensive batch audio feature extraction: pitch, intensity, spectral characteristics, voice quality metrics, and perceptual descriptors for multiple sounds simultaneously.

Author: Shai Cohen
Affiliation: Department of Music, Bar-Ilan University, Israel
Version: 0.1 (2025)
License: MIT License
Repo: https://github.com/ShaiCohen-ops/Praat-plugin_AudioTools

Contents:

What this does
Quick start
Feature Descriptions
Interpreting Results
Applications
Exporting Data

What this does

This script extracts audio features from many sound files in one pass. It computes 19 quantitative descriptors covering duration, pitch statistics, intensity dynamics, voice quality metrics (jitter, shimmer, harmonicity), and spectral characteristics (SPR, centroid, spread, rolloff, flatness, roughness).

Output: a structured table with one row per sound file, ready for statistical analysis, machine learning, or comparative studies.

Suited to: corpus analysis, instrument classification, voice quality assessment, sound design comparison, experimental composition research.

Batch processing is automated, so hundreds of files can be analyzed in minutes without manual measurement.

Key Features:

Batch Processing: Analyzes ALL sounds in the Objects window automatically
19 Features: Duration, 5 pitch metrics, 4 intensity metrics, 3 voice quality, 6 spectral
Robust Error Handling: Handles undefined values, unvoiced sounds, edge cases
CSV Export: Table format ready for Excel, R, Python, SPSS
No Parameters: Automatic processing with fixed, research-standard defaults
Research-Ready: Metrics used in speech science, MIR, acoustics research

What kinds of analysis? This script extracts descriptive features: quantitative measurements that characterize audio content.

Categories:

(1) Temporal: duration.
(2) Pitch/F0: fundamental frequency statistics (mean, range, variability).
(3) Intensity/Loudness: amplitude envelope characteristics.
(4) Voice quality: jitter (pitch perturbation), shimmer (amplitude perturbation), harmonicity (signal-to-noise).
(5) Spectral shape: centroid (brightness), spread (bandwidth), rolloff (energy concentration).
(6) Spectral texture: flatness (tonality), roughness (irregularity).
(7) Singing-specific: SPR (singer's formant).

Applications: instrument recognition, emotion detection, speaker identification, voice pathology, music information retrieval, sound design categorization.

Technical Implementation:

(1) Auto-select all Sound objects in the workspace.
(2) Create an empty table with 19 column headers.
(3) For each sound:
    extract pitch (autocorrelation, 75-600 Hz)
    calculate pitch statistics (mean, min, max, median, stdev)
    extract intensity (100 Hz smoothing)
    calculate intensity statistics (max, min, median, variance)
    create a point process for periodicity detection
    calculate jitter and shimmer (local perturbation)
    calculate harmonicity (cepstral method)
    perform spectral analysis (FFT)
    calculate SPR (singing power ratio, 50-2000 Hz vs 2000-4000 Hz)
    calculate spectral centroid and spread (weighted frequency distribution)
    calculate spectral rolloff (85% energy threshold)
    calculate spectral flatness (geometric/arithmetic mean ratio)
    calculate spectral roughness (local irregularity)
(4) Fill the table row with all metrics.
(5) Export to CSV for external analysis.

Robust: all undefined values are set to zero, unvoiced sounds are handled, and each calculation checks for sufficient data first.
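The "undefined values set to zero" rule can be pictured with a small Python sketch. This is an illustration only, not the script's Praat code: the helper names `zero_if_undefined` and `make_row` are hypothetical, and NaN/None stand in for Praat's undefined values.

```python
import math

def zero_if_undefined(value):
    """Return 0.0 for a missing (None) or NaN measurement, else the value as a float."""
    if value is None:
        return 0.0
    value = float(value)
    return 0.0 if math.isnan(value) else value

def make_row(name, measurements, columns):
    """Build one table row: the sound name followed by one guarded value per column."""
    return [name] + [zero_if_undefined(measurements.get(col)) for col in columns]

# Column list shortened for the example; the real table has 19 feature columns.
columns = ["Duration_s", "Pitch_mean_Hz", "Jitter_local"]
row = make_row("noise_burst", {"Duration_s": 0.8, "Pitch_mean_Hz": float("nan")}, columns)
print(row)  # ['noise_burst', 0.8, 0.0, 0.0]
```

Guarding every value at the point where the row is written keeps the table purely numeric, at the cost of making "not applicable" indistinguishable from a true zero.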

Quick start

Load multiple Sound files into Praat Objects window (Open → Read from file...).
Run script… → descriptions.praat.
No dialog — script automatically analyzes all sounds.
Processing displays progress in Info window.
Result: Table Results appears in Objects window with all metrics.
Export: Select table → Save as comma-separated file... → save as .csv.

Quick tip: Works on any sound type (voice, instruments, noise, synthesis). Batch mode = select 50+ files at once, run script, get complete analysis table. Processing time: ~2-5 seconds per file (depends on duration). Info window shows progress ("Processing: filename"). Table automatically selected when complete. Immediately save as CSV to preserve results. Metrics robust to different sample rates, bit depths, mono/stereo (stereo converted to mono for analysis).

Important: Script analyzes EVERY Sound object in Objects window — remove any sounds you don't want analyzed before running. No parameters to adjust (uses research-standard defaults). Some metrics return 0 for inappropriate signals: pitch metrics for unpitched sounds (noise, percussion), jitter/shimmer for unvoiced speech, harmonicity for inharmonic sounds. This is normal — zero indicates feature not applicable. For voiced/pitched material only, filter out zeros during analysis. Very short sounds (<0.1s) may produce unreliable pitch/intensity statistics.

Feature Descriptions

Complete Feature Set (19 Metrics)

📊 Extracted Features

Temporal: 1 feature

Pitch: 5 features

Intensity: 4 features

Voice Quality: 3 features

Spectral: 6 features

1. Temporal Features

Duration_s

Unit: Seconds

Description: Total duration of sound file

Calculation: End time - start time

Interpretation:

Simple temporal measurement
Useful for normalizing other metrics
Important for rhythm/timing studies

2. Pitch Features (F0 Statistics)

Pitch_mean_Hz

Unit: Hertz (Hz)

Description: Average fundamental frequency across voiced portions

Method: Autocorrelation (raw cc), 75-600 Hz range

Interpretation:

Speech: ~120 Hz (male), ~220 Hz (female), ~300 Hz (child)
Singing: Varies by register (bass ~100 Hz, soprano ~400 Hz)
Instruments: Depends on tuning (A4 = 440 Hz standard)
Zero: Indicates unpitched sound (noise, percussion, silence)

Pitch_min_Hz & Pitch_max_Hz

Unit: Hertz (Hz)

Description: Lowest and highest detected fundamental frequencies

Interpretation:

Range = max - min: Pitch variability
Large range: expressive speech, melodic singing, vibrato
Small range: monotone speech, steady instruments
Useful for: tessitura analysis, vocal range studies

Pitch_median_Hz

Unit: Hertz (Hz)

Description: 50th percentile (middle value) of pitch distribution

Advantage over mean: Robust to outliers (octave errors, glitches)

Interpretation:

More reliable than mean for noisy pitch tracks
Compare to mean: large difference indicates skewed distribution or errors

Pitch_stdev_Hz

Unit: Hertz (Hz)

Description: Standard deviation of fundamental frequency

Interpretation:

Low stdev (<10 Hz): Stable pitch (monotone, sustained tone)
Medium stdev (10-50 Hz): Moderate variation (expressive speech)
High stdev (>50 Hz): Large pitch range (singing, melodic speech)
Correlates with: Prosodic expressiveness, melodic activity
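As a rough illustration of the pitch statistics described above, the following Python sketch summarises a list of per-frame F0 values. It assumes unvoiced frames are marked as 0 or None and uses the sample standard deviation; both are assumptions for the example, not statements about the script's internals.

```python
import statistics

def pitch_stats(f0_track):
    """Summarise an F0 track in Hz; frames equal to 0 or None count as unvoiced."""
    voiced = [f for f in f0_track if f]  # keep voiced frames only
    if not voiced:
        # Nothing pitched: report zeros, matching the table's zero convention.
        return {"mean": 0.0, "min": 0.0, "max": 0.0, "median": 0.0, "stdev": 0.0}
    return {
        "mean": statistics.mean(voiced),
        "min": min(voiced),
        "max": max(voiced),
        "median": statistics.median(voiced),
        # Sample standard deviation needs at least two values.
        "stdev": statistics.stdev(voiced) if len(voiced) > 1 else 0.0,
    }

# Made-up track: the 240 Hz frame plays the role of an octave error.
track = [0, 118, 120, 122, 240, 0, 119, 121]
stats = pitch_stats(track)
print(stats["mean"], stats["median"])  # 140 120.5
```

The single outlier pulls the mean nearly 20 Hz above the median, which is the kind of gap the median/mean comparison is meant to expose.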

3. Intensity Features (Loudness Dynamics)

Intensity_max_dB & Intensity_min_dB

Unit: Decibels (dB SPL)

Description: Peak and minimum intensity values

Smoothing: 100 Hz window (removes sample-level fluctuations)

Interpretation:

Dynamic range = max - min: Loudness variation
Large range: high dynamics (classical music, expressive speech)
Small range: compressed (pop music, broadcast speech)
Typical speech: 20-30 dB range

Intensity_median_dB

Unit: Decibels (dB SPL)

Description: Median intensity (50th percentile)

Interpretation:

Representative "average loudness"
More robust than mean for signals with silence/noise

Intensity_variance_dB

Unit: dB² (squared decibels)

Description: Variance of intensity (stdev²)

Interpretation:

Measures spread of loudness distribution
High variance: uneven dynamics, high contrast
Low variance: steady loudness, compressed signal
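The intensity statistics can be sketched the same way. The Python below is a hedged illustration on a made-up dB contour; it uses the population variance, which is an assumption, since the guide does not say which variance estimator the script applies.

```python
import statistics

def intensity_summary(levels_db):
    """Return max, min, median, variance and dynamic range of an intensity contour in dB."""
    if len(levels_db) < 2:
        return None  # too few frames for a meaningful variance
    hi, lo = max(levels_db), min(levels_db)
    return {
        "max_dB": hi,
        "min_dB": lo,
        "median_dB": statistics.median(levels_db),
        "variance_dB2": statistics.pvariance(levels_db),  # population variance, in dB squared
        "range_dB": hi - lo,  # dynamic range = max - min
    }

contour = [50.0, 60.0, 70.0, 60.0]  # made-up frame values
summary = intensity_summary(contour)
print(summary["range_dB"], summary["variance_dB2"])  # 20.0 50.0
```

Taking the square root of the variance (about 7.1 dB here) gives a spread in plain dB, which is often easier to compare with the dynamic range.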

4. Voice Quality Features (Perturbation Metrics)

Jitter_local

Unit: Ratio (dimensionless)

Description: Average absolute difference between consecutive pitch periods

Formula: Jitter = (1/N) Σ |T(i) - T(i-1)| / mean(T)

Interpretation:

Healthy voice: <1% (0.01)
Rough/hoarse voice: >1.5% (0.015)
Pathological: >3% (0.03)
Clinical use: Voice disorder diagnosis
Music: Expressive techniques such as vibrato can raise measured jitter

Shimmer_local

Unit: Ratio (dimensionless)

Description: Average absolute difference between consecutive peak amplitudes

Formula: Shimmer = (1/N) Σ |A(i) - A(i-1)| / mean(A)

Interpretation:

Healthy voice: <3% (0.03)
Breathy voice: >5% (0.05)
Pathological: >10% (0.1)
Correlates with: Breathiness, irregular glottal closure
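The jitter and shimmer formulas above share one shape: the mean absolute difference between consecutive cycles, divided by the overall mean. A minimal Python sketch of that shape follows, assuming the cycle periods and peak amplitudes have already been measured; the function name and the input values are made up for illustration.

```python
def local_perturbation(values):
    """Mean absolute difference of consecutive values, divided by the mean value."""
    if len(values) < 2:
        return 0.0  # fewer than two cycles: reported as zero
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value

periods = [0.0100, 0.0101, 0.0099, 0.0100]   # pitch periods in seconds
amplitudes = [0.50, 0.52, 0.49, 0.51]        # peak amplitudes per cycle

jitter = local_perturbation(periods)      # about 0.0133, i.e. 1.33%
shimmer = local_perturbation(amplitudes)  # about 0.0462, i.e. 4.62%
print(round(jitter, 4), round(shimmer, 4))
```

Because both results are ratios, they can be compared directly with the percentage thresholds listed in the interpretations.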

Harmonicity_dB (HNR)

Unit: Decibels (dB)

Description: Harmonics-to-Noise Ratio — signal periodicity measure

Method: Cepstral analysis

Interpretation:

>20 dB: Clear, periodic signal (healthy voice, musical instrument)
10-20 dB: Moderate noise (normal conversational speech)
<10 dB: Noisy signal (breathy voice, distorted sound)
Negative: More noise than harmonic content
Clinical: Voice pathology indicator
Music: Distinguishes clean vs distorted tones

5. Spectral Features (Frequency Domain)

SPR_dB (Singing Power Ratio)

Unit: Decibels (dB)

Description: Difference between low band (50-2000 Hz) and high band (2000-4000 Hz) maxima

Formula: SPR = max(50-2000Hz) - max(2000-4000Hz)

Interpretation:

Positive SPR: More low-frequency energy (typical speech, bass instruments)
Near-zero SPR: Balanced spectrum
Negative SPR: More high-frequency energy (cymbals, sibilants, brightness)
Singing application: Trained singers develop "singer's formant" around 2500-3500 Hz, which raises the 2000-4000 Hz peak → lower SPR under this formula
Acoustic projection: Low SPR (strong 2000-4000 Hz peak) = better audibility over orchestra
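To make the SPR formula concrete, here is a small Python sketch that applies max(50-2000 Hz) - max(2000-4000 Hz) to a spectrum given as (frequency, level in dB) pairs. The two spectra are invented, and treating exactly 2000 Hz as part of the high band is an assumption.

```python
def singing_power_ratio(spectrum_db):
    """Low-band peak minus high-band peak, for (frequency_hz, level_db) pairs."""
    low = [level for f, level in spectrum_db if 50 <= f < 2000]
    high = [level for f, level in spectrum_db if 2000 <= f <= 4000]
    if not low or not high:
        return 0.0  # a band with no bins: reported as zero
    return max(low) - max(high)

weak_high_band = [(200, 60.0), (800, 55.0), (2500, 30.0), (3200, 28.0)]
strong_high_band = [(400, 60.0), (1200, 52.0), (2800, 48.0), (3300, 45.0)]

print(singing_power_ratio(weak_high_band))    # 30.0
print(singing_power_ratio(strong_high_band))  # 12.0
```

With the low-band peak held at 60 dB, raising the 2000-4000 Hz peak from 30 dB to 48 dB shrinks the difference from 30 dB to 12 dB.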

Spectral_centroid_Hz

Unit: Hertz (Hz)

Description: Center of gravity of spectrum — frequency-weighted mean

Formula: Centroid = Σ(f × magnitude) / Σ(magnitude)

Interpretation:

Perceptual correlate: Brightness, sharpness
Low centroid (<1000 Hz): Dark, warm, muffled (cello, male voice)
Mid centroid (1000-3000 Hz): Balanced (piano, female voice)
High centroid (>3000 Hz): Bright, sharp, harsh (cymbals, sibilants)
MIR application: Instrument classification, timbre analysis

Spectral_spread_Hz

Unit: Hertz (Hz)

Description: Standard deviation of spectrum around centroid — bandwidth measure

Formula: Spread = √[Σ((f - centroid)² × magnitude) / Σ(magnitude)]

Interpretation:

Low spread (<500 Hz): Narrow bandwidth (sine wave, whistle, flute)
Mid spread (500-1500 Hz): Moderate bandwidth (voice, most instruments)
High spread (>1500 Hz): Wide bandwidth (noise, crash cymbals)
Combination with centroid: High centroid + low spread = pure high tone; Low centroid + high spread = rich low tone
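The centroid and spread formulas can be checked on a toy spectrum. The Python sketch below follows the two formulas as written (magnitude-weighted mean and magnitude-weighted standard deviation); the two-bin spectra are invented for the example.

```python
import math

def centroid_and_spread(freqs_hz, magnitudes):
    """Magnitude-weighted mean frequency and the weighted standard deviation around it."""
    total = sum(magnitudes)
    if total <= 0:
        return 0.0, 0.0  # silent spectrum
    centroid = sum(f * m for f, m in zip(freqs_hz, magnitudes)) / total
    variance = sum((f - centroid) ** 2 * m for f, m in zip(freqs_hz, magnitudes)) / total
    return centroid, math.sqrt(variance)

freqs = [1000.0, 3000.0]
balanced = centroid_and_spread(freqs, [1.0, 1.0])   # (2000.0, 1000.0)
low_heavy = centroid_and_spread(freqs, [3.0, 1.0])  # (1500.0, about 866.0)
print(balanced, low_heavy)
```

Shifting weight toward the 1000 Hz component moves the centroid down and also narrows the spread, since most of the energy now sits close to the centroid.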

Spectral_rolloff_Hz

Unit: Hertz (Hz)

Description: Frequency below which 85% of spectral energy is contained

Calculation: Cumulative energy threshold

Interpretation:

Low rolloff (<2000 Hz): Energy concentrated in bass/mids (bass, kick drum)
Mid rolloff (2000-5000 Hz): Balanced energy distribution (voice, guitar)
High rolloff (>5000 Hz): Significant high-frequency content (cymbals, hi-hats)
Distinguishes: Dull vs bright sounds independently of centroid
MIR use: Genre classification (metal has higher rolloff than jazz)
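The cumulative energy threshold behind the rolloff can be sketched in a few lines of Python. This is an illustration under simple assumptions: bins are sorted by frequency, `power` holds the energy per bin, and the rolloff is reported as the first bin at which the running total reaches 85%.

```python
def spectral_rolloff(freqs_hz, power, fraction=0.85):
    """Frequency of the first bin where cumulative power reaches `fraction` of the total."""
    total = sum(power)
    if total <= 0:
        return 0.0  # silent spectrum
    threshold = fraction * total
    running = 0.0
    for f, p in zip(freqs_hz, power):
        running += p
        if running >= threshold:
            return f
    return freqs_hz[-1]

freqs = [100.0, 200.0, 300.0, 400.0, 500.0]
power = [40.0, 30.0, 20.0, 5.0, 5.0]  # made-up values summing to 100
print(spectral_rolloff(freqs, power))  # 300.0
```

Here the running total is 40, 70, then 90, so the 85% threshold is crossed at the 300 Hz bin.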

Spectral_flatness

Unit: Ratio 0-1 (dimensionless)

Description: Ratio of geometric mean to arithmetic mean of spectrum (80-5000 Hz)

Formula: Flatness = exp(mean(ln(power))) / mean(power)

Interpretation:

0 (tonal): Pure tones, harmonics (sine wave, sustained notes)
~0.3 (mixed): Combination of tones and noise (voiced fricatives, bowed strings)
~0.7-1.0 (noisy): White noise, unvoiced speech, crash cymbals
Perceptual: Tonality vs noisiness
Speech: Vowels ~0.1, fricatives ~0.6-0.9
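The flatness formula, exp(mean(ln(power))) / mean(power), is easy to try on toy data. In this Python sketch, zero-power bins are skipped because their logarithm is undefined; that handling is an assumption for the example, and the guide does not say how the script treats such bins.

```python
import math

def spectral_flatness(power):
    """Geometric mean divided by arithmetic mean of the positive power values."""
    positive = [p for p in power if p > 0]  # ln is undefined for zero bins
    if not positive:
        return 0.0
    geometric = math.exp(sum(math.log(p) for p in positive) / len(positive))
    arithmetic = sum(positive) / len(positive)
    return geometric / arithmetic

flat = spectral_flatness([1.0, 1.0, 1.0, 1.0])     # 1.0: perfectly flat, noise-like
peaky = spectral_flatness([100.0, 1.0, 1.0, 1.0])  # about 0.12: one dominant peak
print(flat, round(peaky, 2))
```

A single dominant bin is enough to pull the ratio close to zero, which is why tonal sounds score low.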

Spectral_roughness

Unit: Arbitrary (relative measure)

Description: Average absolute deviation of each spectral bin from the mean of its two neighbouring bins (local irregularity)

Formula: Roughness = mean(|magnitude(f) - mean(magnitude(f±1))|)

Interpretation:

Low roughness: Smooth spectrum (pure tones, low-pass filtered)
High roughness: Irregular spectrum (distortion, inharmonicity, complex textures)
Perceptual: Loosely related to auditory roughness/harshness
Not directly comparable across sample rates (bin-dependent)
Use: Within-corpus comparisons, roughness trends
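The roughness formula compares each bin with the average of its two neighbours. A minimal Python sketch, assuming the first and last bins are skipped because they have only one neighbour:

```python
def spectral_roughness(magnitudes):
    """Mean absolute deviation of each interior bin from the mean of its two neighbours."""
    if len(magnitudes) < 3:
        return 0.0  # no interior bins to compare
    deviations = [
        abs(magnitudes[i] - (magnitudes[i - 1] + magnitudes[i + 1]) / 2)
        for i in range(1, len(magnitudes) - 1)
    ]
    return sum(deviations) / len(deviations)

smooth = spectral_roughness([1.0, 2.0, 3.0, 4.0, 5.0])  # 0.0: steady slope
jagged = spectral_roughness([1.0, 5.0, 1.0, 5.0, 1.0])  # 4.0: alternating bins
print(smooth, jagged)
```

The result is in the same units as the magnitudes, which is one reason values are only comparable between files analyzed with the same settings.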

Feature Summary Table

Feature	Unit	Low Values	High Values
Duration_s	seconds	Short sounds	Long sounds
Pitch_mean_Hz	Hz	Low pitch (bass)	High pitch (treble)
Pitch_stdev_Hz	Hz	Monotone	Melodic variation
Intensity range	dB	Compressed dynamics	High dynamics
Jitter	ratio	Stable pitch	Irregular/rough
Shimmer	ratio	Stable amplitude	Breathy/irregular
Harmonicity	dB	Noisy	Clear/periodic
SPR	dB	Bright/high energy	Dark/low energy
Spectral_centroid	Hz	Dark timbre	Bright timbre
Spectral_spread	Hz	Narrow bandwidth	Wide bandwidth
Spectral_rolloff	Hz	Bass-heavy	Treble-rich
Spectral_flatness	0-1	Tonal	Noisy
Spectral_roughness	relative	Smooth spectrum	Irregular spectrum

Interpreting Results

Reading the Results Table

Table structure:

Each row = one sound file
Column 1: SoundName (filename)
Columns 2-20: 19 numerical features
Values formatted as numbers (not scientific notation)

Zero Values: What They Mean

Common reasons for zero values:

Pitch features = 0:
→ Sound is unpitched (noise, percussion, silence)
→ No fundamental frequency detected
→ Normal for: drums, cymbals, breath, room tone

Jitter/Shimmer = 0:
→ Not enough periodic cycles detected
→ Fewer than 2 pitch periods in sound
→ Normal for: very short sounds, unvoiced speech

Harmonicity = 0 or negative:
→ Undefined value handled as zero
→ Or genuinely more noise than signal
→ Check if sound is noisy/distorted

Spectral features ≠ 0:
→ All sounds have spectral content
→ If zero, indicates calculation error (rare)
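When analyzing the exported CSV, zeros in the pitch columns are best treated as "not applicable" rather than as measurements. The Python sketch below shows one way to do that with the standard library. The CSV text is invented sample data with only two of the feature columns, and in practice the rows would come from the saved .csv file.

```python
import csv
import io
import statistics

# Invented sample rows; a real export has one column per feature.
csv_text = """SoundName,Pitch_mean_Hz,Spectral_centroid_Hz
voice_01,118.4,1210.5
snare_01,0,3450.2
voice_02,205.9,1650.0
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))

# Keep only sounds where a pitch was detected before averaging pitch metrics.
pitched = [r for r in rows if float(r["Pitch_mean_Hz"]) > 0]
mean_pitch = statistics.mean(float(r["Pitch_mean_Hz"]) for r in pitched)

print(len(rows), len(pitched))  # 3 2
print(round(mean_pitch, 2))     # 162.15
```

Spectral columns need no such filter, since every sound has spectral content.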

Typical Value Ranges by Sound Type

Male Speech

Pitch_mean_Hz: 100-150
Pitch_stdev_Hz: 10-30
Intensity_max_dB: 60-80
Jitter: 0.005-0.015
Shimmer: 0.02-0.05
Harmonicity_dB: 10-20
Spectral_centroid_Hz: 800-1500
Spectral_flatness: 0.1-0.3

Female Speech

Pitch_mean_Hz: 180-250
Pitch_stdev_Hz: 15-40
Intensity_max_dB: 60-80
Jitter: 0.005-0.015
Shimmer: 0.02-0.05
Harmonicity_dB: 10-20
Spectral_centroid_Hz: 1000-2000
Spectral_flatness: 0.1-0.3

Singing Voice (Trained)

Pitch_mean_Hz: 200-600 (varies by voice type)
Pitch_stdev_Hz: 30-100 (wide melodic range)
Jitter: 0.003-0.01 (lower than speech)
Shimmer: 0.01-0.03 (controlled)
Harmonicity_dB: 15-25 (clearer than speech)
SPR_dB: Lower than speech (singer's formant raises the 2000-4000 Hz peak)
Spectral_centroid_Hz: 1500-3000
