PCA Timbre Selector — Feature-Based Sound Segmentation

Analyzes timbre and selects segments using direct feature selection or PCA targeting with full visualisation. Extract bright, dark, noisy, tonal, high/low pitch, loud/quiet, or custom PCA-targeted regions from any sound.

Author: Shai Cohen
Affiliation: Department of Music, Bar-Ilan University, Israel
Version: 1.3 (2025)
License: MIT License
Repo: https://github.com/ShaiCohen-ops/Praat-plugin_AudioTools

What this does

This script implements timbre-based sound segmentation using PCA and direct feature selection. It analyses a sound frame by frame, extracting 5 acoustic features (pitch, intensity, spectral centroid, spectral spread, HNR). Based on a preset or custom PCA target, it selects frames matching the desired timbre and concatenates them into a new output sound. The visualisation includes PCA scatter plots (PC1 vs PC2, PC1 vs PC3, PC2 vs PC3) with selected/rejected frames colour-coded, eigenvector loadings, and a selection score timeline.
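The step from "selected frames" to "concatenated output" implies turning frame indices into time ranges. The sketch below shows one way this could be done. It is not the script's actual code: the function name `frames_to_intervals`, the rule that frame i starts at i × frame step, and the merging of overlapping frames are all assumptions made for illustration.

```python
def frames_to_intervals(selected, frame_step, segment_dur):
    """Merge selected frame indices into (start, end) time ranges in seconds.

    Assumption: frame i starts at i * frame_step and lasts segment_dur.
    Frames whose windows touch or overlap are merged into one range.
    """
    intervals = []
    for i in sorted(selected):
        start = i * frame_step
        end = start + segment_dur
        if intervals and start <= intervals[-1][1]:
            # Overlaps the previous range: extend it instead of starting a new one.
            intervals[-1][1] = max(intervals[-1][1], end)
        else:
            intervals.append([start, end])
    return [(s, e) for s, e in intervals]


# Hypothetical selection: two runs of frames, 10 ms step, 25 ms window.
intervals = frames_to_intervals([3, 4, 5, 20, 21], 0.01, 0.025)
print([(round(s, 3), round(e, 3)) for s, e in intervals])
```

Merging adjacent frames before extraction keeps the number of joins (and therefore crossfades) low when many consecutive frames are selected.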

What is PCA Timbre Selection?

Principal Component Analysis reduces the 5-dimensional feature space to the principal components that capture the most variance; the script keeps the first three. You can then select frames near a target point in this PCA space (Custom mode), or use direct feature thresholds (Bright, Dark, Noisy, Tonal, High/Low Pitch, Loud/Quiet). The result is a new sound containing only the timbral regions you want, which is useful for isolating specific sonic qualities, creating timbral collages, or cleaning up recordings.

Key Features:

v1.3 improvements:
  • Fixed the info banner.
  • Feature columns are standardised (z-score) before PCA, so the timbre space isn't dominated by Hz-scale features.
  • Output preserves stereo (chunks are extracted from the original, not from the mono analysis copy).
  • Subtitle is centred.
  • Distance-panel label reflects the mode.

Quick start

  1. In Praat, select exactly one Sound object (mono or stereo).
  2. Run script…PCA_Timbre_Selector.praat.
  3. Choose a preset from the dropdown:
    • Bright / Dark — based on spectral centroid
    • Noisy / Tonal — based on HNR (harmonics-to-noise ratio)
    • High Pitch / Low Pitch — based on fundamental frequency
    • Loud / Quiet — based on intensity
    • Custom (PCA targeting) — specify target PC1, PC2, PC3
  4. Set Segment_ms (analysis window, default 25 ms) and Frame_step_seconds (default 0.01 s).
  5. Set Selection_percentile (e.g., 25 = select top/bottom 25% of frames).
  6. Click OK — script analyses, selects frames, creates output.
Quick tip: Use Bright to extract high-frequency-rich sections (e.g., cymbal hits, sibilance). Noisy extracts breath, fricatives, or noise. Tonal isolates pitched sounds (voice, instruments). For exploratory analysis, use Custom (PCA) with target (0,0,0) to select frames near the centroid of the timbre cloud. Enable Draw_visualization to see which frames are selected and how they cluster in PCA space.
Important: The script analyses a mono copy of the sound (all features are extracted from it) but cuts the output segments from the original, so a stereo input yields a stereo output. In Bright mode, Selection_percentile = 25 selects the 25% of frames with the highest centroid (the brightest). In PCA mode, the percentile sets the distance threshold: a lower value means a smaller radius around the target and fewer selected frames. If too few frames are selected, increase the percentile.
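As an illustration of the percentile rule for the direct presets, here is a minimal Python sketch. It is not taken from the script: the helper `select_by_percentile`, the rounding-up of the frame count, and the sample centroid values are assumptions.

```python
import math


def select_by_percentile(values, percentile, highest=True):
    """Return the sorted indices of the top (or bottom) `percentile` percent of frames."""
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    # Assumption: round the frame count up and always keep at least one frame.
    n_keep = max(1, math.ceil(len(values) * percentile / 100))
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=highest)
    return sorted(order[:n_keep])


# Hypothetical per-frame spectral centroids in Hz.
centroids_hz = [500.0, 3200.0, 1200.0, 4100.0, 800.0, 2500.0, 950.0, 3900.0]
bright = select_by_percentile(centroids_hz, 25, highest=True)
dark = select_by_percentile(centroids_hz, 25, highest=False)
print(bright)  # [3, 7]  (the two highest centroids)
print(dark)    # [0, 4]  (the two lowest centroids)
```

The same function covers every direct preset: only the feature column and the `highest` flag change (for example HNR with `highest=False` for Noisy).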

8 Presets + Custom PCA

| Preset       | Feature       | Direction | Selection Rule                         | Typical Use                                     |
|--------------|---------------|-----------|----------------------------------------|-------------------------------------------------|
| Bright       | Centroid      | ↑ high    | Top percentile (highest centroid)      | Cymbals, brass, sibilance, high harmonics       |
| Dark         | Centroid      | ↓ low     | Bottom percentile (lowest centroid)    | Bass, cello, low-pitched sounds, rumble         |
| Noisy        | HNR           | ↓ low     | Bottom percentile (lowest HNR)         | Unvoiced, breath, fricatives, noise, percussion |
| Tonal        | HNR           | ↑ high    | Top percentile (highest HNR)           | Voiced speech, sustained instruments, pure tones |
| High Pitch   | Pitch         | ↑ high    | Top percentile (highest F0)            | Soprano, high-pitched instruments, harmonics    |
| Low Pitch    | Pitch         | ↓ low     | Bottom percentile (lowest F0)          | Bass, low-frequency fundamentals                |
| Loud         | Intensity     | ↑ high    | Top percentile (highest dB)            | Loud attacks, peaks, emphasised regions         |
| Quiet        | Intensity     | ↓ low     | Bottom percentile (lowest dB)          | Silent intervals, soft passages, background     |
| Custom (PCA) | PC1, PC2, PC3 | n/a       | Distance to target point in PCA space  | Cluster-based selection, exploratory analysis   |

5 Acoustic Features

Pitch (F0)

Fundamental frequency in Hz, extracted via autocorrelation (range 75–600 Hz). Unvoiced frames = 0 Hz.

High/Low Pitch presets

Intensity

Root-mean-square amplitude converted to a dB (SPL-like) scale.

Loud/Quiet presets

Spectral Centroid

Centre of gravity of the spectrum (brightness). Higher = brighter, lower = darker.

Bright/Dark presets

Spectral Spread

Standard deviation of the spectrum around the centroid (bandwidth).

Used in PCA (not directly selectable)

HNR (Harmonics-to-Noise Ratio)

Periodicity measure. High = clear pitch (tonal), low = noisy/unvoiced.

Noisy/Tonal presets

Feature extraction per frame:
  • Analysis window: Segment_ms (default 25 ms) with Gaussian shape
  • Frame step: default 0.01 s (100 frames/second)
  • Pitch: To Pitch (frame_step, f0_min=75, f0_max=600)
  • Intensity: To Intensity (minimum pitch = 75 Hz; in Praat this setting determines the effective analysis window length)
  • Spectral features: To Spectrogram → Spectrum (slice) → centroid + spread
  • HNR: To Harmonicity (ac) (frame_step, f0_min, silence threshold, periods per window)
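To make the centroid and spread definitions concrete, here is a small Python sketch that computes both from one spectrum slice. It is an illustration, not the script's code: the function name is hypothetical, and the power weighting (p = 2, the default for Praat's centre-of-gravity query) is an assumption about what the script uses.

```python
import math


def centroid_and_spread(freqs_hz, magnitudes, power=2.0):
    """Return (centroid, spread) in Hz for one spectrum slice.

    Each frequency bin is weighted by magnitude ** power.
    Assumption: power = 2 (energy weighting).
    """
    weights = [m ** power for m in magnitudes]
    total = sum(weights)
    if total == 0:
        return 0.0, 0.0  # silent frame: no meaningful centroid
    centroid = sum(f * w for f, w in zip(freqs_hz, weights)) / total
    variance = sum(w * (f - centroid) ** 2 for f, w in zip(freqs_hz, weights)) / total
    return centroid, math.sqrt(variance)


# Toy slice with three equally strong bins: centroid at the middle bin.
c, s = centroid_and_spread([1000.0, 2000.0, 3000.0], [1.0, 1.0, 1.0])
print(round(c, 1), round(s, 1))  # 2000.0 816.5
```

Moving energy into the upper bins raises the centroid (brighter), while concentrating it in a single bin shrinks the spread towards zero.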

PCA Space & Loadings

Standardisation before PCA (v1.3 fix)

Each feature column is z-score standardised: x' = (x - μ)/σ.

This prevents Hz-scale features (centroid, pitch) from dominating the PCA because of their larger numerical range.
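A minimal Python sketch of column-wise standardisation shows the effect. The function name `zscore_columns` and the use of the sample standard deviation (n − 1) are assumptions; the script may use the population form instead.

```python
import math


def zscore_columns(rows):
    """Standardise each column of a frames-by-features table to mean 0, std 1."""
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two frames")
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # Assumption: sample standard deviation (divide by n - 1).
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1))
            for c, m in zip(cols, means)]
    # A constant column has std 0; map it to 0.0 instead of dividing by zero.
    return [[(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
            for row in rows]


# Hypothetical frames: [centroid in Hz, HNR in dB]. The raw scales differ by a factor of 200.
frames = [[1000.0, 5.0], [2000.0, 10.0], [3000.0, 15.0]]
for row in zscore_columns(frames):
    print(row)
```

After standardisation both columns become -1, 0, 1, so the Hz-scale centroid no longer outweighs the dB-scale HNR in the PCA.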

Principal components

The script computes PCA on the 5 standardised features, then projects each analysis frame into PC space (3 components). The visualisation shows:

  • PC1 vs PC2 — typically captures brightness vs. pitch/timbre
  • PC1 vs PC3 — may capture noisiness vs. intensity
  • PC2 vs PC3 — higher-order timbral variation
Interpreting PCA loadings: The bar chart in the visualisation shows how each original feature contributes to each principal component. For example:
  • PC1 may have high positive loading for Centroid (brightness) and negative loading for HNR (noisiness)
  • PC2 may be dominated by Pitch
  • PC3 may capture Spread or Intensity
The variance explained percentages (displayed in the legend) tell you how much of the total timbral variation is captured by each PC.
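The variance-explained percentage of a component is its eigenvalue divided by the sum of all eigenvalues. The sketch below illustrates that arithmetic; the function name and the eigenvalues are made up for the example (for five standardised features, the eigenvalues of the correlation matrix sum to 5).

```python
def variance_explained(eigenvalues):
    """Return the percentage of total variance captured by each component."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive value")
    return [100.0 * ev / total for ev in eigenvalues]


# Hypothetical eigenvalues for 5 standardised features.
percentages = variance_explained([2.5, 1.5, 0.6, 0.3, 0.1])
print([round(p, 1) for p in percentages])  # [50.0, 30.0, 12.0, 6.0, 2.0]
```

In this made-up case the first three components cover 92% of the variance, so the three scatter plots would show most of the timbral variation.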
Custom PCA targeting: In Custom mode, you specify a target point (PC1, PC2, PC3). The script computes Euclidean distance from each frame to this target in PCA space, then selects frames within a radius determined by the Selection_percentile (lower percentile = smaller radius). This allows you to select frames with specific timbral characteristics not covered by the direct feature presets — e.g., "bright but not noisy" might be a region in PC space between the Bright and Tonal clusters.
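The distance-based rule can be sketched in a few lines of Python. This is an illustration under assumptions, not the script's implementation: the function name, the rounding-up of the frame count, and the example scores are hypothetical.

```python
import math


def select_near_target(scores, target, percentile):
    """Select the `percentile` percent of frames closest to `target` in PC space.

    Returns (sorted frame indices, radius), where radius is the distance
    of the farthest selected frame.
    """
    if not scores:
        return [], 0.0
    distances = [math.dist(p, target) for p in scores]  # Euclidean distance
    n_keep = max(1, math.ceil(len(scores) * percentile / 100))
    order = sorted(range(len(scores)), key=lambda i: distances[i])
    keep = sorted(order[:n_keep])
    radius = distances[order[n_keep - 1]]
    return keep, radius


# Hypothetical (PC1, PC2, PC3) scores for four frames; target is the origin.
scores = [(0.1, 0.0, 0.0), (2.0, 1.0, 0.0), (-0.2, 0.1, 0.0), (3.0, -2.0, 1.0)]
keep, radius = select_near_target(scores, (0.0, 0.0, 0.0), 50)
print(keep, round(radius, 4))  # [0, 2] 0.2236
```

Raising the percentile keeps more frames and therefore pushes the radius outward, which matches the "lower percentile = smaller radius" behaviour described above.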

Applications

Sound cleaning / noise reduction

Use case: Remove noisy/unvoiced sections from a recording, leaving only tonal regions.

Settings: Tonal preset, Selection_percentile = 40–60. The output contains the 40–60% of frames with the highest HNR, which are mostly pitched (voice, instrument); breath, clicks, and noise are largely removed.

Timbre-based collage / sampling

Use case: Extract all bright sibilant moments from a recording for a stutter or glitch effect.

Settings: Bright preset, Selection_percentile = 20. Output is a concatenation of all high-centroid frames (e.g., "s", "sh", cymbal hits).

Isolate bass / low-frequency content

Use case: Extract low-pitched sections from a mixed recording.

Settings: Low Pitch preset, Selection_percentile = 25. Output contains only frames where pitch is in the lowest quartile.

Exploratory / compositional PCA selection

Use case: Find all frames with timbre similar to a specific region.

Settings: Custom (PCA). First run with a preset to see the PCA scatter plot, then manually pick target coordinates from a frame of interest. Rerun to select frames near that point in timbre space.

Workflow: Vocal → Sibilance-only phrase

Source: Spoken word recording.
Settings: Bright preset, Selection_percentile = 20.
Result: The output contains only the "s", "sh", "f", "th" sounds — a sibilance stutter effect.

Workflow: Drum loop → Noise-only rhythmic texture

Source: Drum loop (kick, snare, hi-hat, room).
Settings: Noisy preset, Selection_percentile = 30.
Result: Only the snare's noise tail, hi-hat hiss, and room ambience are kept — a breathy, granular texture with the same rhythm.

Workflow: Orchestra → Low strings excerpt

Source: Full orchestra recording.
Settings: Low Pitch preset, Selection_percentile = 25.
Result: Only frames where the dominant pitch is in the lowest quartile (bass, cello, contrabass) — a "low-only" orchestral reduction.

Troubleshooting:
Too few frames selected: Increase Selection_percentile (e.g., 25 → 40). In PCA mode this widens the distance threshold around the target.
Output has clicks at segment joins: The script concatenates with overlap (10 ms crossfade). If clicks persist, increase overlap or smooth manually after generation.
Pitch tracking fails / no tonal frames detected: Adjust F0_min and F0_max in the form to match your source (e.g., for bass, set F0_min=30, F0_max=200). Use the analysis info to see mean pitch.
PCA scatter plot shows points outside expected range: Standardisation (z-score) centres every feature at 0 with unit variance, so PC scores typically fall within about [-3, 3], with occasional outliers beyond that. This is normal.
Bright/Dark selection includes noisy frames: Centroid only measures brightness, not periodicity. Use Tonal preset first to isolate pitched frames, then Bright/Dark on the result.
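For the clicks item above, a short crossfade at each join can be sketched as follows. This is a generic linear crossfade written for illustration; the function name is hypothetical and the script's actual fade shape is not documented here. At an assumed sample rate of 44100 Hz, a 10 ms overlap is 441 samples.

```python
def crossfade_concat(a, b, overlap):
    """Join sample lists a and b, blending the last `overlap` samples of a
    with the first `overlap` samples of b using a linear crossfade."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return list(a) + list(b)
    head = list(a[:len(a) - overlap])
    tail = list(b[overlap:])
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of b rises from near 0 to near 1
        mixed.append(a[len(a) - overlap + i] * (1 - w) + b[i] * w)
    return head + mixed + tail


# Toy example: a constant 1.0 segment fading into a constant 0.0 segment.
out = crossfade_concat([1.0] * 6, [0.0] * 6, 3)
print(out)  # [1.0, 1.0, 1.0, 0.75, 0.5, 0.25, 0.0, 0.0, 0.0]
```

Note that each join shortens the output by the overlap length, so an output built from many short segments is slightly shorter than the sum of its parts.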

Visualisation (Suite 8×8)

When Draw_visualization is enabled, the script generates a Praat picture with:
  • Title bar — script name, sound name, preset, selection %, segment count
  • Original waveform (grey) — full source sound
  • Output waveform (green) — concatenated selected frames
  • Selection timeline — green = selected, grey = rejected, over time axis
  • PCA scatter plots — PC1/PC2, PC1/PC3, PC2/PC3, with green dots = selected, grey = rejected, red cross = PCA target (Custom mode)
  • Eigenvector loadings — bar chart showing feature contributions to PC1/PC2/PC3
  • Selection score over time — distance (PCA mode) or |z-score| (direct mode)
  • Legend — variance explained percentages, colour key
The PCA scatter plots are particularly useful for understanding timbre clustering and verifying that your selection captured the intended region.