PCA Timbre Selector — Feature-Based Sound Segmentation
Analyzes timbre and selects segments using direct feature selection or PCA targeting with full visualisation. Extract bright, dark, noisy, tonal, high/low pitch, loud/quiet, or custom PCA-targeted regions from any sound.
What this does
This script implements timbre-based sound segmentation using PCA and direct feature selection. It analyses a sound frame by frame, extracting 5 acoustic features (pitch, intensity, spectral centroid, spectral spread, HNR). Based on a preset or custom PCA target, it selects frames matching the desired timbre and concatenates them into a new output sound. The visualisation includes PCA scatter plots (PC1 vs PC2, PC1 vs PC3, PC2 vs PC3) with selected/rejected frames colour-coded, eigenvector loadings, and a selection score timeline.
Key Features:
- 8 Direct Feature Presets — Bright, Dark, Noisy, Tonal, High Pitch, Low Pitch, Loud, Quiet
- Custom PCA Targeting — select frames by proximity to (PC1, PC2, PC3) coordinates
- 5 Acoustic Features — Pitch (Hz), Intensity (dB), Spectral Centroid (Hz), Spectral Spread (Hz), HNR (dB)
- PCA Visualisation — 3 scatter plots (PC1/PC2, PC1/PC3, PC2/PC3), eigenvector loadings bar chart
- Selection Timeline — green = selected, grey = rejected, with score curve
- Concatenated Output — selected frames joined with overlap (10 ms crossfade)
- Preserves Stereo — extraction from original stereo sound, not mono analysis copy
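The crossfaded concatenation listed above can be pictured with the following sketch. It is written in Python rather than Praat, handles a single channel only, and assumes a linear fade; the fade shape the script actually uses is not stated in this document, and `crossfade_concat` is a hypothetical name.

```python
def crossfade_concat(segments, sample_rate, overlap_s=0.010):
    """Join lists of samples, blending each join with a linear crossfade."""
    n_overlap = int(round(overlap_s * sample_rate))
    out = []
    for seg in segments:
        seg = list(seg)
        # Use only as much overlap as both sides can supply.
        n = min(n_overlap, len(out), len(seg))
        for i in range(n):
            w = (i + 1) / (n + 1)  # fade-in weight of the incoming segment
            idx = len(out) - n + i
            out[idx] = out[idx] * (1.0 - w) + seg[i] * w
        out.extend(seg[n:])
    return out


# Two 100-sample segments at 1 kHz share a 10-sample (10 ms) overlap.
joined = crossfade_concat([[1.0] * 100, [1.0] * 100], sample_rate=1000)
```

Because the two segments overlap, the result is shorter than the sum of the segment lengths by one overlap per join.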
Quick start
- In Praat, select exactly one Sound object (mono or stereo).
- Run script… → PCA_Timbre_Selector.praat.
- Choose a preset from the dropdown:
- Bright / Dark — based on spectral centroid
- Noisy / Tonal — based on HNR (harmonics-to-noise ratio)
- High Pitch / Low Pitch — based on fundamental frequency
- Loud / Quiet — based on intensity
- Custom (PCA targeting) — specify target PC1, PC2, PC3
- Set Segment_ms (analysis window, default 25 ms) and Frame_step_seconds (default 0.01 s).
- Set Selection_percentile (e.g., 25 = select top/bottom 25% of frames).
- Click OK — script analyses, selects frames, creates output.
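As a rough illustration of what Selection_percentile means for the direct presets, the sketch below keeps the top or bottom N% of frames ranked by one feature. It is Python, not the script's Praat code; `select_by_percentile` and the centroid values are made up for the example, and the rounding rule is an assumption.

```python
def select_by_percentile(values, percentile, keep="top"):
    """Return sorted indices of the frames in the top or bottom `percentile` %."""
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    # Assumption: keep at least one frame and round to the nearest count.
    n_keep = max(1, round(len(values) * percentile / 100))
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=(keep == "top"))
    return sorted(order[:n_keep])


# Hypothetical spectral centroid per frame, in Hz.
centroid_hz = [500, 3200, 1500, 4100, 900, 2800, 700, 3900]
bright = select_by_percentile(centroid_hz, 25, keep="top")
dark = select_by_percentile(centroid_hz, 25, keep="bottom")
```

With eight frames and a percentile of 25, each call keeps two frame indices, returned in time order so they can be concatenated in their original sequence.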
8 Presets + Custom PCA
5 Acoustic Features
- Pitch (F0): Fundamental frequency in Hz, extracted via autocorrelation (range 75–600 Hz). Unvoiced frames = 0 Hz. Used by: High/Low Pitch presets.
- Intensity: Root-mean-square amplitude converted to a dB (SPL-like) scale. Used by: Loud/Quiet presets.
- Spectral Centroid: Centre of gravity of the spectrum (brightness). Higher = brighter, lower = darker. Used by: Bright/Dark presets.
- Spectral Spread: Standard deviation of the spectrum around the centroid (bandwidth). Used in PCA only (not directly selectable).
- HNR (Harmonics-to-Noise Ratio): Periodicity measure. High = clear pitch (tonal), low = noisy/unvoiced. Used by: Noisy/Tonal presets.
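To make the HNR scale concrete: Praat's manual defines harmonicity in dB as 10·log10(r / (1 − r)), where r is the fraction of signal energy that is periodic. The Python sketch below only evaluates that formula; `hnr_db` is a hypothetical helper and not part of the script.

```python
import math


def hnr_db(periodic_fraction):
    """Harmonics-to-noise ratio in dB for a periodic energy fraction in (0, 1)."""
    r = periodic_fraction
    if not 0.0 < r < 1.0:
        raise ValueError("periodic_fraction must be strictly between 0 and 1")
    return 10.0 * math.log10(r / (1.0 - r))


tonal = hnr_db(0.99)  # 99% periodic energy: close to 20 dB
equal = hnr_db(0.5)   # equal periodic and noise energy: 0 dB
```

A frame with as much noise as harmonic energy therefore sits at 0 dB, which is a useful reference point when reading the Noisy/Tonal selections.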
- Analysis window: Segment_ms (default 25 ms) with Gaussian shape
- Frame step: default 0.01 s (100 frames/second)
- Pitch: To Pitch (frame_step, f0_min=75, f0_max=600)
- Intensity: To Intensity (minimum pitch = 75 Hz; in Praat this setting determines the length of the analysis window)
- Spectral features: To Spectrogram → Spectrum (slice) → centroid + spread
- HNR: To Harmonicity (ac) (frame_step, f0_min, silence threshold, periods per window)
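The centroid and spread in the list above are the first two spectral moments. The sketch below shows one way to compute them from a single spectrum slice. It is Python rather than Praat, `centroid_and_spread` is a hypothetical name, and power weighting (power = 2) is an assumption based on Praat's default for centre of gravity.

```python
import math


def centroid_and_spread(freqs_hz, magnitudes, power=2.0):
    """Spectral centroid and spread (both in Hz) of one spectrum slice."""
    # Assumption: non-negative magnitudes, weighted as magnitude ** power.
    weights = [m ** power for m in magnitudes]
    total = sum(weights)
    if total == 0:
        return 0.0, 0.0  # silent frame: no meaningful centroid
    centroid = sum(f * w for f, w in zip(freqs_hz, weights)) / total
    variance = sum(w * (f - centroid) ** 2
                   for f, w in zip(freqs_hz, weights)) / total
    return centroid, math.sqrt(variance)


# Two equally strong partials at 1 kHz and 3 kHz.
centroid, spread = centroid_and_spread([1000.0, 3000.0], [1.0, 1.0])
```

Two equal partials give a centroid midway between them and a spread equal to half their distance, which matches the intuition of "centre" and "bandwidth".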
PCA Space & Loadings
Standardisation before PCA (v1.3 fix)
Each feature column is z-score standardised: x' = (x - μ)/σ.
This prevents Hz-scale features (centroid, pitch) from dominating the PCA because of their larger numerical range.
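A minimal Python sketch of the column-wise z-score, assuming the population standard deviation and mapping constant columns to 0 (the script's exact conventions are not documented here; `zscore_columns` is a hypothetical name):

```python
import statistics


def zscore_columns(rows):
    """Standardise each column of a list of rows to zero mean and unit SD."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    sds = [statistics.pstdev(c) for c in columns]  # assumption: population SD
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, sds)]
        for row in rows
    ]


# Made-up frames: column 0 in Hz, column 1 in dB.
frames = [[100.0, 60.0], [200.0, 70.0], [300.0, 80.0]]
z = zscore_columns(frames)
```

After standardisation the Hz column and the dB column carry identical values in this example, even though their raw ranges differ by a factor of ten.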
Principal components
The script computes PCA on the 5 standardised features, then projects each analysis frame into PC space (3 components). The visualisation shows:
- PC1 vs PC2 — typically captures brightness vs. pitch/timbre
- PC1 vs PC3 — may capture noisiness vs. intensity
- PC2 vs PC3 — higher-order timbral variation
Example loading pattern (actual loadings depend on the source sound):
- PC1 may have high positive loading for Centroid (brightness) and negative loading for HNR (noisiness)
- PC2 may be dominated by Pitch
- PC3 may capture Spread or Intensity
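The projection into PC space can be sketched as follows. This is a plain-Python illustration using power iteration with deflation, chosen only to keep the example dependency-free; it is not the script's Praat implementation, and all function names are hypothetical. The toy data has two features instead of five so the result is easy to check by hand.

```python
import math


def covariance(rows):
    """Sample covariance matrix of `rows` plus the mean-centred rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    return cov, centred


def leading_components(cov, k, iterations=500):
    """Approximate the k largest eigenpairs of a symmetric PSD matrix."""
    d = len(cov)
    m = [row[:] for row in cov]
    pairs = []
    for _component in range(k):
        v = [1.0 / (j + 1) for j in range(d)]  # fixed, arbitrary start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _step in range(iterations):
            w = [sum(m[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left to extract
                break
            v = [x / norm for x in w]
        mv = [sum(m[a][b] * v[b] for b in range(d)) for a in range(d)]
        eigenvalue = sum(v[a] * mv[a] for a in range(d))
        pairs.append((eigenvalue, v))
        # Deflate: remove the component just found before the next pass.
        for a in range(d):
            for b in range(d):
                m[a][b] -= eigenvalue * v[a] * v[b]
    return pairs


def project(centred, pairs):
    """Score of every centred row on every component."""
    return [[sum(c[j] * vec[j] for j in range(len(vec))) for _, vec in pairs]
            for c in centred]


# Toy data: two perfectly correlated features, so PC1 carries all variance.
rows = [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
cov, centred = covariance(rows)
pairs = leading_components(cov, k=2)
scores = project(centred, pairs)
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained = [value / total_variance for value, _ in pairs]
```

Each eigenvector holds the loadings shown in the bar chart, and `explained` corresponds to the variance-explained percentages in the legend. For real use a tested linear algebra routine is preferable to this hand-rolled iteration.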
Applications
Sound cleaning / noise reduction
Use case: Remove noisy/unvoiced sections from a recording, leaving only tonal regions.
Settings: Tonal preset, Selection_percentile = 40–60. Output contains only pitched frames (voice, instrument) — breath, clicks, and noise removed.
Timbre-based collage / sampling
Use case: Extract all bright sibilant moments from a recording for a stutter or glitch effect.
Settings: Bright preset, Selection_percentile = 20. Output is a concatenation of all high-centroid frames (e.g., "s", "sh", cymbal hits).
Isolate bass / low-frequency content
Use case: Extract low-pitched sections from a mixed recording.
Settings: Low Pitch preset, Selection_percentile = 25. Output contains only frames where pitch is in the lowest quartile.
Exploratory / compositional PCA selection
Use case: Find all frames with timbre similar to a specific region.
Settings: Custom (PCA). First run with a preset to see the PCA scatter plot, then manually pick target coordinates from a frame of interest. Rerun to select frames near that point in timbre space.
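Selection near a PCA target can be pictured as a nearest-neighbour ranking in PC space. The Python sketch below assumes Euclidean distance over (PC1, PC2, PC3) and reuses the percentile as the share of frames to keep; the script's exact distance measure is not specified in this document, and `select_near_target` is a hypothetical name.

```python
import math


def select_near_target(scores, target, percentile):
    """Indices of the `percentile` % of frames closest to `target`, plus distances."""
    distances = [math.dist(s, target) for s in scores]
    n_keep = max(1, round(len(scores) * percentile / 100))
    nearest = sorted(range(len(scores)), key=lambda i: distances[i])[:n_keep]
    return sorted(nearest), distances


# Made-up PC scores for four frames; the target is the origin of PC space.
pc_scores = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2.0, 2.0, 2.0), (0.1, 0.0, 0.0)]
selected, distances = select_near_target(pc_scores, (0.0, 0.0, 0.0), 50)
```

Raising the percentile keeps frames that lie further from the target, which is the same trade-off as widening a distance threshold.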
Workflow: Vocal → Sibilance-only phrase
Source: Spoken word recording.
Settings: Bright preset, Selection_percentile = 20.
Result: The output contains only the "s", "sh", "f", "th" sounds — a sibilance stutter effect.
Workflow: Drum loop → Noise-only rhythmic texture
Source: Drum loop (kick, snare, hi-hat, room).
Settings: Noisy preset, Selection_percentile = 30.
Result: Only the snare's noise tail, hi-hat hiss, and room ambience are kept — a breathy, granular texture with the same rhythm.
Workflow: Orchestra → Low strings excerpt
Source: Full orchestra recording.
Settings: Low Pitch preset, Selection_percentile = 25.
Result: Only frames where the dominant pitch is in the lowest quartile (bass, cello, contrabass) — a "low-only" orchestral reduction.
Troubleshooting
• Too few frames selected: Increase Selection_percentile (e.g., 25 → 40). In PCA mode, a higher percentile widens the distance threshold around the target.
• Output has clicks at segment joins: The script concatenates with overlap (10 ms crossfade). If clicks persist, increase overlap or smooth manually after generation.
• Pitch tracking fails / no tonal frames detected: Adjust F0_min and F0_max in the form to match your source (e.g., for bass, set F0_min=30, F0_max=200). Use the analysis info to see mean pitch.
• PCA scatter plot shows points outside the expected range: PC scores are computed from z-scored features, so they are centred at 0 and typically fall within about [-3, 3] rather than in the original feature units. This is expected.
• Bright/Dark selection includes noisy frames: Centroid only measures brightness, not periodicity. Run the Tonal preset first to isolate pitched frames, then run Bright/Dark on that output.
Visualisation (Suite 8×8)
- Title bar — script name, sound name, preset, selection %, segment count
- Original waveform (grey) — full source sound
- Output waveform (green) — concatenated selected frames
- Selection timeline — green = selected, grey = rejected, over time axis
- PCA scatter plots — PC1/PC2, PC1/PC3, PC2/PC3, with green dots = selected, grey = rejected, red cross = PCA target (Custom mode)
- Eigenvector loadings — bar chart showing feature contributions to PC1/PC2/PC3
- Selection score over time — distance (PCA mode) or |z-score| (direct mode)
- Legend — variance explained percentages, colour key