Phonetic Tremolo/Glitch Effect — User Guide
Intelligent audio processing: automatically detects speech sounds and applies targeted effects—tremolo on vowels, glitches on fricatives, attenuation on silence—based on acoustic phonetic analysis.
What this does
This script implements an intelligent phonetic effect processor that automatically analyzes speech sounds and applies different audio effects based on phonetic classification. Using acoustic feature extraction (pitch, intensity, formants, harmonicity), the system identifies vowels, fricatives, silence, and other sounds, then applies targeted processing: tremolo modulation to vowels, time-shift glitches to fricatives, heavy attenuation to silence, and mild amplification to other sounds.
Key Features:
- Automatic Phonetic Classification — Real-time detection of speech sound types
- Targeted Effect Application — Different effects for different sound categories
- Acoustic Feature Analysis — Uses pitch, formants, harmonicity, intensity
- Frame-based Processing — Precise temporal control over effect application
- Debug Output — Shows classification statistics and processing results
Technical Implementation:
1. Feature extraction: Creates Pitch, Intensity, Formant, and Harmonicity objects from the input sound.
2. Frame-based analysis: Processes audio in small time frames (default 10 ms).
3. Classification: Uses acoustic thresholds to categorize each frame as vowel, fricative, silence, or other.
4. Targeted processing: Applies tremolo to vowels, glitch effects to fricatives, attenuation to silence, and mild boost to other sounds.
5. Debug output: Provides classification statistics for tuning and analysis.
Quick start
- In Praat, select exactly one Sound object (preferably speech).
- Run script… → phonetic_tremolo_glitch_effect.praat.
- Adjust Effect Parameters:
- tremolo_rate_hz: Speed of vowel modulation (2-15 Hz)
- tremolo_depth: Intensity of tremolo (0.3-0.9)
- shift_amount_seconds: Glitch time shift (0.005-0.03 s)
- Set Feature Extraction parameters for analysis quality.
- Adjust Classification Thresholds to tune sound detection.
- Click OK — effect applied, result named "originalname_glitched".
- Check Info window for classification statistics and debug information.
Phonetic Analysis Theory
Acoustic Feature Extraction
Fundamental Acoustic Properties
The system analyzes four key acoustic features: pitch, intensity, formants, and harmonicity (the harmonics-to-noise ratio, HNR).
Praat Analysis Objects
Feature extraction pipeline:
| Time | 0.0 s | 0.5 s | 1.0 s | 1.5 s | 2.0 s |
|---|---|---|---|---|---|
| Text | "S" | "A" | "Y" | "SH" | "EE" |
| Pitch | 0 Hz | 120 Hz | 130 Hz | 0 Hz | 125 Hz |
| HNR | 2 dB | 18 dB | 16 dB | 1 dB | 20 dB |
| F1 | 0 Hz | 600 Hz | 300 Hz | 0 Hz | 280 Hz |
| Class | Fricative | Vowel | Vowel | Fricative | Vowel |
| Effect | Glitch | Tremolo | Tremolo | Glitch | Tremolo |
Different acoustic features trigger different effects.
Frame-Based Processing
Temporal Analysis Windows
The audio is processed in small, overlapping frames, stepped every frame_step_seconds (default 0.01 s).
Why Frame-Based Processing?
- Temporal precision: Effects can change rapidly with phonetic context
- Smooth transitions: Overlapping frames prevent abrupt effect changes
- Feature stability: Acoustic features are more reliable over 10-20 ms windows
- Computational efficiency: Avoids redundant analysis of every sample
Frame step trade-offs:
- Small step (5 ms): Higher temporal resolution, more processing time
- Large step (20 ms): Lower resolution, faster processing, potential artifacts
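The trade-off can be made concrete by counting frames. The Python sketch below is an approximation for illustration only: the function name `frame_count` is hypothetical, and the script itself may count frames slightly differently (for example, because of how Praat centers analysis windows).

```python
import math


def frame_count(duration_s: float, frame_step_s: float = 0.01) -> int:
    """Approximate number of analysis frames for a sound of the given duration.

    Assumption: one frame per step, covering the whole sound. The real script
    may produce a few frames more or fewer at the edges.
    """
    if duration_s < 0 or frame_step_s <= 0:
        raise ValueError("duration must be >= 0 and frame step must be > 0")
    # Round before ceil so floating-point noise (e.g. 5999.9999999) does not
    # add a spurious extra frame.
    return math.ceil(round(duration_s / frame_step_s, 6))


if __name__ == "__main__":
    for step in (0.005, 0.01, 0.02):
        print(f"step {step * 1000:.0f} ms -> {frame_count(60.0, step)} frames per minute")
```

Halving the step doubles the number of frames to classify, which is why small steps cost more processing time.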
Complete Processing Pipeline
Sound Classification System
Classification Logic
🎯 Phonetic Classification Algorithm
Class 1: VOWELS (Tremolo Effect)
Class 2: FRICATIVES (Glitch Effect)
Class 4: SILENCE (Attenuation)
Class 3: OTHER SOUNDS (Boost)
Acoustic Thresholds Explained
| Threshold | Typical Range | Default | Purpose | Affects |
|---|---|---|---|---|
| vowel_hnr_threshold | 3-10 dB | 5.0 | Minimum HNR for vowels | Vowel detection sensitivity |
| vowel_f1_min_hz | 200-400 Hz | 300 | Minimum F1 for vowels | Excludes glides, weak vowels |
| fricative_hnr_max | 2-5 dB | 3.0 | Maximum HNR for fricatives | Fricative detection sensitivity |
| silence_intensity_threshold | 40-55 dB | 45 | Maximum intensity for silence | Silence detection threshold |
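The class list and thresholds above do not spell out the exact decision rules, so the following Python sketch is one plausible reading rather than the script's actual logic. The rule order (silence first, then vowel, then fricative, with "other" as the fallback), the inclusive comparisons, and the names `Thresholds` and `classify_frame` are assumptions made for illustration; only the parameter names and defaults come from the table.

```python
from dataclasses import dataclass

# Class numbers as used in this guide.
VOWEL, FRICATIVE, OTHER, SILENCE = 1, 2, 3, 4


@dataclass(frozen=True)
class Thresholds:
    """Defaults copied from the threshold table; the container itself is hypothetical."""
    vowel_hnr_threshold: float = 5.0            # dB
    vowel_f1_min_hz: float = 300.0              # Hz
    fricative_hnr_max: float = 3.0              # dB
    silence_intensity_threshold: float = 45.0   # dB


def classify_frame(intensity_db: float, hnr_db: float, f1_hz: float,
                   t: Thresholds = Thresholds()) -> int:
    """Classify one analysis frame. The order of the checks is an assumption."""
    if intensity_db < t.silence_intensity_threshold:
        return SILENCE      # too quiet to be speech
    if hnr_db >= t.vowel_hnr_threshold and f1_hz >= t.vowel_f1_min_hz:
        return VOWEL        # periodic, with a clear first formant
    if hnr_db <= t.fricative_hnr_max:
        return FRICATIVE    # audible but noisy
    return OTHER            # everything in between


if __name__ == "__main__":
    # Intensity values here are made up for the demonstration.
    print(classify_frame(intensity_db=65.0, hnr_db=18.0, f1_hz=600.0))  # vowel-like frame
    print(classify_frame(intensity_db=60.0, hnr_db=2.0, f1_hz=0.0))     # fricative-like frame
```

Under this reading, frames whose HNR falls between fricative_hnr_max and vowel_hnr_threshold land in the "other" class, which is why widening that gap increases the share of class 3 in the debug output.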
Effect Parameters
| Parameter | Range | Default | Effect | Musical Result |
|---|---|---|---|---|
| tremolo_rate_hz | 2-15 Hz | 8.0 | Vowel modulation speed | Slow: subtle, Fast: rhythmic |
| tremolo_depth | 0.3-0.9 | 0.7 | Vowel modulation depth | Low: gentle, High: dramatic |
| shift_amount_seconds | 0.005-0.03 s | 0.015 | Fricative time shift | Short: subtle, Long: obvious glitch |
Debugging and Tuning
Interpreting Classification Output
Vowels (Class 1): 30-50% of frames ← Tremolo effect
Fricatives (Class 2): 15-25% of frames ← Glitch effect
Other (Class 3): 20-35% of frames ← Boost effect
Silence (Class 4): 10-20% of frames ← Attenuation
Warning signs:
- Vowels < 20%: vowel_hnr_threshold too high
- Fricatives < 10%: fricative_hnr_max too low
- Silence > 40%: silence_intensity_threshold too high
- Other > 50%: thresholds need adjustment
Tuning for Different Audio Material
For noisy recordings: Increase vowel_hnr_threshold to 6-8, decrease fricative_hnr_max to 2.0-2.5 to reduce false detections.
For whispered speech: Set vowel_f1_min_hz to 200 (weaker vowels) and lower silence_intensity_threshold to about 40, because whispered speech is quieter overall and a higher threshold would classify more of it as silence.
For musical vocals: Use lower tremolo_rate_hz (3-5 Hz) for subtle modulation, reduce shift_amount_seconds for less obvious glitches.
Parameters
Effect Parameters
| Parameter | Type | Range | Default | Description |
|---|---|---|---|---|
| tremolo_rate_hz | positive | 2.0-15.0 | 8.0 | Tremolo speed on vowels (Hz) |
| tremolo_depth | positive | 0.3-0.9 | 0.7 | Tremolo intensity on vowels |
| shift_amount_seconds | positive | 0.005-0.03 | 0.015 | Time shift for fricative glitches |
Feature Extraction
| Parameter | Type | Range | Default | Description |
|---|---|---|---|---|
| frame_step_seconds | positive | 0.005-0.02 | 0.01 | Analysis frame step size |
| max_formant_hz | positive | 4000-8000 | 5500 | Maximum formant frequency for analysis |
Classification Thresholds
| Parameter | Type | Range | Default | Description |
|---|---|---|---|---|
| vowel_hnr_threshold | positive | 3.0-10.0 | 5.0 | Minimum HNR for vowel detection |
| vowel_f1_min_hz | positive | 200-400 | 300 | Minimum F1 frequency for vowels |
| fricative_hnr_max | positive | 2.0-5.0 | 3.0 | Maximum HNR for fricative detection |
| silence_intensity_threshold | positive | 40-55 | 45 | Maximum intensity for silence detection |
Applications
Creative Vocal Processing
Use case: Adding rhythmic interest to vocal tracks
Technique: Use musical tremolo rates synchronized to tempo
Settings:
- tremolo_rate_hz = BPM/60 (one cycle per beat)
- Medium tremolo_depth (0.5-0.7) for clear modulation
- Subtle shift_amount_seconds (0.008-0.012) for gentle glitches
- Default classification thresholds for clean vocals
Result: Vocals with rhythmic pulsation on vowels and subtle texture on consonants
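The tempo rule above (tremolo_rate_hz = BPM/60) is simple arithmetic, shown here as a small Python helper. The function name and the `cycles_per_beat` parameter are hypothetical additions for illustration; the script itself only takes tremolo_rate_hz.

```python
def tremolo_rate_for_tempo(bpm: float, cycles_per_beat: float = 1.0) -> float:
    """Convert a tempo in beats per minute to a tremolo rate in Hz.

    cycles_per_beat is a hypothetical convenience: 1.0 gives one tremolo cycle
    per beat, 2.0 gives one cycle per eighth note, and so on.
    """
    if bpm <= 0 or cycles_per_beat <= 0:
        raise ValueError("bpm and cycles_per_beat must be positive")
    return bpm / 60.0 * cycles_per_beat


if __name__ == "__main__":
    print(tremolo_rate_for_tempo(120))     # 2.0 Hz, one cycle per beat
    print(tremolo_rate_for_tempo(90, 2))   # 3.0 Hz, two cycles per beat
```

Slow tempos at one cycle per beat can fall below the documented 2-15 Hz range (for example, 90 BPM gives 1.5 Hz), in which case a higher `cycles_per_beat` keeps the rate in range while staying in time.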
Experimental Sound Design
Use case: Creating glitchy, fragmented vocal textures
Technique: Use extreme settings for dramatic effects
Settings:
- Fast tremolo_rate_hz (10-15 Hz) for intense modulation
- Large shift_amount_seconds (0.02-0.03) for obvious glitches
- High tremolo_depth (0.8-0.9) for dramatic volume swings
- Adjust thresholds to target specific sound types
Result: Highly processed, glitch-art vocal effects
Speech Enhancement and Effects
Use case: Adding character to spoken word
Technique: Use subtle effects with careful threshold tuning
Settings:
- Slow tremolo_rate_hz (3-5 Hz) for gentle movement
- Low tremolo_depth (0.3-0.5) for subtle effect
- Small shift_amount_seconds (0.005-0.01) for texture
- Tune thresholds for specific speaker characteristics
Result: Enhanced speech with added character while maintaining intelligibility
Audio Restoration and Noise Control
Use case: Reducing background noise in speech recordings
Technique: Use aggressive silence detection and attenuation
Settings:
- Increase silence_intensity_threshold to detect more noise as silence
- Use default or lower tremolo settings
- Minimal glitch effects (small shift_amount_seconds)
- Adjust vowel_hnr_threshold higher to avoid detecting noise as vowels
Result: Cleaner speech with reduced background noise during pauses
Practical Workflow Examples
🎤 Vocal Rhythm Effect
Goal: Add tempo-synchronized pulsation to singing
Settings:
- tremolo_rate_hz: 2.0 (120 BPM = 2 Hz)
- tremolo_depth: 0.6
- shift_amount_seconds: 0.01
- All thresholds: default
Result: Singing with gentle beat-synchronized tremolo
🔊 Glitch Vocal Effect
Goal: Create experimental glitch vocals
Settings:
- tremolo_rate_hz: 12.0
- tremolo_depth: 0.85
- shift_amount_seconds: 0.025
- fricative_hnr_max: 2.5 (stricter than the default 3.0, so only clearly noisy fricatives glitch; raise it above 3.0 to catch more)
Result: Intensely processed glitch-art vocals
🎙️ Speech Enhancement
Goal: Add subtle character to spoken word
Settings:
- tremolo_rate_hz: 4.0
- tremolo_depth: 0.4
- shift_amount_seconds: 0.008
- vowel_f1_min_hz: 250 (catch weaker vowels)
Result: Enhanced speech with subtle texture
Advanced Techniques
- Apply the effect multiple times with different settings
  - First pass: subtle tremolo only
  - Second pass: glitch effects only
  - Creates complex, evolving textures
- Process only specific phoneme types by adjusting thresholds
  - High vowel_hnr_threshold + low vowel_f1_min_hz = only clear vowels
  - Low fricative_hnr_max = only strong fricatives
- Create custom phonetic effect profiles
Troubleshooting Common Issues
Cause: Classification thresholds too strict
Solution: Lower vowel_hnr_threshold, raise fricative_hnr_max, adjust vowel_f1_min_hz
Cause: vowel_hnr_threshold too low, detecting noise as vowels
Solution: Increase vowel_hnr_threshold, check debug output
Cause: shift_amount_seconds too large for frame size
Solution: Reduce shift_amount_seconds, increase frame_step_seconds
Cause: frame_step_seconds too large, creating abrupt changes
Solution: Decrease frame_step_seconds for smoother transitions
Technical Deep Dive
Acoustic Phonetics Basis
Phonetic Class Acoustic Signatures
Typical acoustic values for different sound classes:
Praat Analysis Methods
Effect Algorithm Details
Tremolo Implementation
The tremolo effect uses a sine-wave amplitude modulation:
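A minimal Python sketch of sine-wave amplitude modulation follows. The exact formula used by the script is not shown in this guide, so the gain curve here, gain(t) = 1 - depth * (0.5 + 0.5 * sin(2π * rate * t)), is an assumption, as are the function names. In the script the modulation applies only to frames classified as vowels; this sketch modulates every sample it is given.

```python
import math


def tremolo_gain(t_s: float, rate_hz: float = 8.0, depth: float = 0.7) -> float:
    """Gain at time t_s for a sine tremolo (assumed formula, see lead-in).

    The gain swings between 1 - depth (quietest) and 1.0 (unchanged).
    """
    lfo = 0.5 + 0.5 * math.sin(2.0 * math.pi * rate_hz * t_s)  # ranges 0..1
    return 1.0 - depth * lfo


def apply_tremolo(samples, sample_rate_hz: float,
                  rate_hz: float = 8.0, depth: float = 0.7):
    """Return a new list with the tremolo gain applied to each sample."""
    return [
        s * tremolo_gain(i / sample_rate_hz, rate_hz, depth)
        for i, s in enumerate(samples)
    ]


if __name__ == "__main__":
    # One second of a constant signal makes the gain curve itself visible.
    out = apply_tremolo([1.0] * 8000, sample_rate_hz=8000.0)
    print(f"min gain {min(out):.3f}, max gain {max(out):.3f}")
```

With this curve, tremolo_depth reads directly as the largest fractional drop in amplitude, which matches the guide's description of low depths as gentle and high depths as dramatic.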
Glitch Effect Implementation
The glitch effect uses time-domain shifting:
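One way to picture time-domain shifting is the Python sketch below: within frames classified as fricatives, each output sample is read from shift_amount_seconds later in the input. This is an illustrative reading, not the script's confirmed algorithm. The function name, the direction of the shift, the zero-fill past the end of the sound, and the mapping from samples to frames are all assumptions.

```python
def apply_glitch(samples, sample_rate_hz: float, frame_classes,
                 frame_step_s: float = 0.01, shift_s: float = 0.015,
                 fricative_class: int = 2):
    """Return a copy of samples with fricative frames read from a shifted position.

    frame_classes holds one class number per analysis frame (2 = fricative in
    this guide). Samples beyond the last frame reuse the last frame's class.
    """
    if not frame_classes:
        return list(samples)
    samples_per_frame = max(1, round(frame_step_s * sample_rate_hz))
    shift = round(shift_s * sample_rate_hz)
    n = len(samples)
    out = []
    for i, value in enumerate(samples):
        frame = min(i // samples_per_frame, len(frame_classes) - 1)
        if frame_classes[frame] == fricative_class:
            src = i + shift
            # Assumption: reading past the end of the sound yields silence.
            out.append(samples[src] if src < n else 0.0)
        else:
            out.append(value)
    return out


if __name__ == "__main__":
    ramp = [float(i) for i in range(100)]         # 100 ms at 1 kHz
    classes = [1, 2, 1, 1, 1, 1, 1, 1, 1, 1]      # only the second frame is a fricative
    print(apply_glitch(ramp, 1000.0, classes)[8:22])
```

Because the shift (15 ms by default) is longer than one frame step (10 ms), a shifted fricative frame pulls in audio from the following frame, producing an audible discontinuity at the frame boundaries.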
Performance Considerations
Computational Complexity
- Audio duration: Linear increase with longer files
- Frame step: Inverse relationship (smaller step = more frames)
- Analysis parameters: Higher max_formant_hz increases formant analysis time
- Hardware: CPU speed and available RAM
Typical processing times:
- 1-minute speech, default settings: 10-30 seconds
- 5-minute speech, default settings: 1-3 minutes
- Very small frame_step (5 ms): 2-3× longer processing
Memory Usage
The script creates multiple analysis objects: