Breathing Pitch Waves Effect — User Guide

Emotional vocal processing: applies breathing-like pitch modulation with configurable rate, depth, and emotional intensity for expressive vocal transformations.

Author: Shai Cohen
Affiliation: Department of Music, Bar-Ilan University, Israel
Version: 0.1 (2025)
License: MIT License
Repo: https://github.com/ShaiCohen-ops/Praat-plugin_AudioTools

Contents:

What this does
Quick start
Breathing Pitch Theory
Emotional Presets
Parameters
Applications

What this does

This script implements breathing pitch waves, a psychoacoustic vocal processing technique that simulates the natural pitch variations of human breathing. It creates emotionally expressive vocal effects by applying complex, multi-layered pitch modulation to an audio signal. The process analyzes the source pitch, generates a dense pitch curve with breathing-like oscillations, and resynthesizes the audio with the new pitch variations. The result is a vocal with breathing characteristics (gentle breath, emotional swell, dramatic breath, panic breathing, subtle tremor, etc.) defined by configurable parameters rather than manual automation.

Key Features:

7 Emotional Presets (plus Manual mode): From gentle meditation to intense gasping
Multi-Layered Modulation: Fundamental, harmonic, and subharmonic components
Micro-Flutter & Chaos: Realistic vocal imperfections and emotional tremor
Intensity Envelope: Emotional build-up over time
Dense Pitch Sampling: Smooth transitions with 200-2000 control points
Flexible Output: Configurable sample rate and precision

What are breathing pitch waves?

Traditional pitch effects: vibrato (regular oscillation), portamento (pitch slides), pitch correction (static tuning).

Breathing pitch waves: complex, emotionally-driven pitch modulation simulating human breathing patterns.

Advantages:

Emotional expression: Creates natural, human-like vocal qualities
Complexity: Multi-layered modulation (fundamental + harmonics + subharmonics)
Realism: Includes micro-flutter and chaos for natural imperfections
Configurable: Parameters control rate, depth, and emotional intensity
Psychoacoustic: Based on human breathing/vocal physiology

Use cases:

Vocal processing (singing, spoken word)
Sound design (emotional character voices)
Experimental music (expressive transformations)
Film/TV (emotional voice effects)
Therapeutic applications (breathing awareness)

Technical Implementation:

(1) Pitch Analysis: Extract median pitch from source audio using Praat's pitch detection.
(2) Dense Curve Generation: Create 200-2000 control points across duration for smooth transitions.
(3) Multi-Layered Modulation: Generate breathing waveform with fundamental (sin³), harmonic (sin⁵), and subharmonic (sin²) components.
(4) Emotional Components: Add micro-flutter (high-frequency variations), tremor (mid-frequency oscillation), and gasps (sudden pitch rises).
(5) Intensity Envelope: Apply emotional build-up over time.
(6) Pitch Conversion: Convert semitone shifts to frequency ratios, apply to median pitch.
(7) Resynthesis: Replace original pitch with breathing pitch curve using overlap-add synthesis.
(8) Output Processing: Optional resampling, cleanup, and result selection.

Quick start

In Praat, select exactly one Sound object (preferably vocal content).
Run script… → breathing_pitch_waves.praat.
Choose preset from dropdown (Gentle Breath, Emotional Swell, etc.) or select "Manual" for custom parameters.
Adjust breath_rate (cycles per second, typically 0.1-0.8 Hz).
Set pitch_depth_semitones (variation range, typically 4-36 semitones).
Configure micro_flutter and emotional_intensity for realism and expression.
Set pitch analysis parameters (time_step, minimum_pitch, maximum_pitch).
Configure output settings (output_sample_rate, resample_precision).
Click OK — breathing pitch effect applied, result named "originalname_breath_result".

Quick tip: Start with presets for immediate results — "Gentle Breath" for subtle expression, "Emotional Swell" for singing enhancement, "Panic Breathing" for intense drama. Use Manual mode for fine control. Lower breath_rate (0.1-0.3) for calm breathing, higher (0.5-0.8) for excited/panicked breathing. Moderate pitch_depth (8-15 semitones) for natural effect, extreme (24-36) for dramatic transformation. Enable play_after_processing to hear result immediately. Processing time depends on audio length and density of pitch points (typically 10-60 seconds).

Important: This effect works best with vocals. It is designed for monophonic pitched content (singing, speech); polyphonic or noisy audio may produce unpredictable results. Pitch detection is critical: make sure minimum_pitch and maximum_pitch bracket the actual pitch range of your audio. Very high emotional_intensity (>3.0) can create extreme pitch variations that may sound artificial, and very high micro_flutter (>6) can introduce audible artifacts. The effect replaces the original pitch completely, so the original pitch contour is discarded. Enable keep_intermediate_objects for debugging if needed. The result may require normalization if the pitch variations cause amplitude changes.

Breathing Pitch Theory

Human Breathing Physiology

Natural Breathing Patterns

Breathing characteristics in vocal expression:

Natural breathing patterns:

Rate: 0.1-0.8 Hz (6-48 breaths per minute)
Depth: Subtle to dramatic pitch variations
Complexity: Multi-layered oscillations
Emotion: Correlated with emotional state

Emotional correlations:

Calm: 0.1-0.2 Hz, shallow depth (4-8 semitones)
Normal: 0.2-0.3 Hz, moderate depth (8-15 semitones)
Excited: 0.3-0.5 Hz, deep variations (15-24 semitones)
Panicked: 0.5-0.8 Hz, extreme depth (24-36 semitones)
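
The rate figures above are given in Hz, while breathing is more often described in breaths per minute. As a small illustration of the conversion (the function name is hypothetical and not part of the script):

```python
def breaths_per_minute(breath_rate_hz: float) -> float:
    """Convert a breathing rate in Hz (cycles per second) to breaths per minute."""
    return breath_rate_hz * 60.0


# The two ends of the range quoted in this guide.
slowest = breaths_per_minute(0.1)
fastest = breaths_per_minute(0.8)
print(f"0.1 Hz = {slowest:.0f} breaths/min, 0.8 Hz = {fastest:.0f} breaths/min")
```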

Why Breathing Pitch Modulation?

Advantages of breathing pitch effects:

Emotional authenticity: Mimics natural human vocal expression
Complexity: More organic than simple vibrato or pitch automation
Expressivity: Conveys specific emotional states through pitch patterns
Versatility: Applicable to various vocal styles and contexts

Vs other pitch effects:

Vibrato: Regular, periodic oscillation (mechanical)
Tremolo: Amplitude modulation (different perceptual effect)
Portamento: Smooth pitch slides between notes
Breathing waves: Complex, emotionally-correlated patterns (organic)

Multi-Layered Modulation

Breathing Waveform Components

Complex breathing waveform construction:

Fundamental component (primary breathing oscillation):
breath_fundamental = sin(phase)^3

Harmonic component (higher-frequency breathing harmonics):
breath_harmonic = 0.6 * sin(phase * 2)^5

Subharmonic component (slower, deeper breathing undertones):
breath_subharmonic = 0.3 * sin(phase * 0.5)^2

Total breathing curve:
breath_curve = breath_fundamental + breath_harmonic + breath_subharmonic

Where:
phase = (time - start_time) * 2 * π * breath_rate (phase angle of the breathing cycle, in radians)
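
The three layers can be sketched in Python to see how they combine at a given time. This is an illustrative transcription of the formulas above, not the Praat script itself; the function name and the default start_time of 0 are assumptions.

```python
import math


def breath_curve(time: float, breath_rate: float, start_time: float = 0.0) -> float:
    """Sum of the fundamental, harmonic, and subharmonic breathing layers."""
    phase = (time - start_time) * 2.0 * math.pi * breath_rate
    breath_fundamental = math.sin(phase) ** 3
    breath_harmonic = 0.6 * math.sin(phase * 2.0) ** 5
    breath_subharmonic = 0.3 * math.sin(phase * 0.5) ** 2
    return breath_fundamental + breath_harmonic + breath_subharmonic


# A quarter of the way through a breathing cycle the fundamental peaks at 1,
# the harmonic crosses zero, and the subharmonic contributes 0.3 * 0.5 = 0.15.
value = breath_curve(time=1.0, breath_rate=0.25)
print(round(value, 3))
```

Because the subharmonic term is squared, it is never negative, so the summed curve is not symmetric around zero.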

Why Multi-Layered Approach?

Psychological and acoustic benefits:

Realism: Natural breathing contains multiple frequency components
Richness: Creates more interesting and varied pitch patterns
Emotional nuance: Different components convey different emotional qualities
Natural imperfection: Avoids mechanical, predictable oscillations

🎭 Emotional Components

Micro-flutter: High-frequency variations (vocal cord instability)

Adds realism and natural vocal imperfections

Emotional tremor: Mid-frequency oscillation (emotional tension)

Conveys vulnerability, excitement, or anxiety

Gasp effects: Sudden pitch rises (surprise, intensity peaks)

Creates dramatic moments and emotional highlights

Phase and Timing

Breathing Cycle Calculation

Phase-based timing system:

Phase calculation:
phase = (t - xmin) * 2 * π * breath_rate

Where:
t = current time (seconds)
xmin = audio start time
breath_rate = breathing cycles per second (Hz)

Example: breath_rate = 0.3 Hz, duration = 10 seconds
Total cycles = 0.3 * 10 = 3 complete breathing cycles
Phase at t=5s: (5-0) * 2 * π * 0.3 = 3π radians

Phase components:
flutter_phase = phase * 12 (high-frequency flutter)
chaos_phase = phase * 23.7 (irregular chaos component)
tremor_phase = phase * 7.3 (emotional tremor frequency)
gasp_phase = phase * 3 (gasp trigger frequency)
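
To check the worked example numerically, the phase and its derived component phases can be sketched as follows. The function names are hypothetical; only the formulas and multipliers come from this guide.

```python
import math


def breathing_phase(t: float, xmin: float, breath_rate: float) -> float:
    """Phase of the breathing cycle in radians at time t."""
    return (t - xmin) * 2.0 * math.pi * breath_rate


def component_phases(phase: float) -> dict:
    """Derived phases, using the multipliers listed in this guide."""
    return {
        "flutter": phase * 12.0,
        "chaos": phase * 23.7,
        "tremor": phase * 7.3,
        "gasp": phase * 3.0,
    }


phase_at_5s = breathing_phase(t=5.0, xmin=0.0, breath_rate=0.3)
print(phase_at_5s / math.pi)  # phase expressed in multiples of pi
print(component_phases(phase_at_5s))
```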

Why Phase-Based System?

Advantages of phase-based modulation:

Phase-based vs time-based:

Phase-based:

Consistent regardless of audio duration
Natural breathing rhythm maintained
Easy to scale and modify
Mathematically elegant

Time-based:

Would require duration-specific calculations
Harder to maintain consistent breathing character
More complex implementation

The phase system ensures the breathing character remains consistent across different audio lengths.

Emotional Intensity Envelope

Time-Varying Emotional Build-up

Intensity progression algorithm:

Intensity envelope calculation:

time_factor = (t - xmin) / duration
intensity_envelope = 1 + emotional_intensity * time_factor^1.5

Where:

time_factor = normalized time (0 to 1)
emotional_intensity = user parameter (0.5 to 4.0 across the built-in presets)
exponent 1.5 = gradual build-up curve

Application:

total_shift = pitch_depth_semitones * (breath_curve + flutter + tremor + gasp) * intensity_envelope

Effect:

Beginning: baseline intensity (time_factor = 0, so intensity_envelope = 1)
Middle: moderate intensity
End: full intensity (time_factor = 1, so intensity_envelope = 1 + emotional_intensity)
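
A short Python sketch of the envelope makes the build-up concrete. It follows the formulas in this section; the function names are hypothetical, and the flutter, tremor, and gasp terms are passed in as plain numbers because this guide does not give their formulas.

```python
def intensity_envelope(t: float, xmin: float, duration: float,
                       emotional_intensity: float) -> float:
    """Multiplier that grows from 1 at the start to 1 + emotional_intensity at the end."""
    time_factor = (t - xmin) / duration
    return 1.0 + emotional_intensity * time_factor ** 1.5


def total_shift(pitch_depth_semitones: float, breath_curve: float, flutter: float,
                tremor: float, gasp: float, envelope: float) -> float:
    """Pitch shift in semitones for one control point."""
    return pitch_depth_semitones * (breath_curve + flutter + tremor + gasp) * envelope


# Envelope at the start, middle, and end of a 10 second sound with intensity 2.0.
for t in (0.0, 5.0, 10.0):
    print(t, intensity_envelope(t, xmin=0.0, duration=10.0, emotional_intensity=2.0))
```

Note that the envelope multiplies the whole modulation sum, so the same breath_curve value produces a larger shift late in the sound than at the beginning.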

Why Intensity Envelope?

Purpose of time-varying intensity:

Without intensity envelope:

Constant emotional level throughout
May sound artificial or monotonous
Lacks narrative progression

With intensity envelope:

Emotional build-up over time
Creates sense of progression or narrative
More psychologically engaging
Mimics natural emotional arcs in performance

Typical usage:

emotional_intensity = 0.5 → subtle gradual increase
emotional_intensity = 2.0 → noticeable emotional arc
emotional_intensity = 4.0 → dramatic emotional journey

Complete Processing Pipeline

SETUP:

Select Sound object (vocal content recommended)
Choose preset or manual parameters
Set breath_rate, pitch_depth_semitones, micro_flutter, emotional_intensity

PITCH ANALYSIS:

Copy original sound → "originalname_breath_tmp"
To Manipulation: time_step, minimum_pitch, maximum_pitch
Extract median pitch (reference frequency)

DENSE PITCH CURVE GENERATION:

Create PitchTier with 200-2000 points
FOR each point (0 to npoints-1):
    Calculate phase = (time - start) * 2π * breath_rate
    Build multi-layer breathing waveform
    Add micro-flutter, tremor, gasp components
    Apply intensity envelope
    Convert to frequency, clamp to min/max range
    Add point to PitchTier

RESYNTHESIS:

Replace pitch tier in Manipulation
Get resynthesis (overlap-add)
Rename: "originalname_breath_result"

OUTPUT PROCESSING:

Optional resampling to output_sample_rate
Cleanup intermediate objects
Select final result
Optional: Play result

OUTPUT:

"originalname_breath_result" with breathing pitch effect
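
The dense pitch curve stage can be approximated outside Praat as a list of (time, frequency) pairs. The sketch below is a simplified Python model under several assumptions: points are spaced evenly from start to end, the flutter, tremor, and gasp terms are left at zero because this guide does not give their formulas, and the function name is hypothetical. It does not perform the Praat resynthesis.

```python
import math


def breathing_pitch_points(duration, median_f0, breath_rate, pitch_depth_semitones,
                           emotional_intensity, minimum_pitch, maximum_pitch, xmin=0.0):
    """Return (time, f0) pairs approximating the breathing pitch curve."""
    # Target spacing of 10 ms, constrained to 200-2000 points as described in this guide.
    npoints = max(200, min(2000, round(duration / 0.01)))
    points = []
    for i in range(npoints):
        # Assumption: even spacing across the whole sound.
        t = xmin + duration * i / (npoints - 1)
        phase = (t - xmin) * 2.0 * math.pi * breath_rate
        curve = (math.sin(phase) ** 3
                 + 0.6 * math.sin(phase * 2.0) ** 5
                 + 0.3 * math.sin(phase * 0.5) ** 2)
        extras = 0.0  # placeholder for flutter + tremor + gasp (not specified here)
        envelope = 1.0 + emotional_intensity * ((t - xmin) / duration) ** 1.5
        shift = pitch_depth_semitones * (curve + extras) * envelope
        f0 = median_f0 * 2.0 ** (shift / 12.0)
        # Clamp to the analysis range so the curve stays within min/max pitch.
        f0 = min(max(f0, minimum_pitch), maximum_pitch)
        points.append((t, f0))
    return points


points = breathing_pitch_points(duration=5.0, median_f0=200.0, breath_rate=0.3,
                                pitch_depth_semitones=18.0, emotional_intensity=2.5,
                                minimum_pitch=50.0, maximum_pitch=900.0)
print(len(points), points[0])
```

With the default depth of 18 semitones the unclamped curve can leave the 50-900 Hz range, which is why the clamping step matters in practice.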

Emotional Presets

Preset Overview

🎚️ Emotion-Based Configuration

Philosophy: Each preset captures specific emotional state through breathing patterns

Parameters: Breath rate, pitch depth, flutter, intensity optimized for each emotion

Character: Immediate emotional expression without manual tuning

Best for: Quick results, emotional specificity, production workflow

Built-in presets:

Preset	Breath Rate	Pitch Depth	Micro Flutter	Emotional Intensity	Character
Gentle Breath	0.2 Hz	8 st	1.0	1.2	Subtle, calming, meditative
Emotional Swell	0.25 Hz	15 st	2.0	2.0	Expressive, singing-like, building
Dramatic Breath	0.35 Hz	24 st	5.0	3.0	Theatrical, intense, performance
Panic Breathing	0.8 Hz	36 st	8.0	4.0	Frantic, anxious, extreme
Subtle Tremor	0.15 Hz	6 st	3.0	0.8	Delicate, vulnerable, intimate
Deep Meditation	0.1 Hz	4 st	0.5	0.5	Calm, centered, minimal
Intense Gasping	0.5 Hz	30 st	6.0	3.5	Breathy, excited, passionate
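
For scripted or batch use, the preset table can be mirrored as a lookup structure. The Python sketch below copies the values from the table above; the dictionary layout and the get_preset helper are hypothetical and are not how the Praat script stores its presets.

```python
# Values copied from the preset table in this guide.
PRESETS = {
    "Gentle Breath":   {"breath_rate": 0.2,  "pitch_depth_semitones": 8,  "micro_flutter": 1.0, "emotional_intensity": 1.2},
    "Emotional Swell": {"breath_rate": 0.25, "pitch_depth_semitones": 15, "micro_flutter": 2.0, "emotional_intensity": 2.0},
    "Dramatic Breath": {"breath_rate": 0.35, "pitch_depth_semitones": 24, "micro_flutter": 5.0, "emotional_intensity": 3.0},
    "Panic Breathing": {"breath_rate": 0.8,  "pitch_depth_semitones": 36, "micro_flutter": 8.0, "emotional_intensity": 4.0},
    "Subtle Tremor":   {"breath_rate": 0.15, "pitch_depth_semitones": 6,  "micro_flutter": 3.0, "emotional_intensity": 0.8},
    "Deep Meditation": {"breath_rate": 0.1,  "pitch_depth_semitones": 4,  "micro_flutter": 0.5, "emotional_intensity": 0.5},
    "Intense Gasping": {"breath_rate": 0.5,  "pitch_depth_semitones": 30, "micro_flutter": 6.0, "emotional_intensity": 3.5},
}


def get_preset(name: str) -> dict:
    """Return a copy of the named preset so callers can tweak values safely."""
    if name not in PRESETS:
        raise KeyError(f"Unknown preset: {name!r}")
    return dict(PRESETS[name])


settings = get_preset("Emotional Swell")
settings["pitch_depth_semitones"] = 12  # start from a preset, then adjust
print(settings)
```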

Preset Applications

🎵 Vocal Enhancement Presets

Gentle Breath: Subtle naturalness for spoken word, podcast vocals

Emotional Swell: Singing enhancement, ballad vocals, emotional speech

Subtle Tremor: Intimate vocals, ASMR, whispered content

🎭 Dramatic Effect Presets

Dramatic Breath: Theater, voice acting, dramatic reading

Panic Breathing: Horror, thriller, intense scenes, emergency calls

Intense Gasping: Passionate singing, intense dialogue, climactic moments

🧘 Therapeutic Presets

Deep Meditation: Guided meditation, relaxation, mindfulness content

Gentle Breath: Breathing exercises, calm narration, sleep content

Parameters

Breathing Parameters

Parameter	Type	Default	Range	Description
preset	option	Manual	8 options	Emotional preset or manual configuration
breath_rate	positive	0.3	0.05-2.0	Breathing cycles per second (Hz)
pitch_depth_semitones	positive	18	1-48	Pitch variation range in semitones
micro_flutter	positive	4	0-10	High-frequency pitch variations
emotional_intensity	positive	2.5	0.1-5.0	Emotional build-up over time

Pitch Analysis Parameters

Parameter	Type	Default	Range	Description
time_step	positive	0.005	0.001-0.05	Pitch analysis time step (seconds)
minimum_pitch	positive	50	30-200	Minimum expected pitch (Hz)
maximum_pitch	positive	900	200-2000	Maximum expected pitch (Hz)

Output Parameters

Parameter	Type	Default	Range	Description
output_sample_rate	positive	44100	8000-192000	Output sample rate (Hz)
resample_precision	positive	50	10-100	Resampling quality
play_after_processing	boolean	yes	yes/no	Auto-play result
keep_intermediate_objects	boolean	no	yes/no	Keep temporary objects

Applications

Vocal Production

Use case: Adding emotional expression to singing vocals

Technique: Use Emotional Swell or Gentle Breath presets

Example: Apply to ballad vocals for natural breathing expression

Voice Acting & Character Voices

Use case: Creating specific emotional states for character voices

Technique: Match preset to character emotion

Example: Panic Breathing for frightened character, Dramatic Breath for heroic speech

Therapeutic Audio

Use case: Guided meditation and breathing exercises

Technique: Use Deep Meditation or Gentle Breath presets

Workflow:

Apply to meditation guide voice
Match breathing rate to exercise instructions
Create calming, centered vocal quality

Experimental Music

Use case: Creating unusual vocal textures and expressions

Technique: Extreme parameter settings or multiple processing passes

Examples:

Very high breath_rate with moderate depth for frantic effect
Very high emotional_intensity for dramatic arcs
Extreme micro_flutter for distorted, chaotic vocals

Film & Game Audio

Use case: Emotional voice processing for media

Advantages:

Consistent emotional quality across takes
Precise control over emotional intensity
Reproducible results for multiple characters
Time-efficient compared to manual automation

Example: Processing dialogue for emotional scenes in films or games

ASMR & Relaxation Content

Use case: Creating calming, intimate vocal qualities

Technique: Subtle Tremor or Gentle Breath with low parameters

Application: Whispered content, relaxation narration, sleep stories

Practical Workflow Examples

🎤 Singing Enhancement (Music Production)

Goal: Add natural breathing expression to vocal performance

Settings:

Preset: Emotional Swell
breath_rate: 0.25 Hz
pitch_depth_semitones: 12-18
emotional_intensity: 1.5-2.5

Result: Expressive vocals with natural breathing character

🎭 Character Voice (Voice Acting)

Goal: Create specific emotional state for character

Settings:

Preset: Panic Breathing
breath_rate: 0.6-0.8 Hz
pitch_depth_semitones: 24-36
micro_flutter: 6-8

Result: Anxious, frantic character voice

🧘 Guided Meditation (Therapeutic)

Goal: Create calming, centered vocal quality

Settings:

Preset: Deep Meditation
breath_rate: 0.1 Hz
pitch_depth_semitones: 4-6
emotional_intensity: 0.5

Result: Calming voice for meditation guidance

Advanced Techniques

Layered processing for complex characters:

Multiple passes: Apply different presets to same audio
Section-based: Different parameters for different song sections
Automation: Vary parameters over time for dynamic effects
Combination: Use with other effects (reverb, delay, compression)

Experiment with parameter combinations for unique vocal characters

Parameter interaction strategies:

High rate + low depth: Nervous, anxious quality
Low rate + high depth: Dramatic, theatrical expression
High flutter + low intensity: Natural, imperfect voice
Low everything: Subtle, almost imperceptible enhancement

Troubleshooting Common Issues

Problem: No pitch detected/effect doesn't work
Cause: Pitch outside min/max range or non-pitched content
Solution: Adjust minimum_pitch and maximum_pitch so they bracket the pitch range of the source, and use pitched vocal content

Problem: Artificial or robotic sounding result
Cause: Parameters too extreme or mismatched with content
Solution: Use more subtle settings, try different presets, ensure natural vocal source

Problem: Extreme pitch variations causing distortion
Cause: Very high pitch_depth_semitones or emotional_intensity
Solution: Reduce pitch depth, use moderate intensity values

Problem: Processing very slow
Cause: Very high density of pitch points or long audio
Solution: This is normal for long files, consider processing shorter segments

Technical Deep Dive

Pitch Manipulation Mathematics

Semitone to Frequency Conversion

Frequency ratio calculation:

Semitone to frequency ratio:
ratio = 2 ^ (semitones / 12)

Where:
semitones = pitch shift in semitones
ratio = frequency multiplication factor

Example: +12 semitones (octave)
ratio = 2^(12/12) = 2^1 = 2 (double the frequency)

Example: +7 semitones (perfect fifth)
ratio = 2^(7/12) ≈ 1.498 (approximately 3:2 ratio)

Application in script:
total_shift = [calculated semitone shift]
ratio = 2 ^ (total_shift / 12)
new_f0 = median_f0 * ratio
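
The conversion is easy to verify numerically. The following Python sketch reproduces the two examples above; the function names are hypothetical, and the 220 Hz median pitch is only an illustrative value.

```python
def semitones_to_ratio(semitones: float) -> float:
    """Frequency multiplication factor for a shift given in semitones."""
    return 2.0 ** (semitones / 12.0)


def shifted_f0(median_f0: float, total_shift: float) -> float:
    """New fundamental frequency after shifting the median pitch."""
    return median_f0 * semitones_to_ratio(total_shift)


print(semitones_to_ratio(12))           # octave
print(round(semitones_to_ratio(7), 3))  # perfect fifth
print(shifted_f0(220.0, -12))           # one octave below an assumed 220 Hz median
```

Negative shifts work the same way: a shift of -12 semitones halves the frequency.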

Phase-Based Component Frequencies

Multi-frequency modulation system:

Fundamental breathing: frequency = breath_rate (0.1-0.8 Hz)
Harmonic component: frequency = breath_rate * 2 (0.2-1.6 Hz)
Subharmonic component: frequency = breath_rate * 0.5 (0.05-0.4 Hz)
Micro-flutter components: flutter_freq = breath_rate * 12 (1.2-9.6 Hz), chaos_freq = breath_rate * 23.7 (2.37-18.96 Hz)
Tremor component: tremor_freq = breath_rate * 7.3 (0.73-5.84 Hz)
Gasp component: gasp_freq = breath_rate * 3 (0.3-2.4 Hz)

All components are phase-locked to the fundamental breathing rate.
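
Since every component is a fixed multiple of breath_rate, the whole set of modulation frequencies can be derived from a single value. A minimal Python sketch, using the multipliers listed above (the names are hypothetical):

```python
# Multipliers relative to breath_rate, as listed in this guide.
COMPONENT_MULTIPLIERS = {
    "fundamental": 1.0,
    "harmonic": 2.0,
    "subharmonic": 0.5,
    "flutter": 12.0,
    "chaos": 23.7,
    "tremor": 7.3,
    "gasp": 3.0,
}


def component_frequencies(breath_rate: float) -> dict:
    """Frequency in Hz of each modulation component for a given breath_rate."""
    return {name: breath_rate * mult for name, mult in COMPONENT_MULTIPLIERS.items()}


for name, freq in component_frequencies(0.3).items():
    print(f"{name}: {freq:.2f} Hz")
```

Doubling breath_rate therefore doubles every component frequency, which is one reason fast breathing settings also sound more fluttery.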

Dense Point Sampling

Smooth Curve Generation

Adaptive point density algorithm:

Point calculation:
npoints = round(duration / 0.01)
Constrained: 200 ≤ npoints ≤ 2000

Where:
duration = audio length in seconds
0.01 = target point spacing (10 ms)
200 = minimum points (applies to audio shorter than 2 seconds)
2000 = maximum points (applies to audio longer than 20 seconds)

Example: 5 second audio
npoints = round(5 / 0.01) = 500 points

Example: 30 second audio
npoints = round(30 / 0.01) = 3000 → clamped to 2000

Advantages:
Short audio: sufficient points for smoothness
Long audio: reasonable processing time
Consistent perceptual quality
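
The clamping rule can be expressed in a few lines of Python. This is an illustrative sketch with a hypothetical function name; note that Python's round() resolves exact .5 ties differently from some other environments, which only matters for durations that fall exactly between two point counts.

```python
def dense_point_count(duration: float, spacing: float = 0.01,
                      min_points: int = 200, max_points: int = 2000) -> int:
    """Number of pitch control points for a sound of the given duration (seconds)."""
    npoints = round(duration / spacing)
    return max(min_points, min(max_points, npoints))


for seconds in (1.0, 5.0, 30.0):
    n = dense_point_count(seconds)
    # Effective spacing grows beyond 10 ms once the 2000 point cap is reached.
    print(f"{seconds} s -> {n} points, about {1000.0 * seconds / n:.1f} ms apart")
```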