Pitch Contour Transfer — User Guide

Mean-preserving pitch transfer: Apply the average pitch of a source sound to a target sound while preserving the target's original pitch contour shape and timing.

Author: Shai Cohen
Affiliation: Department of Music, Bar-Ilan University, Israel
Version: 1.0 (2025)
License: MIT License
Repo: https://github.com/ShaiCohen-ops/Praat-plugin_AudioTools

Contents:

What this does
Quick start
Pitch Transfer Theory
Parameters & Settings
Complete Workflow
Applications
Practical Examples

What this does

This script implements mean-preserving pitch contour transfer — a technique that transfers the average pitch (mean F0) from a source sound to a target sound while maintaining the target's original pitch contour shape and timing. The target sound's pitch is shifted by the difference between source and target means, preserving all relative pitch movements, micro-intonation, and timing patterns.

Key Features:

Mean-Based Transfer: Calculates the difference between source and target average pitch and applies it to the target
Contour Preservation: Maintains the target's pitch contour shape and relative movements
Blend Control: Adjustable strength parameter for partial transfer
Pitch Floor/Ceiling: Separate frequency bounds for source and target
Voiced Frame Analysis: Reports mean F0 and the percentage of voiced frames
Overlap-Add Resynthesis: Pitch shifting with few audible artifacts at small to moderate shifts

What is pitch contour transfer?

Traditional pitch shifting: Multiply all frequencies by a constant ratio (e.g., +2 semitones = ×1.122).

Pitch contour transfer:

1. Analysis: Calculate mean F0 of source and target.
2. Shift calculation: Difference = source_mean - target_mean.
3. Application: Add the difference to each pitch point in the target's contour.
4. Result: Target sound with the source's average pitch but its own contour shape.

Applications:

- Voice transformation (transfer speaking pitch between speakers)
- Singing voice modification (adjust key while preserving phrasing)
- Instrumental sound design (apply vocal pitch characteristics to instruments)
- Prosody research (study pitch contour independent of absolute frequency)

Technical Implementation:

1. Pitch analysis: Extract pitch contours using Praat's Pitch object with configurable time step and frequency bounds.
2. Mean calculation: Compute the arithmetic mean of voiced frames for both sounds.
3. Shift computation: mean_shift = (mean_A - mean_B) × blend_strength.
4. Contour transfer: Create a new pitch tier with shifted points: F0_new = F0_original + mean_shift.
5. Boundary enforcement: Clip shifted values to the target's pitch floor/ceiling range.
6. Resynthesis: Use a Manipulation object with overlap-add synthesis, which keeps artifacts low at small to moderate shifts.
7. Validation: Re-analyze the result to verify pitch shift accuracy.

Key insight: An additive shift leaves the contour's shape in Hz untouched, whereas multiplicative scaling stretches or compresses it. This works best for small to moderate shifts.

Quick start

In Praat Objects window, select two Sound objects:
- First selected: Sound A (source style)
- Second selected: Sound B (target sound)
Open script: pitch_contour_transfer.praat
Adjust analysis parameters (or use defaults):
- Analysis_time_step: 0.01 seconds (10ms)
- Pitch_floor_A: 75 Hz (source lower bound)
- Pitch_ceiling_A: 300 Hz (source upper bound)
- Pitch_floor_B: 50 Hz (target lower bound)
- Pitch_ceiling_B: 300 Hz (target upper bound)
Set Blend_strength (1.0 = full transfer, 0.5 = half transfer, 0.0 = no change)
Click Run — script analyzes both sounds, calculates shift, applies to target
Result appears as "targetname_shifted" in Objects window
Script displays shift details in Info window and auto-plays result

Quick tips:

- For voice processing: Use pitch_floor = 75 Hz (male) or 100 Hz (female), pitch_ceiling = 300 Hz.
- For instruments: Adjust bounds to match the instrument's range (e.g., violin: 200-1000 Hz).
- Set blend_strength = 0.5 for subtle effects, 1.0 for complete transfer.
- Check the Info window for mean frequencies and shift amount.
- The result preserves the target's timing and contour shape: if the target has expressive vibrato, the shifted result has the same vibrato pattern at the new pitch level.
- For large shifts (>4 semitones), consider a smaller blend_strength or multiple passes.

Important:

- Exactly 2 Sounds required: Selection order matters (A, then B).
- Pitch analysis quality depends on parameters: An incorrect floor/ceiling can miss the pitch or capture harmonics.
- Only voiced frames are shifted: Unvoiced and silent regions are left unchanged.
- Blend_strength is a multiplier: 2.0 doubles the calculated shift.
- Clipping at boundaries: Extreme shifts may hit the floor/ceiling limits, flattening contour extremes.
- Timbral effects: Large pitch shifts can change perceived voice quality (formant shifting is not applied).
- Resynthesis artifacts: Very large shifts may sound robotic with overlap-add synthesis.
- Duration unchanged: Timing and duration are preserved exactly.

Pitch Transfer Theory

Mathematical Foundation

📐 The Transfer Equation

Given:

mean_A = mean F0 of source sound A (Hz)
mean_B = mean F0 of target sound B (Hz)
blend = blend_strength parameter (0.0 or greater; the parameter table lists 0.0-5.0)

Shift calculation:

mean_shift = (mean_A - mean_B) × blend

For each pitch point in target contour:

F0_original(t) = pitch value at time t (Hz)
F0_shifted(t) = F0_original(t) + mean_shift

Boundary enforcement:

IF F0_shifted(t) < floor_B → F0_final(t) = floor_B
IF F0_shifted(t) > ceiling_B → F0_final(t) = ceiling_B
ELSE F0_final(t) = F0_shifted(t)
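As a minimal sketch of the equations above (not the Praat script itself), the shift and the boundary enforcement for a single voiced pitch value can be written in Python. The function name `shift_point` and the sample values are illustrative assumptions.

```python
def shift_point(f0_hz, mean_a, mean_b, blend, floor_b, ceiling_b):
    """Apply the additive mean shift to one voiced F0 value, then clip.

    Hypothetical helper: mirrors the guide's formulas, not Praat's API.
    """
    mean_shift = (mean_a - mean_b) * blend
    shifted = f0_hz + mean_shift
    # Boundary enforcement: keep the result inside [floor_b, ceiling_b].
    return min(max(shifted, floor_b), ceiling_b)


# Assumed values: source mean 220 Hz, target mean 125 Hz, full blend.
print(shift_point(130.0, 220.0, 125.0, 1.0, 50.0, 300.0))  # 225.0
print(shift_point(250.0, 220.0, 125.0, 1.0, 50.0, 300.0))  # 300.0 (clipped at ceiling_B)
```

The second call shows how a high target point loses its original height once the ceiling is reached, which is the contour flattening mentioned for extreme shifts.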

Additive vs Multiplicative Shifting

🎯 Preserving Contour Shape

Multiplicative shifting (traditional):

F0_final(t) = F0_original(t) × ratio
Example: +2 semitones = ×1.122

Effect on contour: Expands/compresses pitch range proportionally.

Problem: Large shifts change the contour's extent in Hz (a vibrato that spans 20 Hz spans 30 Hz after ×1.5).

Additive shifting (this script):

F0_final(t) = F0_original(t) + constant
Example: +50 Hz added to all points

Effect on contour: Preserves the exact shape in Hz; every point moves by the same amount.

Advantage: Maintains relative pitch movements and micro-intonation.

When to use each:

Additive: Small-medium shifts, preserving expressive contours
Multiplicative: Large shifts, key changes in music
This script: Uses additive shifting for contour preservation
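The two approaches can be compared on a small numeric example. The sketch below uses assumed values (a vibrato swinging between 190 and 210 Hz) and measures the swing both in Hz and in semitones. It suggests that the additive shift keeps the swing constant in Hz while narrowing it in semitones, and the multiplicative shift does the opposite.

```python
import math


def semitone_span(low_hz, high_hz):
    """Width of a pitch excursion in semitones."""
    return 12.0 * math.log2(high_hz / low_hz)


# Assumed vibrato around a 200 Hz mean.
low, high = 190.0, 210.0

# Additive shift of +100 Hz (what the guide describes for this script).
add_low, add_high = low + 100.0, high + 100.0

# Multiplicative shift that moves the 200 Hz mean to the same 300 Hz.
mul_low, mul_high = low * 1.5, high * 1.5

print("original      :", high - low, "Hz,", round(semitone_span(low, high), 2), "st")
print("additive      :", add_high - add_low, "Hz,", round(semitone_span(add_low, add_high), 2), "st")
print("multiplicative:", mul_high - mul_low, "Hz,", round(semitone_span(mul_low, mul_high), 2), "st")
```

Which behaviour counts as "preserving the contour" depends on whether the contour is judged in Hz or on a logarithmic (musical) scale, which is one reason the guide recommends additive shifting mainly for small to moderate shifts.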

Pitch Analysis Methodology

PRAAT PITCH EXTRACTION ALGORITHM:

STEP 1: Frame-based analysis
Window length = 3 × (1 / pitch_floor)
Time step = analysis_time_step (default 0.01 s = 10 ms)
For each frame:
• Compute autocorrelation
• Find peaks in autocorrelation function
• Select peak corresponding to fundamental frequency
• Apply voicing threshold (0.45 default)
• Output F0 or "undefined" for unvoiced

STEP 2: Mean calculation
mean = Σ(F0_voiced) / N_voiced
Only voiced frames included
Unvoiced/silent frames ignored

STEP 3: Contour extraction
For each time t where F0 ≠ undefined:
Store (t, F0) as pitch point
Linear interpolation between points
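Step 2 and the window-length rule from Step 1 can be illustrated in a few lines of Python. This is a sketch under the assumption that unvoiced frames are represented as `None`; it does not call Praat, and the function names are hypothetical.

```python
def voiced_mean(f0_frames):
    """Arithmetic mean over voiced frames; None marks an unvoiced frame."""
    voiced = [f for f in f0_frames if f is not None]
    if not voiced:
        raise ValueError("no voiced frames to average")
    return sum(voiced) / len(voiced)


def analysis_window_s(pitch_floor_hz):
    """Window length in seconds: three periods of the pitch floor."""
    return 3.0 / pitch_floor_hz


# Assumed frame values: three unvoiced frames are skipped.
frames = [None, 120.0, 125.0, 130.0, None, None, 125.0]
print(voiced_mean(frames))       # 125.0
print(analysis_window_s(75.0))   # 0.04
```

A lower pitch floor therefore means a longer analysis window, which is one reason to avoid setting the floor lower than needed.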

Semitone Conversion

HERTZ TO SEMITONES CONVERSION:

Given reference frequency f_ref (usually 440 Hz for A4):
Semitones from reference = 12 × log₂(f / f_ref)

Relative semitone shift between two frequencies:
Δ_st = 12 × log₂(f2 / f1)

Script calculation:
semitones = 12 × log₂((mean_B + mean_shift) / mean_B)

Example:
mean_B = 200 Hz
mean_shift = +50 Hz
semitones = 12 × log₂(250 / 200) = 12 × log₂(1.25) = 12 × 0.3219 = 3.86 semitones
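The conversion formulas above translate directly into Python. The function names below are illustrative assumptions; the guard against non-positive frequencies is an addition for safety, since the logarithm is undefined there.

```python
import math


def hz_to_semitones(f_hz, f_ref_hz=440.0):
    """Semitones relative to a reference frequency (A4 = 440 Hz by default)."""
    return 12.0 * math.log2(f_hz / f_ref_hz)


def shift_in_semitones(mean_b, mean_shift):
    """Semitone size of an additive shift applied at the target's mean F0."""
    if mean_b <= 0 or mean_b + mean_shift <= 0:
        raise ValueError("frequencies must stay positive")
    return 12.0 * math.log2((mean_b + mean_shift) / mean_b)


print(round(shift_in_semitones(200.0, 50.0), 2))  # 3.86
print(round(hz_to_semitones(220.0), 2))           # -12.0 (one octave below A4)
```

Because the shift is additive in Hz, the same +50 Hz is a larger semitone step for a low target mean than for a high one.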

Blend Strength Interpretation

BLEND STRENGTH EFFECTS:

blend_strength = 0.0 → No change
blend_strength = 0.5 → Half the calculated shift applied
blend_strength = 1.0 → Full shift applied
blend_strength = 2.0 → Double shift applied

Mathematical interpretation:
actual_shift = (mean_A - mean_B) × blend_strength

Example:
mean_A = 300 Hz, mean_B = 200 Hz
Natural difference = 100 Hz
blend_strength = 0.5 → shift = 50 Hz
blend_strength = 1.0 → shift = 100 Hz
blend_strength = 1.5 → shift = 150 Hz

Use cases:
• 0.3-0.7: Subtle voice matching
• 1.0: Complete mean transfer
• 1.5-2.0: Exaggerated effects
• 0.0: Diagnostic (hear original)
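The blend arithmetic can also be run in reverse: given a desired result mean, solve for the blend_strength that would produce it. The sketch below is a hypothetical helper (the script itself does not offer this) and ignores clipping at the floor and ceiling.

```python
def actual_shift(mean_a, mean_b, blend_strength):
    """Shift in Hz applied to every voiced point of the target."""
    return (mean_a - mean_b) * blend_strength


def blend_for_target_mean(mean_a, mean_b, desired_mean):
    """Blend strength that would move mean_b to desired_mean (no clipping)."""
    if mean_a == mean_b:
        raise ValueError("source and target means are equal; no shift available")
    return (desired_mean - mean_b) / (mean_a - mean_b)


for blend in (0.0, 0.5, 1.0, 1.5):
    print(blend, actual_shift(300.0, 200.0, blend))

print(blend_for_target_mean(300.0, 200.0, 250.0))  # 0.5
```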

Parameters & Settings

Core Parameters

Parameter	Default	Range	Description
Analysis_time_step	0.01	0.001-0.05	Time between pitch analysis points (seconds). Smaller = more detail, larger = faster
Pitch_floor_A	75	50-500	Minimum expected F0 for source sound (Hz). Set to speaker/instrument lowest note
Pitch_ceiling_A	300	100-1000	Maximum expected F0 for source sound (Hz). Set to speaker/instrument highest note
Pitch_floor_B	50	30-400	Minimum expected F0 for target sound (Hz). Lower than floor_A for flexibility
Pitch_ceiling_B	300	100-1000	Maximum expected F0 for target sound (Hz). Can match or exceed ceiling_A
Blend_strength	1.0	0.0-5.0	Strength of pitch transfer. 1.0 = full difference, 0.5 = half, 2.0 = double

Parameter Guidelines by Sound Type

Human Voice Settings:

Voice Type	Pitch Floor	Pitch Ceiling	Notes
Bass male	80 Hz	200 Hz	Capture fundamental, exclude first harmonic
Tenor male	100 Hz	300 Hz	Standard male speech range
Alto female	150 Hz	350 Hz	Female speech, lower singing
Soprano female	200 Hz	500 Hz	Higher female voice, child voice
Child	250 Hz	600 Hz	High-pitched voices

Musical Instrument Settings:

Instrument	Pitch Floor	Pitch Ceiling	Notes
Bass guitar	40 Hz	250 Hz	Very low fundamentals
Cello	65 Hz	500 Hz	Wide range, expressive
Violin	200 Hz	1000 Hz	High fundamentals, harmonics
Flute	250 Hz	1200 Hz	Pure tones, clear pitch
Trumpet	150 Hz	800 Hz	Bright, strong fundamentals

Advanced Parameter Interactions

Floor/Ceiling Pitfalls:

Floor too high: Misses low pitches, mean calculation inaccurate
Ceiling too low: Caps high pitches, distorts mean
Floor too low: May capture subharmonics or noise
Ceiling too high: May capture harmonics instead of fundamental

Rule of thumb: Set floor to 0.75× lowest expected pitch, ceiling to 1.5× highest expected pitch.
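The rule of thumb is simple enough to compute by hand, but a small sketch makes the intent explicit. The function name and the sample range are assumptions for illustration.

```python
def suggest_bounds(lowest_hz, highest_hz):
    """Rule-of-thumb analysis bounds: 0.75 x lowest and 1.5 x highest pitch."""
    if not 0 < lowest_hz <= highest_hz:
        raise ValueError("expected 0 < lowest_hz <= highest_hz")
    return 0.75 * lowest_hz, 1.5 * highest_hz


# Assumed speaking range of 100-200 Hz.
print(suggest_bounds(100.0, 200.0))  # (75.0, 300.0)
```

For this assumed range, the rule lands on the script's default source bounds of 75 Hz and 300 Hz.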

Time Step Considerations:

0.01s (default): Good for most speech and music (100 points/second)
0.005s: Higher detail for fast pitch movements (vibrato, glissandi)
0.02s: Smoother contours, faster processing (50 points/second)
0.001s: Extreme detail for research (1000 points/second, slow)

Performance: Smaller time step = more pitch points = longer processing but more accurate contour.
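To get a feel for the cost of a smaller time step, the number of pitch points can be estimated from the duration. This is a rough sketch: it ignores the analysis window at the edges of the sound, and the `voiced_fraction` argument is an assumed input rather than something the script asks for.

```python
def points_per_second(time_step_s):
    """Analysis frames per second for a given time step."""
    if time_step_s <= 0:
        raise ValueError("time step must be positive")
    return 1.0 / time_step_s


def approx_points(duration_s, time_step_s, voiced_fraction=1.0):
    """Rough count of pitch points written to the PitchTier."""
    return round(duration_s * points_per_second(time_step_s) * voiced_fraction)


for step in (0.02, 0.01, 0.005, 0.001):
    print(step, round(points_per_second(step)), approx_points(3.0, step, 0.9))
```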

Complete Workflow

Step-by-Step Algorithm

🔧 Script Execution Flow

Phase 1: Input Validation

1. Check exactly 2 Sound objects selected
2. Verify selection order: A (source) then B (target)
3. Extract names for reporting
4. Display header in Info window

Phase 2: Source Analysis (Sound A)

1. Create Pitch object from Sound A
   - Time step = analysis_time_step
   - Floor = pitch_floor_A
   - Ceiling = pitch_ceiling_A
2. Calculate mean F0 of voiced frames
3. Display: "Sound A: X.X Hz"

Phase 3: Target Analysis (Sound B)

1. Create Pitch object from Sound B
   - Time step = analysis_time_step
   - Floor = pitch_floor_B
   - Ceiling = pitch_ceiling_B
2. Get frame count and voiced frame count
3. Calculate mean F0 and voiced percentage
4. Display: "Sound B: X.X Hz (Y.Y% voiced)"

Phase 4: Shift Calculation

1. mean_shift = (mean_A - mean_B) × blend_strength
2. Convert to semitones for user reference
3. Display: "Shift: X.X Hz (Y.YY semitones)"

Phase 5: Contour Transfer

1. Convert Sound B to Manipulation object
2. Extract empty PitchTier
3. For each voiced frame in Pitch B:
   - Get time t and F0 value
   - Calculate shifted_F0 = F0 + mean_shift
   - Enforce floor/ceiling boundaries
   - Add point to PitchTier
4. Count points added
5. Display: "Points added: X / Y"

Phase 6: Resynthesis

1. Replace PitchTier in Manipulation object
2. Resynthesize using overlap-add method
3. Rename result: "originalname_shifted"

Phase 7: Verification & Output

1. Re-analyze result pitch
2. Calculate new mean F0
3. Display: "Result: OLD → NEW Hz"
4. Auto-play result sound
5. Clean up temporary objects
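The numeric part of Phases 2 to 5 and 7 can be sketched in plain Python on lists of (time, F0) frames. This is an illustration under stated assumptions, not the Praat script: pitch tracking and overlap-add resynthesis are left out, unvoiced frames are represented as `None`, and the "new mean" is computed from the shifted points rather than from a re-analysis of audio.

```python
import math


def transfer(frames_a, frames_b, blend, floor_b, ceiling_b):
    """Return (pitch_points, report) for a mean-based pitch transfer.

    frames_a, frames_b: lists of (time_s, f0_hz or None). Hypothetical helper.
    """
    voiced_a = [f for _, f in frames_a if f is not None]
    voiced_b = [(t, f) for t, f in frames_b if f is not None]
    if not voiced_a or not voiced_b:
        raise ValueError("both sounds need at least one voiced frame")

    mean_a = sum(voiced_a) / len(voiced_a)
    mean_b = sum(f for _, f in voiced_b) / len(voiced_b)
    mean_shift = (mean_a - mean_b) * blend
    if mean_b + mean_shift <= 0:
        raise ValueError("shift would move the mean to a non-positive frequency")

    # Phase 5: shift every voiced point, then clip to the target's bounds.
    points = [(t, min(max(f + mean_shift, floor_b), ceiling_b)) for t, f in voiced_b]

    report = {
        "mean_a": mean_a,
        "mean_b": mean_b,
        "voiced_percent": 100.0 * len(voiced_b) / len(frames_b),
        "shift_hz": mean_shift,
        "semitones": 12.0 * math.log2((mean_b + mean_shift) / mean_b),
        "points_added": len(points),
        "total_frames": len(frames_b),
        "new_mean": sum(f for _, f in points) / len(points),
    }
    return points, report


# Assumed toy data: two source frames, four target frames (two unvoiced).
source = [(0.00, 210.0), (0.01, 230.0)]
target = [(0.00, None), (0.01, 120.0), (0.02, 130.0), (0.03, None)]
points, report = transfer(source, target, 1.0, 50.0, 300.0)
print(points)
print(report)
```

When no point is clipped, the new mean equals the source mean at blend_strength = 1.0; any clipping pulls the result mean away from that value.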

Information Window Output

TYPICAL INFO WINDOW OUTPUT:

PITCH MEAN SHIFT TRANSFER
Source: speaker_female
Target: speaker_male
Sound A: 220.5 Hz
Sound B: 125.3 Hz (92.4% voiced)
Shift: 95.2 Hz (9.78 semitones)
Points added: 842 / 911
Result: 125.3 → 220.5 Hz
Playing...
Done: 'speaker_male_shifted'

Object Creation Chain

🔄 Praat Object Pipeline

Input Objects:

Sound A (source, first selected)
Sound B (target, second selected)

Analysis Objects (temporary):

Pitch A (from Sound A)
Pitch B (from Sound B)
Manipulation (from Sound B)
PitchTier (empty, then filled)

Synthesis Objects:

Sound_result (resynthesized from Manipulation)
Pitch_result (from Sound_result, for verification)

Final Output:

"originalname_shifted" Sound object

Cleanup: All temporary objects removed automatically.

Troubleshooting Common Issues

Problem: "Please select exactly 2 Sound objects"
Cause: Wrong number of sounds selected
Solution: Select exactly 2 Sound objects in Praat Objects window (Ctrl+click)

Problem: Shift amount seems wrong (too small/large)
Cause: Incorrect pitch floor/ceiling capturing harmonics or missing F0
Solution: Adjust floor/ceiling parameters, check with View & Edit window

Problem: Result has robotic/artifact sound
Cause: Large shift causing phase issues in overlap-add synthesis
Solution: Reduce blend_strength, try multiple smaller shifts

Problem: Only part of sound shifted (low percentage voiced)
Cause: Target has unvoiced regions (consonants, silence)
Solution: This is normal — only voiced frames shifted, unvoiced unchanged

Problem: Shift hits ceiling/floor limits
Cause: Extreme shift pushing beyond pitch bounds
Solution: Increase ceiling_B or decrease floor_B, or reduce blend_strength
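Before running a large shift, it can help to estimate how many target points would be clipped. The sketch below is a hypothetical pre-check on a list of voiced F0 values, with made-up sample numbers; it is not part of the script.

```python
def count_clipped(f0_voiced, mean_shift, floor_b, ceiling_b):
    """Return (below_floor, above_ceiling) counts after an additive shift."""
    below = sum(1 for f in f0_voiced if f + mean_shift < floor_b)
    above = sum(1 for f in f0_voiced if f + mean_shift > ceiling_b)
    return below, above


# Assumed target contour and a +95 Hz shift with bounds of 50-300 Hz.
contour = [110.0, 125.0, 140.0, 210.0]
print(count_clipped(contour, 95.0, 50.0, 300.0))  # (0, 1)
```

A non-zero count suggests widening the target bounds or lowering blend_strength, as described in the solution above.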

Applications

Voice Transformation

Use case: Adjust speaker's average pitch while preserving speaking style

Technique: Use speech at the desired average pitch as the source and the voice to be modified as the target

Example: Make male voice speak at female average pitch while keeping male timbre and prosody

Singing Voice Modification

Use case: Adjust singing key while preserving vocal expression

Technique: Transfer pitch from reference performance to target recording

Workflow:

Source: Well-tuned reference vocal
Target: Expressive but pitch-inaccurate recording
Result: Expressive performance at correct average pitch
blend_strength: 0.7-0.9 (preserve some original pitch character)

Prosody Research

Use case: Study pitch contour patterns independent of absolute frequency

Technique: Normalize multiple speakers to common mean pitch

Research applications:

Compare intonation patterns across speakers
Isolate contour from pitch height effects
Create pitch-normalized stimuli for perception tests
Study gender differences in prosody separate from F0
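Normalizing several speakers to a common mean is the same additive shift applied once per speaker. The sketch below assumes each contour is a list of voiced F0 values in Hz and skips clipping and resynthesis; the speaker data are invented for illustration.

```python
def normalize_to_mean(contour_hz, common_mean_hz):
    """Shift a contour additively so that its mean equals common_mean_hz."""
    if not contour_hz:
        raise ValueError("contour is empty")
    offset = common_mean_hz - sum(contour_hz) / len(contour_hz)
    return [f + offset for f in contour_hz]


# Assumed contours for two speakers, normalized to a 170 Hz mean.
speakers = {
    "speaker_1": [110.0, 120.0, 130.0],
    "speaker_2": [200.0, 220.0, 240.0],
}
for name, contour in speakers.items():
    print(name, normalize_to_mean(contour, 170.0))
```

After normalization both contours share the same mean, while speaker_2 keeps its wider excursions in Hz. Comparisons on a semitone scale would need a conversion step on top of this.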

Instrumental Sound Design

Use case: Bring an instrument recording to a voice's average pitch

Technique: Use a vocal recording as source, sustained instrument as target

Example:

Source: Expressive speech with natural pitch variation
Target: Sustained violin note
Result: Violin note moved to the speech's average pitch. Because only the mean is transferred, the note keeps its own contour and does not take on the speech's inflections
Creative applications: Register-matched voice and instrument layers, hybrid textures

Audio Restoration

Use case: Correct a constant pitch offset in historical recordings (for example, from a wrong playback speed)

Technique: Use stable reference pitch, apply to the offset recording

Workflow:

Source: Modern stable recording at correct pitch
Target: Historical recording with a pitch offset
Result: Historical recording at the corrected average pitch
Note: The script applies one constant shift, so it corrects a consistent offset only; time-varying wow/flutter stays in the contour

Language Teaching

Use case: Demonstrate intonation patterns at comfortable pitch

Technique: Shift a native speaker's recording to the learner's average pitch, keeping the native contour

Example:

Source: Learner's voice (or synthetic voice at learner's range), which supplies the average pitch
Target: Native speaker with correct intonation, which supplies the contour
Result: Correct intonation pattern at comfortable pitch for imitation

Practical Examples

Example 1: Gender Voice Matching

👥 Male-to-Female Pitch Adjustment

Goal: Make male voice speak at typical female pitch range

Source (A): Female speech sample (mean ~220 Hz)

Target (B): Male speech sample (mean ~125 Hz)

Settings:

Pitch_floor_A: 150 Hz (female lower bound)
Pitch_ceiling_A: 350 Hz (female upper bound)
Pitch_floor_B: 80 Hz (male lower bound)
Pitch_ceiling_B: 300 Hz (male upper bound, extended)
Blend_strength: 1.0 (full transfer)

Expected shift: ~95 Hz (about +9.8 semitones, since 12 × log₂(220 / 125) ≈ 9.79)

Result: Male voice with female average pitch, male timbre and prosody preserved

Example 2: Singing Intonation Correction

🎵 Vocal Pitch Stabilization

Goal: Improve singing intonation while preserving expression

Source (A): Well-tuned reference vocal (mean 262 Hz ~ C4)

Target (B): Expressive but flat recording (mean 248 Hz ~ B3)

Settings:

Pitch_floor_A: 200 Hz (below expected range)
Pitch_ceiling_A: 400 Hz (above expected range)
Pitch_floor_B: 180 Hz (allow downward shift)
Pitch_ceiling_B: 420 Hz (allow upward shift)
Blend_strength: 0.8 (partial correction)

Expected shift: +11 Hz (~0.75 semitones up)

Result: More in-tune vocal, with 80% of the mean-pitch difference corrected and the original contour fully preserved

Example 3: Instrument-Voice Hybrid

🎻 Expressive Instrument Creation

Goal: Move a sustained instrument note to the average pitch of a speech sample

Source (A): Emotional speech (high pitch variation)

Target (B): Sustained cello note (constant 196 Hz ~ G3)

Settings:

Pitch_floor_A: 100 Hz (speech lower bound)
Pitch_ceiling_A: 350 Hz (speech excited peaks)
Pitch_floor_B: 65 Hz (cello lowest)
Pitch_ceiling_B: 500 Hz (cello range)
Blend_strength: 1.0
Analysis_time_step: 0.005 (capture fast speech changes)

Result: Cello note transposed to the speech's mean F0. The note stays steady, because the script transfers only the mean and keeps the target's own contour; speech-like inflections would require copying the source contour, which this script does not do

Example 4: Multi-Step Processing

Advanced: Progressive Pitch Adjustment

Situation: Need large pitch shift but want to avoid artifacts

Solution: Multiple passes with moderate blend_strength

Step 1: Initial shift
blend_strength = 0.5
Result: Halfway to target pitch

Step 2: Use result as new target
Source: Original source (unchanged)
Target: Result from step 1
blend_strength = 0.66 (2/3 of remaining)

Step 3: Final adjustment
Source: Original source
Target: Result from step 2
blend_strength = 1.0 (complete remaining)

Total effect: 0.5 + (0.5×0.66) + (0.17×1.0) = 1.0 (full shift)
Advantage: Smoother, fewer artifacts than single large shift
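The cumulative effect of the three passes can be checked numerically. The sketch below tracks only the target's mean after each pass, under the assumption that no clipping occurs and that each pass re-measures the mean exactly; the starting means are made up.

```python
def multi_pass_means(mean_a, mean_b, blends):
    """Mean F0 of the target after each pass, using the same source each time."""
    current = mean_b
    means = []
    for blend in blends:
        # Each pass closes a fraction `blend` of the remaining gap.
        current = current + (mean_a - current) * blend
        means.append(current)
    return means


# Assumed means: source 220 Hz, target 120 Hz, blends from the steps above.
means = multi_pass_means(220.0, 120.0, [0.5, 0.66, 1.0])
fractions = [round((m - 120.0) / (220.0 - 120.0), 2) for m in means]
print([round(m, 1) for m in means])  # [170.0, 203.0, 220.0]
print(fractions)                     # [0.5, 0.83, 1.0]
```

The fractions match the running total in the text: 0.5, then 0.5 + 0.33 = 0.83, then the full shift.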

Example 5: Contour Exaggeration

Creative: Amplifying Expressivity

Goal: Make subtle pitch variations more dramatic

Method 1: Exaggerate existing contour
Source: Target sound itself (mean_A = mean_B)
blend_strength = 0.0 (no mean change)
BUT: Manually edit PitchTier points to amplify variations

Method 2: Use extreme blend_strength
Source: Sound with exaggerated contour
Target: Natural speech
blend_strength = 1.5-2.0 (over-transfer)

Method 3: Combined approach
Step 1: Extract and scale own PitchTier
Step 2: Apply with blend_strength = 0.0 + manual scaling
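The "manual scaling" in Methods 1 and 3 is not something the script performs. One way to picture it is to scale each pitch point's deviation from the contour mean, as in the sketch below; the function name, the scaling factor, and the sample contour are all assumptions.

```python
def exaggerate(contour_hz, factor):
    """Scale each point's deviation from the contour mean by `factor`.

    factor > 1 widens pitch movements, factor < 1 flattens them; the mean
    itself stays where it is.
    """
    if not contour_hz:
        raise ValueError("contour is empty")
    mean = sum(contour_hz) / len(contour_hz)
    return [mean + (f - mean) * factor for f in contour_hz]


# Assumed contour with a +/-10 Hz movement, doubled to +/-20 Hz.
print(exaggerate([190.0, 200.0, 210.0], 2.0))  # [180.0, 200.0, 220.0]
```

In Praat the scaled values would then have to be written back into the PitchTier by hand or by a separate script before resynthesis.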