Peephole Montage — User Guide

Interactive audio collage creation: manually mark moments of interest in an audio file, then automatically extract and combine them into a jump-cut montage with artistic variations and cinematic effects.

Author: Shai Cohen
Version: 2.0 (Interactive)
License: MIT License
Category: Audio Editing & Creative Montage

What this does

This script implements interactive audio montage creation: a workflow that combines manual selection with automated processing to create jump-cut collages from any audio file. The process:

  1. Interactive Marking: Opens a PointProcess editor for you to manually mark moments of interest (peepholes) in the audio.
  2. Window Extraction: Automatically extracts symmetric or asymmetric windows around each marked point.
  3. Artistic Processing: Applies one of four creative styles to transform the raw extracts.
  4. Seamless Assembly: Concatenates processed windows with crossfading to create a continuous montage.
  5. Statistical Analysis: Reports compression ratio and timing information about the created montage.

Key Features:

What is a peephole montage?

Traditional audio editing: Manual cutting and pasting of segments.
Peephole montage: Selective extraction and creative reassembly of moments.

Advantages:
  1. Intuitive workflow: Mark what sounds interesting, let the script handle extraction.
  2. Creative discovery: Find unexpected connections between disparate moments.
  3. Time compression: Create highlight reels from long recordings.
  4. Artistic transformation: Apply creative processing to extracted moments.
  5. Repeatable: The same points create the same montage, but creative variations are possible.
Use cases:
  • Music production (creating loops from recordings)
  • Film editing (sound collage creation)
  • Oral history (highlight compilation)
  • Sound design (moment extraction)
  • Education (teaching audio editing concepts)

Technical Implementation: The script creates a hybrid workflow:

  1. PointProcess Creation: Generates an empty point process synchronized with the source audio timeline.
  2. Editor Interface: Opens Praat's built-in editor for manual point placement (Ctrl-P/Cmd-P).
  3. Window Extraction: For each marked point, extracts an audio segment using either symmetric (centered) or asymmetric (custom pre/post) windows.
  4. Crossfade Application: Applies the selected fade type (none, linear, cosine, Hamming) to prevent clicks.
  5. Style Processing: Applies the chosen artistic transformation to each segment.
  6. Concatenation: Joins all processed segments into the final montage.

Key insight: The human selects WHAT moments are interesting; the script handles HOW to extract and combine them creatively.

Quick start

  1. In Praat, select exactly one Sound object (any audio file).
  2. Run script… peephole_montage.praat.
  3. A form appears with parameters — accept defaults or adjust as needed.
  4. Click OK — a PointProcess editor opens with your audio.
  5. Navigate through audio using standard Praat editor controls.
  6. At interesting moments, press Ctrl-P (Windows/Linux) or Cmd-P (Mac) to add a point.
  7. Add as many points as desired (minimum 1).
  8. Click Continue button in editor — script extracts and processes marked moments.
  9. Watch console for processing progress and statistics.
  10. Final montage named "peephole_montage" appears in Objects window and plays automatically.
Quick tip:
  • Start with "Pure peephole" style and symmetric windows (0.5s) for straightforward results.
  • Use Ctrl-P/Cmd-P precisely at interesting moments; these become the center points of extracted windows.
  • The editor shows blue vertical lines at marked points; you can add, move, or delete points.
  • For music, mark beat drops, interesting riffs, or vocal phrases.
  • For speech, mark important words, emotional moments, or pauses.
  • Use asymmetric windows (pre=0.3s, post=0.2s) for speech to capture context before important words.
  • Cosine fade (0.01s) works best for most material.
  • The output montage plays automatically; listen for how the marked moments connect.
Important:
  • ONE SOUND ONLY: The script fails with 0 or 2+ sounds selected.
  • Point marking is manual: You must actively listen and mark points; the script doesn't auto-detect "interesting" moments.
  • Editor expertise needed: Basic Praat editor navigation skills are required.
  • Points are time positions: They are not regions/segments; windows are extracted around points.
  • Processing time depends on the number of points and style complexity: 20+ points with Unreliable narrator may take minutes.
  • Original audio unchanged: Only a new montage is created.
  • Memory usage increases with many points/long windows: processing may be slow with 50+ points.
  • The editor must be closed by clicking Continue: don't close the window manually.

Interactive Workflow

🔄 Three-Phase Montage Creation

Phase 1: Setup & Parameter Selection — Choose extraction and artistic parameters

Phase 2: Interactive Point Marking — Listen and mark moments in editor

Phase 3: Automated Processing & Assembly — Script extracts, processes, combines

Phase 1: Parameter Selection

| Parameter Group | Key Parameters | Default Values | Purpose |
|---|---|---|---|
| Window extraction | Window_length, Asymmetric_windows, Pre_length, Post_length | 0.5s, off, 0.3s, 0.2s | Control what gets extracted around each point |
| Anti-click processing | Fade_type, Fade_duration | Cosine, 0.01s | Prevent clicks at segment boundaries |
| Artistic variations | Montage_style | Pure peephole | Creative transformation of extracted segments |
| Style-specific options | Various (depends on style) | Style-dependent | Fine-tune artistic effects |
| Output | Output_name | peephole_montage | Name of final montage object |

Phase 2: Interactive Point Marking

EDITOR WORKFLOW:

  1. Editor opens showing sound waveform
  2. Navigation controls (standard Praat):
       • Play/Stop: Spacebar or buttons
       • Zoom: Click-drag in timeline
       • Scroll: Horizontal scrollbar
       • Selection: Click-drag in waveform
  3. Point marking:
       • Position cursor at interesting moment
       • Press Ctrl-P (Windows/Linux) or Cmd-P (Mac)
       • Blue vertical line appears at cursor position
       • Repeat for additional points
  4. Point management:
       • Move point: Drag blue line
       • Delete point: Select point (click near it), press Delete
       • Multiple points: Can be anywhere in timeline
       • Minimum: 1 point required
  5. Completion:
       • Click "Continue" button in editor
       • Script proceeds to extraction phase
       • Editor closes automatically

TIPS FOR EFFECTIVE MARKING:
  • Mark musical beats for rhythmic montages
  • Mark sentence beginnings for speech highlights
  • Mark emotional peaks for dramatic montages
  • Vary density: cluster points for intensity, space for pacing

Phase 3: Automated Processing

PROCESSING STEPS:

For each marked point i (1 to n_points):

  1. Calculate window boundaries:
       IF Asymmetric_windows = 0:
           start = t_i - Window_length/2
           end = t_i + Window_length/2
       ELSE:
           start = t_i - Pre_length
           end = t_i + Post_length
  2. Extract segment:
       segment = Extract part: start, end, "rectangular", 1, "no"
  3. Apply fade (if selected):
       Cosine: self × (1 - cos(π × x/fade_duration))/2 on edges
       Linear: linear ramp up/down
       Hamming: self × (0.54 - 0.46 × cos(π × x/fade_duration)) on edges
  4. Apply artistic style (if not Pure peephole):
       • Context ramp: Adaptive window sizing
       • Unreliable narrator: Mutations
       • Microscope: Time stretching
  5. Store processed segment

CONCATENATION:
  Select all processed segments
  Concatenate (Praat's Concatenate command)
  Rename to Output_name

STATISTICS CALCULATION:
  Compression ratio = Original duration / Montage duration
  Reported in console
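The statistics step above can be illustrated with a small Python sketch. This is not the script's own code: the function name `montage_stats` is hypothetical, and the sketch assumes segments are joined end to end with no overlap, so the montage duration is simply the sum of the segment durations.

```python
def montage_stats(original_duration, segment_durations):
    """Return (montage_duration, compression_ratio).

    Assumes plain concatenation: montage duration = sum of segment durations.
    Compression ratio = original duration / montage duration, as in the guide.
    """
    if not segment_durations:
        raise ValueError("at least one segment is required")
    montage_duration = sum(segment_durations)
    return montage_duration, original_duration / montage_duration


# Example: twelve symmetric 0.5 s windows taken from a 180 s recording.
duration, ratio = montage_stats(180.0, [0.5] * 12)
print(f"montage: {duration:.2f} s, compression ratio: {ratio:.1f}:1")
```

A ratio of 30:1 here means the montage is one thirtieth the length of the source. Styles that change segment length (Context ramp, Microscope) change the ratio accordingly.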

Window Extraction Strategies

| Strategy | Window Type | Typical Settings | Best For | Example Use |
|---|---|---|---|---|
| Symmetric | Centered on point | Length=0.5s | Musical moments, isolated sounds | Drum hits, vocal syllables |
| Asymmetric (pre-heavy) | More before point | Pre=0.4s, Post=0.1s | Speech, anticipation | Word beginnings, phrase starts |
| Asymmetric (post-heavy) | More after point | Pre=0.1s, Post=0.4s | Resonance, decay | Guitar notes, reverb tails |
| Custom asymmetric | Context-dependent | Pre=0.3s, Post=0.2s | Natural speech flow | Conversation highlights |

Fade Types Comparison

FADE FORMULAS (applied to first/last fade_duration seconds):

  1. NONE (no fade):
       No processing → potential clicks at boundaries
  2. LINEAR:
       First fade_duration: self × (x / fade_duration)
       Last fade_duration: self × (1 - (x - (dur-fade_duration))/fade_duration)
       Simple but may cause slight distortion
  3. COSINE (recommended):
       First: self × (1 - cos(π × x/fade_duration))/2
       Last: self × (1 - cos(π × (dur - x)/fade_duration))/2
       Smooth, natural sounding
  4. HAMMING:
       First: self × (0.54 - 0.46 × cos(π × x/fade_duration))
       Last: self × (0.54 - 0.46 × cos(π × (dur - x)/fade_duration))
       Very smooth, slight spectral modification

FADE DURATION GUIDELINES:
  • 0.005s: Very quick, good for sharp cuts
  • 0.01s: Default, works for most material
  • 0.02s: Smooth, good for musical material
  • 0.05s: Very smooth, audible as fade
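To make the fade formulas above concrete, here is a minimal Python sketch that evaluates the gain curves on a list of samples. It is an illustration only, not the script's Praat code. The names `fade_in_gain` and `apply_fades` are hypothetical, and the fade-out is written as the mirror image of the fade-in, which matches the "Last" formulas with (dur - x) in place of x.

```python
import math


def fade_in_gain(x, fade_duration, fade_type="cosine"):
    """Gain at x seconds from a segment edge, following the guide's formulas."""
    if fade_type == "none" or x >= fade_duration:
        return 1.0
    phase = x / fade_duration  # 0 at the edge, 1 where the fade ends
    if fade_type == "linear":
        return phase
    if fade_type == "cosine":
        return (1 - math.cos(math.pi * phase)) / 2
    if fade_type == "hamming":
        return 0.54 - 0.46 * math.cos(math.pi * phase)
    raise ValueError(f"unknown fade type: {fade_type}")


def apply_fades(samples, sample_rate, fade_duration, fade_type="cosine"):
    """Apply fade-in and mirrored fade-out to a list of sample values."""
    dur = len(samples) / sample_rate
    faded = []
    for i, value in enumerate(samples):
        x = i / sample_rate
        gain = (fade_in_gain(x, fade_duration, fade_type)
                * fade_in_gain(dur - x, fade_duration, fade_type))
        faded.append(value * gain)
    return faded


# Compare the gain of each fade type at the edge and halfway through a 0.01 s fade.
for kind in ("linear", "cosine", "hamming"):
    print(kind, round(fade_in_gain(0.0, 0.01, kind), 3),
          round(fade_in_gain(0.005, 0.01, kind), 3))
```

Note that the Hamming formula starts at a gain of 0.08 rather than 0, so a small step remains at the segment edge, whereas the linear and cosine fades start from silence.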

Montage Styles

🎨 4 Creative Processing Styles

Style 1: Pure peephole — Baseline, unprocessed jump-cuts

Style 2: Context ramp — Cinematic, adaptive window sizing

Style 3: Unreliable narrator — Subtle mutations and distortions

Style 4: Microscope — Time-stretched examination of moments

Style 1: Pure Peephole (Baseline)

🔍 Straightforward Jump-Cut Montage

Character: Clean, direct, unprocessed extracts

Processing: Only window extraction + crossfade

Use when: You want pure selection without artistic transformation

Best for: Documentary work, speech highlights, musical phrase extraction

Example: Creating a "greatest hits" from a lecture or concert

Style 2: Context Ramp (Cinematic)

🎬 Adaptive Window Sizing

Character: Cinematic, context-aware, dynamic

Processing: Analyzes local intensity and pause patterns to adjust window sizes

Adaptive logic:

  • Longer pre-window if following a pause (builds anticipation)
  • Shorter post-window if high intensity (cuts quickly after climax)
  • Standard windows for normal contexts

Best for: Film/TV sound design, dramatic storytelling, emotional highlights

Example: Creating trailer-like montage from a film dialogue

Style 3: Unreliable Narrator

🎭 Subtle Mutations and Distortions

Character: Memory-like, distorted, dreamy

Processing: Applies progressive mutations to create "faulty memory" effect

Mutations include:

  • Random stereo flip: Occasionally swaps left/right channels (stereo only)
  • Progressive pitch bias: Later segments more pitch-shifted (simulates memory decay)
  • Rule-based variations: Different processing for different segment indices

Parameters:

  • UN_random_stereo_flip: Enable/disable channel swapping
  • UN_pitch_bias_range: Maximum semitone shift for later segments

Best for: Experimental music, dream sequences, memory-themed pieces

Example: Creating a "fading memory" montage from childhood recordings

Style 4: Microscope

🔬 Time-Stretched Examination

Character: Detailed, expanded, analytical

Processing: Time-stretches each segment to examine moment in detail

Time stretching methods:

  • With pitch preservation: Uses PSOLA manipulation (slower, higher quality)
  • Without pitch preservation: Simple PSOLA time scaling (faster, pitch changes)

Parameters:

  • Microscope_time_factor: Stretch factor (2.0 = twice as long)
  • Microscope_preserve_pitch: Keep original pitch during stretching

Best for: Sound analysis, granular synthesis, experimental time manipulation

Example: Stretching vocal moments to hear subtle formant transitions

Style Selection Guide

| Project Goal | Recommended Style | Key Settings | Expected Result |
|---|---|---|---|
| Clean highlights compilation | Pure peephole | Window=0.5s, Cosine fade | Direct, unprocessed jump-cuts |
| Dramatic film trailer | Context ramp | Asymmetric windows, Adaptive | Cinematic, context-aware montage |
| Dream sequence/sound art | Unreliable narrator | Pitch bias=1.0, Stereo flip on | Distorted, memory-like collage |
| Analytical/slow listening | Microscope | Time factor=3.0, Preserve pitch | Expanded, detailed examination |
| Musical phrase sampling | Pure peephole or Context ramp | Symmetric windows, Short fades | Rhythmic, musical montage |

Advanced Style Combinations

Sequential Processing:
  1. Create montage with Style A
  2. Use montage as input for second pass with Style B
  3. Creates layered artistic effects
  4. Example: Pure peephole → Unreliable narrator (extract then distort)
Multi-Style Montage:
  • Process different point subsets with different styles
  • Combine results manually in Praat or DAW
  • Creates piece with varying texture and processing
  • Example: Speech points → Pure, musical points → Microscope

Technical Details

Window Extraction Algorithm

FOR EACH MARKED POINT t_i:

  If Asymmetric_windows = 0:
      half = Window_length / 2
      start = t_i - half
      end = t_i + half
  If Asymmetric_windows = 1:
      start = t_i - Pre_length
      end = t_i + Post_length

BOUNDARY CLAMPING:
  if start < sound_start: start = sound_start
  if end > sound_end: end = sound_end

EXTRACTION:
  segment = Extract part: start, end, "rectangular", 1, "no"

SPECIAL CASES:
  • If window extends beyond sound boundaries: clipped to available audio
  • Very short windows (< 0.01s): May cause processing issues
  • Overlapping windows (if points close): Segments will overlap in montage
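The boundary logic above can be sketched in Python as follows. The function `window_bounds` and its argument names are hypothetical; the defaults mirror the guide's default parameter values, and the clamping follows the BOUNDARY CLAMPING rules as written.

```python
def window_bounds(t, sound_start, sound_end, window_length=0.5,
                  asymmetric=False, pre_length=0.3, post_length=0.2):
    """Return (start, end) of the extraction window around point t, in seconds."""
    if asymmetric:
        start, end = t - pre_length, t + post_length
    else:
        half = window_length / 2
        start, end = t - half, t + half
    # Clamp to the available audio so the window never leaves the sound.
    return max(start, sound_start), min(end, sound_end)


# Points near the edges of a 10 s sound get shortened windows.
for point in (0.1, 5.0, 9.9):
    start, end = window_bounds(point, 0.0, 10.0)
    print(f"t={point:.1f}: {start:.2f} to {end:.2f} ({end - start:.2f} s)")
```

Because clamped windows are shorter than requested, a fade duration close to the window length can end up covering most of an edge segment, which is one reason very short windows are flagged as a special case.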

Context Ramp Analysis

CONTEXT ANALYSIS FOR EACH POINT:

  1. Extract local window (2×Window_length before, 0.5× after)
  2. Calculate intensity:
       intensity = To Intensity: 100, 0, "yes"
       mean_dB = Get mean: 0, 0, "dB"
       normalized = (mean_dB - 40) / 40   # Assuming 40-80 dB typical range
       clamped to 0-1
  3. Detect pause before:
       pre_window = Extract part: t - 2×Window_length to t - 0.25×Window_length
       pre_intensity = To Intensity: 100, 0, "yes"
       pre_mean = Get mean: 0, 0, "dB"
       pause_detected = (pre_mean < mean_dB - 10 dB) ? 1 : 0

ADAPTIVE WINDOW SIZING:
  adaptive_pre = Pre_length × (1 + pause_detected × 0.5)
  adaptive_post = Post_length × (1.2 - normalized × 0.4)

PHYSICAL INTERPRETATION:
  • After pause: Take more context (build anticipation)
  • High intensity: Cut quickly after climax
  • Normal context: Standard window sizes
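As a worked illustration of the adaptive sizing formulas above, the following Python sketch takes the two mean intensities as plain numbers (in the script they come from Praat Intensity objects) and returns the adjusted window lengths. The function name `adaptive_window` is hypothetical.

```python
def adaptive_window(mean_db, pre_mean_db, pre_length=0.3, post_length=0.2):
    """Return (adaptive_pre, adaptive_post) in seconds from local intensities in dB."""
    # Map the assumed 40-80 dB range onto 0-1 and clamp.
    normalized = min(1.0, max(0.0, (mean_db - 40.0) / 40.0))
    # A pause is flagged when the preceding context is more than 10 dB quieter.
    pause_detected = 1 if pre_mean_db < mean_db - 10.0 else 0
    adaptive_pre = pre_length * (1 + pause_detected * 0.5)
    adaptive_post = post_length * (1.2 - normalized * 0.4)
    return adaptive_pre, adaptive_post


# A loud moment (80 dB) after a quiet stretch (55 dB) versus a moderate, steady one.
for mean_db, pre_db in ((80.0, 55.0), (60.0, 58.0)):
    pre, post = adaptive_window(mean_db, pre_db)
    print(f"mean={mean_db} dB, before={pre_db} dB: pre={pre:.2f} s, post={post:.2f} s")
```

With the default lengths, the pre-window is either 0.30 s or 0.45 s, and the post-window ranges from 0.24 s at 40 dB or below down to 0.16 s at 80 dB or above.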

Unreliable Narrator Mutations

MUTATION ALGORITHM:

  1. STEREO FLIP (if enabled and segment is stereo):
       if randomInteger(1, 2) = 1:
           Extract channels: ch1, ch2
           Combine to stereo: ch2 + ch1 (swapped)
  2. PITCH BIAS (progressive):
       bias_factor = (segment_index / total_segments) × UN_pitch_bias_range
       bias_semitones = randomUniform(-bias_factor, bias_factor)
       For MONO segments:
           To Manipulation → Extract pitch tier
           Formula: "self × 2^(bias_semitones/12)"
           Replace pitch tier → Resynthesize
       For STEREO segments:
           Process each channel separately with same bias
           Recombine to stereo

EFFECT PROGRESSION:
  • Early segments: Minimal pitch shift (± small range)
  • Middle segments: Moderate shift (± medium range)
  • Late segments: Maximum shift (± full range)
  Creates sense of memory decay/distortion over time
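The progressive pitch bias above can be sketched numerically in Python. This only computes the shift amounts and does not perform any resynthesis. The function names are hypothetical, and Python's `random.Random.uniform` stands in for Praat's `randomUniform`.

```python
import random


def pitch_bias_semitones(segment_index, total_segments, bias_range, rng=random):
    """Draw a random shift whose maximum size grows with the segment index."""
    bias_factor = (segment_index / total_segments) * bias_range
    return rng.uniform(-bias_factor, bias_factor)


def pitch_ratio(semitones):
    """Frequency multiplier for a shift in semitones: 2^(semitones/12)."""
    return 2 ** (semitones / 12)


# Eight segments with UN_pitch_bias_range = 1.0 (seeded for repeatability).
rng = random.Random(42)
total = 8
for index in range(1, total + 1):
    shift = pitch_bias_semitones(index, total, 1.0, rng)
    limit = index / total
    print(f"segment {index}: limit ±{limit:.3f} st, "
          f"shift {shift:+.3f} st, ratio {pitch_ratio(shift):.4f}")
```

The limit column shows the progression described above: the first segment can move by at most one eighth of the range, while the last can use the full range. Individual draws are random, so a late segment may still land near zero.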

Microscope Time Stretching

TIME STRETCHING METHODS:

IF preserve_pitch = 1:
  # Using Manipulation (PSOLA) for pitch-preserving stretch
  For MONO:
      manip = To Manipulation: 0.01, 75, 600
      dur_tier = Extract duration tier
      Add point at middle: duration_factor
      Replace duration tier
      result = Get resynthesis (overlap-add)
  For STEREO:
      Process each channel separately
      Combine results
  QUALITY: High, preserves formants and natural character
  SPEED: Slower (manipulation objects created/destroyed)

IF preserve_pitch = 0:
  # Simple PSOLA time scaling
  result = To Sound (PSOLA): 75, 600, 1/factor, 1.0
  QUALITY: Lower, pitch changes with time scaling
  SPEED: Faster (direct PSOLA)

TIME FACTOR INTERPRETATION:
  • 1.0: No change
  • 2.0: Twice as long (half speed)
  • 0.5: Half as long (double speed)
  • 3.0+: Extreme slowing, reveals micro-details
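The time factor interpretation above reduces to simple arithmetic, sketched here in Python. The function names are hypothetical. The pitch calculation assumes that the non-preserving mode behaves like a plain speed change, where frequency scales by 1/factor; the script's actual PSOLA call may behave differently.

```python
import math


def stretched_duration(segment_duration, time_factor):
    """Duration of a segment after stretching by time_factor."""
    if time_factor <= 0:
        raise ValueError("time_factor must be positive")
    return segment_duration * time_factor


def speed_change_semitones(time_factor):
    """Pitch shift of a plain speed change (assumption: no pitch preservation)."""
    return 12 * math.log2(1 / time_factor)


for factor in (0.5, 1.0, 2.0, 3.0):
    print(f"factor {factor}: 0.5 s -> {stretched_duration(0.5, factor):.2f} s, "
          f"shift if pitch is not preserved: {speed_change_semitones(factor):+.1f} st")
```

Under that assumption, a factor of 2.0 drops the pitch by an octave. Stretching also lengthens the montage, so the compression ratio falls by the same factor.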

Memory and Performance Considerations

| Factor | Low Impact | High Impact | Optimization |
|---|---|---|---|
| Number of points | 5-15 | 50+ | Process in batches, combine later |
| Window length | 0.1-0.5s | 2.0s+ | Use shorter windows for many points |
| Style complexity | Pure peephole | Unreliable narrator (stereo) | Use simpler styles for many points |
| Original duration | 1-5 minutes | 30+ minutes | Extract relevant portion first |
| Sample rate | 44100 Hz | 96000 Hz | Resample to 44100 Hz if possible |
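A rough back-of-the-envelope estimate shows how the factors in the table multiply. The Python sketch below is an approximation under stated assumptions: the function name is hypothetical, samples are assumed to be stored as 64-bit floats (8 bytes each), and only the extracted segments are counted, not the source sound or the temporary objects a style creates.

```python
def segment_memory_bytes(n_points, window_length, sample_rate,
                         channels=1, bytes_per_sample=8):
    """Approximate memory held by the extracted segments alone.

    bytes_per_sample=8 is an assumption (64-bit floats); adjust if needed.
    """
    samples_per_channel = n_points * window_length * sample_rate
    return int(round(samples_per_channel * channels * bytes_per_sample))


# Low-impact versus high-impact settings from the table above.
low = segment_memory_bytes(15, 0.5, 44100, channels=1)
high = segment_memory_bytes(50, 2.0, 96000, channels=2)
print(f"low:  {low / 1e6:.1f} MB")
print(f"high: {high / 1e6:.1f} MB")
```

Styles that build intermediate objects per segment will need more than this lower bound, which is why batching and shorter windows help most when many points are combined with complex styles.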

Creative Applications

Music Production and Sampling

🎵 Beat and Phrase Extraction

Goal: Create sample libraries or rhythmic montages from recordings

Workflow:

  1. Load drum break or instrumental recording
  2. Mark each hit or interesting moment
  3. Use Pure peephole style with symmetric windows
  4. Export montage as sample source
  5. Import to DAW for further arrangement

Example: Extract all kick drums from a breakbeat for drum programming

Film and Video Sound Design

🎬 Trailer and Montage Creation

Goal: Create dramatic sound collages for video sequences

Workflow:

  1. Load dialogue or sound effects from film
  2. Mark emotional peaks and key phrases
  3. Use Context ramp style for cinematic feel
  4. Adjust asymmetric windows to capture context
  5. Sync montage with video edit

Example: Create 30-second trailer audio from 2-hour film dialogue

Oral History and Documentary

📝 Interview Highlight Compilation

Goal: Extract key moments from long interviews

Workflow:

  1. Load interview recording (30-60 minutes)
  2. Listen and mark important statements
  3. Use asymmetric windows (pre-heavy) for natural flow
  4. Create montage of highlights
  5. Export for editing or presentation

Example: Create 5-minute "best of" from 1-hour oral history interview

Sound Art and Experimental Music

🎨 Conceptual Audio Pieces

Goal: Create abstract compositions from found sounds

Workflow:

  1. Load field recording or environmental sound
  2. Mark interesting textures and moments
  3. Use Unreliable narrator for distorted memory effect
  4. Or use Microscope for detailed examination
  5. Layer multiple montages for complexity

Example: Create dream-like soundscape from city recordings

Educational Applications

Teaching Audio Concepts:
  • Editing principles: Demonstrate jump-cuts, montage, compression
  • Time perception: Show how context affects moment perception
  • Creative process: Illustrate selection → extraction → transformation workflow
  • Audio analysis: Use Microscope style to examine sound details
  • Memory and perception: Use Unreliable narrator to explore auditory memory

Advanced Creative Techniques

Multi-Layer Montages:
  1. Create multiple montages from same source with different point sets/styles
  2. Layer them in DAW with different panning/effects
  3. Creates rich, complex collages
  4. Example: Layer 1 (vocal highlights), Layer 2 (musical moments), Layer 3 (textures)
Temporal Remapping:
  • Create montage, then use as source for another montage pass
  • Creates "montage of montages" with higher-level structure
  • Example: First pass extracts phrases, second pass extracts moments from those phrases

Troubleshooting Common Issues

Problem: No points marked when clicking Continue
Causes: Forgot to press Ctrl-P/Cmd-P, editor navigation issues
Solutions: Ensure blue lines appear when marking, use keyboard shortcut not mouse click
Problem: Clicks/pops in montage
Causes: Fade too short or disabled, abrupt waveform discontinuities
Solutions: Use Cosine fade (0.01-0.02s), ensure segments start/end near zero crossings
Problem: Processing very slow with many points
Causes: Many points + complex style + long windows
Solutions: Reduce point count, use simpler style, shorten windows, process in batches
Problem: Montage sounds disjointed/awkward
Causes: Poor point selection, wrong window sizes, no contextual flow
Solutions: Choose points more carefully, use Context ramp style, adjust asymmetric windows
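For the clicks/pops problem above, the advice to start and end segments near zero crossings can be illustrated with a short Python sketch. It works on a plain list of sample values rather than a Praat Sound, and the function name `nearest_zero_crossing` is hypothetical.

```python
def nearest_zero_crossing(samples, index):
    """Return the index closest to `index` where the signal is zero or changes sign.

    Falls back to `index` itself if the signal never crosses zero.
    """
    n = len(samples)

    def is_crossing(i):
        if samples[i] == 0:
            return True
        # A sign change between sample i and sample i + 1.
        return i + 1 < n and (samples[i] > 0) != (samples[i + 1] > 0)

    # Search outward from the requested index, checking the left side first.
    for offset in range(n):
        for i in (index - offset, index + offset):
            if 0 <= i < n and is_crossing(i):
                return i
    return index


wave = [0.5, 0.3, 0.1, -0.2, -0.4, -0.1, 0.2]
print(nearest_zero_crossing(wave, 0))  # crossing between indices 2 and 3
print(nearest_zero_crossing(wave, 4))  # crossing between indices 5 and 6
```

Snapping a window boundary to the returned index keeps the cut close to zero amplitude, which reduces the discontinuity a fade then has to smooth over.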

Best Practices for Different Audio Types

| Audio Type | Window Strategy | Recommended Style | Fade Settings | Point Selection |
|---|---|---|---|---|
| Speech/Dialogue | Asymmetric (pre-heavy) | Context ramp | Cosine 0.01s | Sentence starts, emotional peaks |
| Music (rhythmic) | Symmetric | Pure peephole | Cosine 0.005s | Beat positions, phrase starts |
| Music (ambient) | Symmetric or long | Unreliable narrator | Cosine 0.02s | Texture changes, harmonic moments |
| Field recordings | Variable | Microscope | Cosine 0.01s | Interesting events, texture details |
| Interview/lecture | Asymmetric (context) | Pure peephole | Cosine 0.01s | Key points, summary statements |