Peephole Montage — User Guide

Interactive audio collage creation: manually mark moments of interest in an audio file, then automatically extract and combine them into a jump-cut montage with artistic variations and cinematic effects.

Author: Shai Cohen
Version: 2.0 (Interactive)
License: MIT License
Category: Audio Editing & Creative Montage

Contents:

What this does
Quick start
Interactive Workflow
Montage Styles
Technical Details
Creative Applications

What this does

This script implements interactive audio montage creation: a workflow that combines manual selection with automated processing to create jump-cut collages from any audio file. The process:

1. Interactive Marking: Opens a PointProcess editor in which you manually mark moments of interest (peepholes) in the audio.
2. Window Extraction: Automatically extracts a symmetric or asymmetric window around each marked point.
3. Artistic Processing: Applies one of four creative styles to transform the raw extracts.
4. Seamless Assembly: Concatenates the faded, processed windows into a continuous montage.
5. Statistical Analysis: Reports the compression ratio and timing information for the created montage.

Key Features:

Interactive Point Marking — Click-to-mark interface for precise moment selection
4 Artistic Styles — Pure peephole, Context ramp, Unreliable narrator, Microscope
Flexible Window Control — Symmetric or asymmetric window extraction
Professional Crossfading — 4 fade types to prevent clicks and pops
Intelligent Context Analysis — Adaptive window sizing based on audio content
Creative Mutations — Pitch shifting, stereo flipping, time stretching
Real-time Feedback — Console output with timing and compression statistics
Batch Processing — Process multiple marked points in one operation

What is a peephole montage?

Traditional audio editing: Manual cutting and pasting of segments.
Peephole montage: Selective extraction and creative reassembly of moments.

Advantages:

Intuitive workflow: Mark what sounds interesting and let the script handle extraction.
Creative discovery: Find unexpected connections between disparate moments.
Time compression: Create highlight reels from long recordings.
Artistic transformation: Apply creative processing to extracted moments.
Repeatable: The same points create the same montage, with creative variations possible.

Use cases: Music production (creating loops from recordings), film editing (sound collage creation), oral history (highlight compilation), sound design (moment extraction), education (teaching audio editing concepts).

Technical Implementation: The script creates a hybrid workflow:

1. PointProcess Creation: Generates an empty point process synchronized with the source audio timeline.
2. Editor Interface: Opens Praat's built-in editor for manual point placement (Ctrl-P/Cmd-P).
3. Window Extraction: For each marked point, extracts an audio segment using either a symmetric (centered) or an asymmetric (custom pre/post) window.
4. Crossfade Application: Applies the selected fade type (none, linear, cosine, Hamming) to prevent clicks.
5. Style Processing: Applies the chosen artistic transformation to each segment.
6. Concatenation: Joins all processed segments into the final montage.

Key insight: The human selects WHAT moments are interesting; the script handles HOW to extract and combine them creatively.

Quick start

In Praat, select exactly one Sound object (any audio file).
Run script… → peephole_montage.praat.
A form appears with parameters — accept defaults or adjust as needed.
Click OK — a PointProcess editor opens with your audio.
Navigate through audio using standard Praat editor controls.
At interesting moments, press Ctrl-P (Windows/Linux) or Cmd-P (Mac) to add a point.
Add as many points as desired (minimum 1).
Click Continue button in editor — script extracts and processes marked moments.
Watch console for processing progress and statistics.
Final montage named "peephole_montage" appears in Objects window and plays automatically.

Quick tips:

Start with "Pure peephole" style and symmetric windows (0.5s) for straightforward results.
Press Ctrl-P/Cmd-P precisely at interesting moments: with symmetric windows, these become the center points of the extracted windows.
The editor shows blue vertical lines at marked points; you can add, move, or delete points.
For music, mark beat drops, interesting riffs, or vocal phrases.
For speech, mark important words, emotional moments, or pauses.
Use asymmetric windows (pre=0.3s, post=0.2s) for speech to capture context before important words.
Cosine fade (0.01s) works best for most material.
The output montage plays automatically; listen for how the marked moments connect.

Important:

ONE SOUND ONLY: The script fails with 0 or 2+ sounds selected.
Point marking is manual: You must actively listen and mark points; the script doesn't auto-detect "interesting" moments.
Editor expertise needed: Basic Praat editor navigation skills are required.
Points are time positions: They are not regions/segments; windows are extracted around points.
Processing time depends on the number of points and style complexity: 20+ points with Unreliable narrator may take minutes.
Original audio unchanged: Only a new montage is created.
Memory usage increases with many points/long windows: Processing may be slow with 50+ points.
Editor must be closed by clicking Continue: Don't close the window manually.

Interactive Workflow

🔄 Three-Phase Montage Creation

Phase 1: Setup & Parameter Selection — Choose extraction and artistic parameters

Phase 2: Interactive Point Marking — Listen and mark moments in editor

Phase 3: Automated Processing & Assembly — Script extracts, processes, combines

Phase 1: Parameter Selection

Parameter Group	Key Parameters	Default Values	Purpose
Window extraction	Window_length, Asymmetric_windows, Pre_length, Post_length	0.5s, off, 0.3s, 0.2s	Control what gets extracted around each point
Anti-click processing	Fade_type, Fade_duration	Cosine, 0.01s	Prevent clicks at segment boundaries
Artistic variations	Montage_style	Pure peephole	Creative transformation of extracted segments
Style-specific options	Various (depends on style)	Style-dependent	Fine-tune artistic effects
Output	Output_name	peephole_montage	Name of final montage object

Phase 2: Interactive Point Marking

EDITOR WORKFLOW:

1. Editor opens showing sound waveform
2. Navigation controls (standard Praat):
   • Play/Stop: Spacebar or buttons
   • Zoom: Click-drag in timeline
   • Scroll: Horizontal scrollbar
   • Selection: Click-drag in waveform
3. Point marking:
   • Position cursor at interesting moment
   • Press Ctrl-P (Windows/Linux) or Cmd-P (Mac)
   • Blue vertical line appears at cursor position
   • Repeat for additional points
4. Point management:
   • Move point: Drag blue line
   • Delete point: Select point (click near it), press Delete
   • Multiple points: Can be anywhere in timeline
   • Minimum: 1 point required
5. Completion:
   • Click "Continue" button in editor
   • Script proceeds to extraction phase
   • Editor closes automatically

TIPS FOR EFFECTIVE MARKING:
   • Mark musical beats for rhythmic montages
   • Mark sentence beginnings for speech highlights
   • Mark emotional peaks for dramatic montages
   • Vary density: cluster points for intensity, space for pacing

Phase 3: Automated Processing

PROCESSING STEPS:

For each marked point i (1 to n_points):

1. Calculate window boundaries:
   IF Asymmetric_windows = 0:
      start = t_i - Window_length/2
      end = t_i + Window_length/2
   ELSE:
      start = t_i - Pre_length
      end = t_i + Post_length
2. Extract segment:
   segment = Extract part: start, end, "rectangular", 1, "no"
3. Apply fade (if selected):
   Cosine: self × (1 - cos(π × x/fade_duration))/2 on edges
   Linear: linear ramp up/down
   Hamming: self × (0.54 - 0.46 × cos(π × x/fade_duration)) on edges
4. Apply artistic style (if not Pure peephole):
   • Context ramp: Adaptive window sizing
   • Unreliable narrator: Mutations
   • Microscope: Time stretching
5. Store processed segment

CONCATENATION:
   Select all processed segments
   Concatenate (Praat's Concatenate command)
   Rename to Output_name

STATISTICS CALCULATION:
   Compression ratio = Original duration / Montage duration
   Reported in console
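
The statistics step above can be sketched in a few lines of Python. This is an illustration only, not the script's own code: the function name montage_stats and the list-of-windows input are assumptions made for the example, and the montage duration is taken as the plain sum of the window lengths (so any Microscope stretching is ignored).

```python
def montage_stats(original_duration, windows):
    """Return (montage_duration, compression_ratio).

    `windows` is a list of (start, end) times in seconds, one per marked point.
    Hypothetical helper: the montage duration is assumed to be the sum of the
    window durations, as it would be after plain concatenation.
    """
    if not windows:
        raise ValueError("at least one window is required")
    montage_duration = sum(end - start for start, end in windows)
    if montage_duration <= 0:
        raise ValueError("windows must have a positive total duration")
    return montage_duration, original_duration / montage_duration


# Three symmetric 0.5 s windows taken from a 60 s recording.
windows = [(9.75, 10.25), (29.75, 30.25), (44.75, 45.25)]
duration, ratio = montage_stats(60.0, windows)
print(f"Montage: {duration:.2f} s, compression ratio {ratio:.1f}:1")
```

A ratio of 40:1 here means that every second of montage stands in for 40 seconds of the original recording.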

Window Extraction Strategies

Strategy	Window Type	Typical Settings	Best For	Example Use
Symmetric	Centered on point	Length=0.5s	Musical moments, isolated sounds	Drum hits, vocal syllables
Asymmetric (pre-heavy)	More before point	Pre=0.4s, Post=0.1s	Speech, anticipation	Word beginnings, phrase starts
Asymmetric (post-heavy)	More after point	Pre=0.1s, Post=0.4s	Resonance, decay	Guitar notes, reverb tails
Custom asymmetric	Context-dependent	Pre=0.3s, Post=0.2s	Natural speech flow	Conversation highlights

Fade Types Comparison

FADE FORMULAS (applied to first/last fade_duration seconds):

1. NONE (no fade):
   No processing → potential clicks at boundaries
2. LINEAR:
   First fade_duration: self × (x / fade_duration)
   Last fade_duration: self × (1 - (x - (dur - fade_duration))/fade_duration)
   Simple but may cause slight distortion
3. COSINE (recommended):
   First: self × (1 - cos(π × x/fade_duration))/2
   Last: self × (1 - cos(π × (dur - x)/fade_duration))/2
   Smooth, natural sounding
4. HAMMING:
   First: self × (0.54 - 0.46 × cos(π × x/fade_duration))
   Last: self × (0.54 - 0.46 × cos(π × (dur - x)/fade_duration))
   Very smooth, slight spectral modification

FADE DURATION GUIDELINES:
   • 0.005s: Very quick, good for sharp cuts
   • 0.01s: Default, works for most material
   • 0.02s: Smooth, good for musical material
   • 0.05s: Very smooth, audible as fade
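
To make the fade formulas concrete, here is a minimal Python sketch of the gain curves applied to a list of samples. The names fade_gain and apply_fades are hypothetical, and the fade-out simply mirrors the fade-in sample by sample, which is an assumption of this example rather than a description of the script's internals.

```python
import math


def fade_gain(kind, x, fade_duration):
    """Gain at x seconds into a fade-in, for 0 <= x <= fade_duration."""
    p = x / fade_duration
    if kind == "none":
        return 1.0
    if kind == "linear":
        return p
    if kind == "cosine":
        return (1 - math.cos(math.pi * p)) / 2
    if kind == "hamming":
        return 0.54 - 0.46 * math.cos(math.pi * p)
    raise ValueError(f"unknown fade type: {kind}")


def apply_fades(samples, sample_rate, kind, fade_duration):
    """Return a copy of `samples` with a fade-in and a mirrored fade-out."""
    n = len(samples)
    # Never let the two fades overlap on very short segments.
    n_fade = min(int(round(fade_duration * sample_rate)), n // 2)
    out = list(samples)
    if kind == "none" or n_fade == 0:
        return out
    for i in range(n_fade):
        g = fade_gain(kind, i / sample_rate, n_fade / sample_rate)
        out[i] *= g            # fade-in edge
        out[n - 1 - i] *= g    # fade-out edge (mirror image)
    return out


# A constant signal makes the gain curve directly visible.
faded = apply_fades([1.0] * 10, 1000, "cosine", 0.004)
print([round(v, 3) for v in faded])
print(round(fade_gain("hamming", 0.0, 0.01), 2))
```

With the Hamming formula as written, the gain at the very edge is 0.08 rather than 0, so the segment does not start from complete silence; the linear and cosine fades both start at 0.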

Montage Styles

🎨 4 Creative Processing Styles

Style 1: Pure peephole — Baseline, unprocessed jump-cuts

Style 2: Context ramp — Cinematic, adaptive window sizing

Style 3: Unreliable narrator — Subtle mutations and distortions

Style 4: Microscope — Time-stretched examination of moments

Style 1: Pure Peephole (Baseline)

🔍 Straightforward Jump-Cut Montage

Character: Clean, direct, unprocessed extracts

Processing: Only window extraction + crossfade

Use when: You want pure selection without artistic transformation

Best for: Documentary work, speech highlights, musical phrase extraction

Example: Creating a "greatest hits" from a lecture or concert

Style 2: Context Ramp (Cinematic)

🎬 Adaptive Window Sizing

Character: Cinematic, context-aware, dynamic

Processing: Analyzes local intensity and pause patterns to adjust window sizes

Adaptive logic:

Longer pre-window if following a pause (builds anticipation)
Shorter post-window if high intensity (cuts quickly after climax)
Standard windows for normal contexts

Best for: Film/TV sound design, dramatic storytelling, emotional highlights

Example: Creating trailer-like montage from a film dialogue

Style 3: Unreliable Narrator

🎭 Subtle Mutations and Distortions

Character: Memory-like, distorted, dreamy

Processing: Applies progressive mutations to create "faulty memory" effect

Mutations include:

Random stereo flip: Occasionally swaps left/right channels (stereo only)
Progressive pitch bias: Later segments more pitch-shifted (simulates memory decay)
Rule-based variations: Different processing for different segment indices

Parameters:

UN_random_stereo_flip: Enable/disable channel swapping
UN_pitch_bias_range: Maximum semitone shift for later segments

Best for: Experimental music, dream sequences, memory-themed pieces

Example: Creating a "fading memory" montage from childhood recordings

Style 4: Microscope

🔬 Time-Stretched Examination

Character: Detailed, expanded, analytical

Processing: Time-stretches each segment to examine moment in detail

Time stretching methods:

With pitch preservation: Uses PSOLA manipulation (slower, higher quality)
Without pitch preservation: Simple PSOLA time scaling (faster, pitch changes)

Parameters:

Microscope_time_factor: Stretch factor (2.0 = twice as long)
Microscope_preserve_pitch: Keep original pitch during stretching

Best for: Sound analysis, granular synthesis, experimental time manipulation

Example: Stretching vocal moments to hear subtle formant transitions

Style Selection Guide

Project Goal	Recommended Style	Key Settings	Expected Result
Clean highlights compilation	Pure peephole	Window=0.5s, Cosine fade	Direct, unprocessed jump-cuts
Dramatic film trailer	Context ramp	Asymmetric windows, Adaptive	Cinematic, context-aware montage
Dream sequence/sound art	Unreliable narrator	Pitch bias=1.0, Stereo flip on	Distorted, memory-like collage
Analytical/slow listening	Microscope	Time factor=3.0, Preserve pitch	Expanded, detailed examination
Musical phrase sampling	Pure peephole or Context ramp	Symmetric windows, Short fades	Rhythmic, musical montage

Advanced Style Combinations

Sequential Processing:

Create montage with Style A
Use montage as input for second pass with Style B
Creates layered artistic effects
Example: Pure peephole → Unreliable narrator (extract then distort)

Multi-Style Montage:

Process different point subsets with different styles
Combine results manually in Praat or DAW
Creates piece with varying texture and processing
Example: Speech points → Pure, musical points → Microscope

Technical Details

Window Extraction Algorithm

FOR EACH MARKED POINT t_i:

   If Asymmetric_windows = 0:
      half = Window_length / 2
      start = t_i - half
      end = t_i + half
   If Asymmetric_windows = 1:
      start = t_i - Pre_length
      end = t_i + Post_length

BOUNDARY CLAMPING:
   if start < sound_start: start = sound_start
   if end > sound_end: end = sound_end

EXTRACTION:
   segment = Extract part: start, end, "rectangular", 1, "no"

SPECIAL CASES:
   • If window extends beyond sound boundaries: clipped to available audio
   • Very short windows (< 0.01s): May cause processing issues
   • Overlapping windows (if points close): Segments will overlap in montage
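
The boundary calculation and clamping above translate directly into a small Python function. The function name window_bounds and its keyword arguments are hypothetical; the defaults mirror the form defaults listed in this guide.

```python
def window_bounds(t, sound_start, sound_end, window_length=0.5,
                  asymmetric=False, pre_length=0.3, post_length=0.2):
    """Return the (start, end) extraction window around marked time t."""
    if asymmetric:
        start, end = t - pre_length, t + post_length
    else:
        half = window_length / 2
        start, end = t - half, t + half
    # Clamp to the available audio so the extraction never runs off either end.
    return max(start, sound_start), min(end, sound_end)


# A point near the beginning of a 10 s sound: the window is cut at 0 s.
print(window_bounds(0.1, 0.0, 10.0))
# An asymmetric window in the middle of the sound.
print(window_bounds(5.0, 0.0, 10.0, asymmetric=True))
```

Note that a clamped window is shorter than requested, so points marked close to either end of the sound contribute less audio to the montage.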

Context Ramp Analysis

CONTEXT ANALYSIS FOR EACH POINT:

1. Extract local window (2×Window_length before, 0.5× after)
2. Calculate intensity:
   intensity = To Intensity: 100, 0, "yes"
   mean_dB = Get mean: 0, 0, "dB"
   normalized = (mean_dB - 40) / 40   # Assuming 40-80 dB typical range
   clamped to 0-1
3. Detect pause before:
   pre_window = Extract part: t - 2×Window_length to t - 0.25×Window_length
   pre_intensity = To Intensity: 100, 0, "yes"
   pre_mean = Get mean: 0, 0, "dB"
   pause_detected = (pre_mean < mean_dB - 10 dB) ? 1 : 0

ADAPTIVE WINDOW SIZING:
   adaptive_pre = Pre_length × (1 + pause_detected × 0.5)
   adaptive_post = Post_length × (1.2 - normalized × 0.4)

PHYSICAL INTERPRETATION:
   • After pause: Take more context (build anticipation)
   • High intensity: Cut quickly after climax
   • Normal context: Standard window sizes
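
The adaptive sizing arithmetic can be checked with a short Python sketch. The function adaptive_window is a hypothetical name, and the two intensity values are taken as plain inputs here; in Praat they would come from the Intensity analysis described above.

```python
def adaptive_window(mean_db, pre_mean_db, pre_length=0.3, post_length=0.2):
    """Return (adaptive_pre, adaptive_post) in seconds for one marked point.

    mean_db:     mean intensity around the point, in dB
    pre_mean_db: mean intensity of the stretch just before the point, in dB
    """
    # Map the assumed 40-80 dB range onto 0-1 and clamp.
    normalized = min(max((mean_db - 40.0) / 40.0, 0.0), 1.0)
    # A pause is flagged when the preceding stretch is more than 10 dB quieter.
    pause_detected = 1 if pre_mean_db < mean_db - 10.0 else 0
    adaptive_pre = pre_length * (1 + 0.5 * pause_detected)
    adaptive_post = post_length * (1.2 - 0.4 * normalized)
    return adaptive_pre, adaptive_post


# Loud moment after a quiet stretch: longer lead-in, shorter tail.
print(adaptive_window(80.0, 60.0))
# Mid-level moment with no pause before it: the default sizes are kept.
print(adaptive_window(60.0, 58.0))
```

With these formulas the pre-window grows by at most 50% and the post-window ranges from 0.8× (loudest) to 1.2× (quietest) of Post_length.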

Unreliable Narrator Mutations

MUTATION ALGORITHM:

1. STEREO FLIP (if enabled and segment is stereo):
   if randomInteger(1, 2) = 1:
      Extract channels: ch1, ch2
      Combine to stereo: ch2 + ch1 (swapped)
2. PITCH BIAS (progressive):
   bias_factor = (segment_index / total_segments) × UN_pitch_bias_range
   bias_semitones = randomUniform(-bias_factor, bias_factor)
   For MONO segments:
      To Manipulation → Extract pitch tier
      Formula: "self × 2^(bias_semitones/12)"
      Replace pitch tier → Resynthesize
   For STEREO segments:
      Process each channel separately with same bias
      Recombine to stereo

EFFECT PROGRESSION:
   • Early segments: Minimal pitch shift (± small range)
   • Middle segments: Moderate shift (± medium range)
   • Late segments: Maximum shift (± full range)
   Creates sense of memory decay/distortion over time
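
The progressive pitch bias is easiest to see with numbers. The Python sketch below reproduces only the arithmetic (the random draw and the semitone-to-ratio conversion), not the resynthesis; pitch_bias is a hypothetical helper, and segment indices are assumed to start at 1.

```python
import random


def pitch_bias(segment_index, total_segments, pitch_bias_range, rng=random):
    """Return (bias_semitones, frequency_ratio) for one segment.

    The allowed range widens linearly with the segment's position, so later
    segments can drift further from their original pitch.
    """
    bias_factor = (segment_index / total_segments) * pitch_bias_range
    bias_semitones = rng.uniform(-bias_factor, bias_factor)
    # One semitone corresponds to a frequency ratio of 2^(1/12).
    return bias_semitones, 2 ** (bias_semitones / 12)


rng = random.Random(0)  # seeded only so the example is repeatable
for i in range(1, 9):
    semitones, ratio = pitch_bias(i, 8, 1.0, rng)
    print(f"segment {i}: limit ±{i / 8:.3f} st, "
          f"drawn {semitones:+.3f} st, ratio {ratio:.4f}")
```

Because each shift is drawn at random within its limit, a late segment can still come out nearly unshifted; only the maximum possible shift grows over the montage.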

Microscope Time Stretching

TIME STRETCHING METHODS:

IF preserve_pitch = 1:
   # Using Manipulation (PSOLA) for pitch-preserving stretch
   For MONO:
      manip = To Manipulation: 0.01, 75, 600
      dur_tier = Extract duration tier
      Add point at middle: duration_factor
      Replace duration tier
      result = Get resynthesis (overlap-add)
   For STEREO:
      Process each channel separately
      Combine results
   QUALITY: High, preserves formants and natural character
   SPEED: Slower (manipulation objects created/destroyed)

IF preserve_pitch = 0:
   # Simple PSOLA time scaling
   result = To Sound (PSOLA): 75, 600, 1/factor, 1.0
   QUALITY: Lower, pitch changes with time scaling
   SPEED: Faster (direct PSOLA)

TIME FACTOR INTERPRETATION:
   • 1.0: No change
   • 2.0: Twice as long (half speed)
   • 0.5: Half as long (double speed)
   • 3.0+: Extreme slowing, reveals micro-details
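
As a rough guide to what a given time factor does, the following Python sketch computes the stretched duration and an estimated pitch change. The helper microscope_result is hypothetical, and the pitch estimate assumes the non-preserving mode behaves like a plain playback-speed change (pitch falls by 12·log2(factor) semitones); the guide only states that pitch changes in that mode, so treat the exact figure as an assumption.

```python
import math


def microscope_result(segment_duration, time_factor, preserve_pitch=True):
    """Return (new_duration_s, pitch_shift_semitones) for one segment."""
    if time_factor <= 0:
        raise ValueError("time_factor must be positive")
    new_duration = segment_duration * time_factor
    if preserve_pitch:
        pitch_shift = 0.0
    else:
        # Assumption: slowing down by `time_factor` lowers pitch like tape speed.
        pitch_shift = -12 * math.log2(time_factor)
    return new_duration, pitch_shift


print(microscope_result(0.5, 3.0, preserve_pitch=True))
print(microscope_result(0.5, 2.0, preserve_pitch=False))
```

Stretching also lengthens the montage, so a time factor of 3.0 cuts the reported compression ratio to roughly a third of what Pure peephole would give for the same points.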

Memory and Performance Considerations

Factor	Low Impact	High Impact	Optimization
Number of points	5-15	50+	Process in batches, combine later
Window length	0.1-0.5s	2.0s+	Use shorter windows for many points
Style complexity	Pure peephole	Unreliable narrator (stereo)	Use simpler styles for many points
Original duration	1-5 minutes	30+ minutes	Extract relevant portion first
Sample rate	44100 Hz	96000 Hz	Resample to 44100 Hz if possible

Creative Applications

Music Production and Sampling

🎵 Beat and Phrase Extraction

Goal: Create sample libraries or rhythmic montages from recordings

Workflow:

Load drum break or instrumental recording
Mark each hit or interesting moment
Use Pure peephole style with symmetric windows
Export montage as sample source
Import to DAW for further arrangement

Example: Extract all kick drums from a breakbeat for drum programming

Film and Video Sound Design

🎬 Trailer and Montage Creation

Goal: Create dramatic sound collages for video sequences

Workflow:

Load dialogue or sound effects from film
Mark emotional peaks and key phrases
Use Context ramp style for cinematic feel
Adjust asymmetric windows to capture context
Sync montage with video edit

Example: Create 30-second trailer audio from 2-hour film dialogue

Oral History and Documentary

📝 Interview Highlight Compilation

Goal: Extract key moments from long interviews

Workflow:

Load interview recording (30-60 minutes)
Listen and mark important statements
Use asymmetric windows (pre-heavy) for natural flow
Create montage of highlights
Export for editing or presentation

Example: Create 5-minute "best of" from 1-hour oral history interview

Sound Art and Experimental Music

🎨 Conceptual Audio Pieces

Goal: Create abstract compositions from found sounds

Workflow:

Load field recording or environmental sound
Mark interesting textures and moments
Use Unreliable narrator for distorted memory effect
Or use Microscope for detailed examination
Layer multiple montages for complexity

Example: Create dream-like soundscape from city recordings

Educational Applications

Teaching Audio Concepts:

Editing principles: Demonstrate jump-cuts, montage, compression
Time perception: Show how context affects moment perception
Creative process: Illustrate selection → extraction → transformation workflow
Audio analysis: Use Microscope style to examine sound details
Memory and perception: Use Unreliable narrator to explore auditory memory

Advanced Creative Techniques

Multi-Layer Montages:

Create multiple montages from same source with different point sets/styles
Layer them in DAW with different panning/effects
Creates rich, complex collages
Example: Layer 1 (vocal highlights), Layer 2 (musical moments), Layer 3 (textures)

Temporal Remapping:

Create montage, then use as source for another montage pass
Creates "montage of montages" with higher-level structure
Example: First pass extracts phrases, second pass extracts moments from those phrases

Troubleshooting Common Issues

Problem: No points marked when clicking Continue
Causes: Forgot to press Ctrl-P/Cmd-P, editor navigation issues
Solutions: Ensure blue lines appear when marking, use keyboard shortcut not mouse click

Problem: Clicks/pops in montage
Causes: Fade too short or disabled, abrupt waveform discontinuities
Solutions: Use Cosine fade (0.01-0.02s), ensure segments start/end near zero crossings
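
One way to keep cut points near zero crossings is to nudge each window edge to the closest sign change before extracting. The Python sketch below shows the idea on a plain list of samples; nearest_zero_crossing is a hypothetical helper, and nothing in this guide indicates that the script performs this step itself.

```python
def nearest_zero_crossing(samples, index):
    """Return the sample index closest to `index` where the signal is zero
    or changes sign before the next sample. Falls back to `index` if the
    signal never crosses zero."""
    n = len(samples)
    for offset in range(n):
        # Look outward from the requested position, earlier side first.
        for i in (index - offset, index + offset):
            if 0 <= i < n - 1:
                if samples[i] == 0 or samples[i] * samples[i + 1] < 0:
                    return i
    return index


samples = [0.5, 0.8, 0.3, -0.2, -0.6, -0.1, 0.4]
print(nearest_zero_crossing(samples, 1))  # sign change between index 2 and 3
print(nearest_zero_crossing(samples, 5))  # already sits on a sign change
```

Moving an edge by a few samples is inaudible at typical sample rates, but it removes most of the step discontinuity that a fade would otherwise have to mask.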

Problem: Processing very slow with many points
Causes: Many points + complex style + long windows
Solutions: Reduce point count, use simpler style, shorten windows, process in batches

Problem: Montage sounds disjointed/awkward
Causes: Poor point selection, wrong window sizes, no contextual flow
Solutions: Choose points more carefully, use Context ramp style, adjust asymmetric windows

Best Practices for Different Audio Types

Audio Type	Window Strategy	Recommended Style	Fade Settings	Point Selection
Speech/Dialogue	Asymmetric (pre-heavy)	Context ramp	Cosine 0.01s	Sentence starts, emotional peaks
Music (rhythmic)	Symmetric	Pure peephole	Cosine 0.005s	Beat positions, phrase starts
Music (ambient)	Symmetric or long	Unreliable narrator	Cosine 0.02s	Texture changes, harmonic moments
Field recordings	Variable	Microscope	Cosine 0.01s	Interesting events, texture details
Interview/lecture	Asymmetric (context)	Pure peephole	Cosine 0.01s	Key points, summary statements