MotionControl — Gesture-to-Sound Transformation
Offline motion-controlled sound transformation: a webcam captures hand movement, a Python worker extracts three control streams (energy, vertical position, horizontal position), and Praat applies parallel amplitude, pitch, and spectral brightness modulations.
What this does
This script implements motion-controlled sound transformation — a pipeline that uses a webcam to capture free-hand gestures and maps them to three parallel audio modulations. A Python worker opens the camera, records 10 seconds of motion (after 2 seconds of background calibration), and extracts three normalized control channels via frame differencing and motion-weighted centroid tracking. Praat then applies three offline transformations to the selected Sound:
- Motion energy → amplitude envelope (AmplitudeTier multiplication)
- Vertical hand position → pitch contour (Manipulation + PitchTier)
- Horizontal hand position → spectral brightness (time-varying HPF modulation)
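The brightness channel is the least conventional of the three. The sketch below illustrates the idea given for it (a high-pass-filtered copy added for right-hand positions and subtracted for left-hand ones) under explicit assumptions: a first-order high-pass filter with a hypothetical 2000 Hz cutoff stands in for whatever filter the Praat script applies, and the mix gain is mapped linearly from -Brightness_range (far left) to +Brightness_range (far right). The function names are hypothetical.

```python
import numpy as np


def one_pole_highpass(x, sr, cutoff_hz):
    """First-order (6 dB/octave) RC-style high-pass filter (assumed filter type)."""
    rc = 1.0 / (2.0 * np.pi * cutoff_hz)
    a = rc / (rc + 1.0 / sr)
    y = np.zeros(len(x), dtype=float)
    for n in range(1, len(x)):
        y[n] = a * (y[n - 1] + x[n] - x[n - 1])
    return y


def brightness_modulate(x, sr, horizontal, control_fps,
                        brightness_range=0.8, cutoff_hz=2000.0):
    """Add (right) or subtract (left) a high-passed copy, scaled by the control."""
    x = np.asarray(x, dtype=float)
    # Upsample the 0..1 control stream to one value per audio sample
    t_ctrl = np.arange(len(horizontal)) / control_fps
    t_audio = np.arange(len(x)) / sr
    h = np.interp(t_audio, t_ctrl, horizontal)
    # 0 (left) -> -range, 0.5 -> 0 (unchanged), 1 (right) -> +range
    gain = (2.0 * h - 1.0) * brightness_range
    return x + gain * one_pole_highpass(x, sr, cutoff_hz)
```

With the hand centred (0.5) the gain is zero and the sound passes through unchanged; only high-frequency content is boosted or cancelled as the hand moves sideways.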
The pipeline is entirely file-based (CSV plus marker files); there is no real-time streaming between Praat and Python, so all audio processing happens offline and can be reproduced from the saved control data.
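Because the hand-off is a plain CSV file, the control data can be inspected, archived, and reapplied. A minimal sketch of writing and reading such a file with Python's csv module, assuming a header row with the four columns named in the technical implementation (time, motion_energy, vertical_pos, horizontal_pos) and one row per control frame. The helper names and the 6-decimal formatting are hypothetical, and the marker files are not modelled.

```python
import csv
import io

FIELDS = ["time", "motion_energy", "vertical_pos", "horizontal_pos"]


def write_controls(rows, fh):
    """Write a header plus one row per control frame (hypothetical format)."""
    writer = csv.writer(fh)
    writer.writerow(FIELDS)
    for row in rows:
        writer.writerow([f"{v:.6f}" for v in row])


def read_controls(fh):
    """Read rows back as float tuples, checking that controls lie in 0..1."""
    out = []
    for rec in csv.DictReader(fh):
        values = [float(rec[k]) for k in FIELDS]
        # time is in seconds; the three control channels must be normalized
        for name, v in zip(FIELDS[1:], values[1:]):
            if not 0.0 <= v <= 1.0:
                raise ValueError(f"{name}={v} outside 0..1")
        out.append(tuple(values))
    return out
```

Reading the same file twice always yields the same numbers, which is what makes a recorded gesture reusable across sounds or trials.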
Technical implementation:
1. Python worker: opens the webcam, captures CAL_SEC (2 s) for background modelling, then CAPTURE_SEC (10 s) of free motion, and extracts per-frame motion energy and the motion-weighted centroid (X, Y).
2. Control smoothing: EMA filter, percentile stretch, deadband, hysteresis.
3. CSV export: time, motion_energy, vertical_pos, horizontal_pos (control channels normalized to 0..1).
4. Amplitude mapping: energy → AmplitudeTier with user-defined min/max, which multiplies the sound.
5. Pitch mapping: vertical position → semitone shift (-range..+range); the original pitch is extracted via Manipulation, the PitchTier is shifted, and the sound is resynthesized.
6. Brightness mapping: horizontal position → a high-pass-filtered copy added or subtracted (right = brighter, left = darker).
Key insight: three independent gesture channels produce a rich, multidimensional sound transformation with no real-time constraints.
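Step 2 names four conditioning stages without giving formulas. The sketch below is one reasonable reading of them, chained in the listed order, with every constant an assumption: the EMA uses the common span convention alpha = 2/(Smooth_frames + 1), the stretch maps the 5th..95th percentiles of the whole recording to 0..1, the deadband snaps values within 0.03 of neutral onto neutral, and the hysteresis holds the output until the input moves more than 0.02 away. All function names are hypothetical.

```python
import numpy as np


def ema(x, smooth_frames):
    """Exponential moving average; alpha = 2 / (N + 1) is an assumed convention."""
    alpha = 2.0 / (smooth_frames + 1.0)
    y = np.empty(len(x), dtype=float)
    y[0] = x[0]
    for i in range(1, len(x)):
        y[i] = alpha * x[i] + (1.0 - alpha) * y[i - 1]
    return y


def percentile_stretch(x, lo_pct=5.0, hi_pct=95.0):
    """Map the lo..hi percentiles of the whole recording to 0..1 and clip."""
    lo, hi = np.percentile(x, [lo_pct, hi_pct])
    if hi - lo < 1e-9:  # flat signal: nothing to stretch, return neutral
        return np.full(len(x), 0.5)
    return np.clip((x - lo) / (hi - lo), 0.0, 1.0)


def deadband(x, center=0.5, width=0.03):
    """Snap values that are within `width` of `center` exactly onto `center`."""
    y = np.asarray(x, dtype=float).copy()
    y[np.abs(y - center) < width] = center
    return y


def hysteresis(x, threshold=0.02):
    """Hold the last output until the input moves more than `threshold` away."""
    y = np.empty(len(x), dtype=float)
    held = x[0]
    for i, v in enumerate(x):
        if abs(v - held) > threshold:
            held = v
        y[i] = held
    return y


def smooth_control(raw, smooth_frames=5, center=0.5):
    """EMA -> percentile stretch -> deadband -> hysteresis, in that order."""
    x = ema(np.asarray(raw, dtype=float), smooth_frames)
    return hysteresis(deadband(percentile_stretch(x), center=center))
```

The percentile stretch needs the whole recording, which fits the offline design. For the energy channel a deadband centred on 0 rather than 0.5 is probably more sensible, hence the `center` parameter.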
Quick start
- In Praat, select exactly one Sound object.
- Run script… → MotionControl.praat.
- Choose a preset (Subtle gesture, Expressive performer, Wild motion, Meditative) or select "Custom".
- Adjust parameters: pitch range (semitones), amplitude min/max, brightness range, smoothing frames, control fps.
- A webcam preview window opens. Hold still for 2 seconds (calibration), then move freely for 10 seconds.
- After capture, the script applies the three transformations automatically. The output Sound is named originalname_motion.
Requirements: Python with numpy and opencv-python (install via pip install numpy opencv-python).
Control mappings
1. Motion energy → Amplitude envelope
2. Vertical position → Pitch contour
3. Horizontal position → Spectral brightness (HPF modulation)
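The three mappings reduce to simple linear functions of the normalized controls. A minimal sketch, assuming the linear ranges stated in the technical implementation (amplitude from Amplitude_min to Amplitude_max, pitch shift from -Pitch_range_st to +Pitch_range_st semitones, brightness gain from -Brightness_range to +Brightness_range); the function names are hypothetical and the Praat script may shape the curves differently.

```python
import numpy as np


def energy_to_amplitude(energy, amp_min=0.20, amp_max=1.00):
    """Energy 0 -> amp_min, energy 1 -> amp_max (linear)."""
    e = np.clip(energy, 0.0, 1.0)
    return amp_min + e * (amp_max - amp_min)


def vertical_to_semitones(vertical, pitch_range_st=6.0):
    """Bottom (0) -> -range, centre (0.5) -> 0, top (1) -> +range."""
    v = np.clip(vertical, 0.0, 1.0)
    return (2.0 * v - 1.0) * pitch_range_st


def semitones_to_ratio(st):
    """Equal-tempered frequency ratio for a shift of `st` semitones."""
    return 2.0 ** (np.asarray(st, dtype=float) / 12.0)


def horizontal_to_brightness_gain(horizontal, brightness_range=0.80):
    """Left (0) -> -range (darker), right (1) -> +range (brighter)."""
    h = np.clip(horizontal, 0.0, 1.0)
    return (2.0 * h - 1.0) * brightness_range
```

In Praat terms, the amplitude values would become AmplitudeTier points, and each original pitch value f0 in the PitchTier would be replaced by f0 × 2^(st/12).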
Motion-weighted centroid
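For each video frame, after background subtraction and frame differencing (diff is the per-pixel motion; x and y are pixel coordinates normalized to 0..1, with x = 0 at the left edge and y = 0 at the top edge):
Energy = mean(diff)
Vertical centroid = 1 − Σ(diff × y) / Σ(diff) (inverted so that top = 1, bottom = 0)
Horizontal centroid = Σ(diff × x) / Σ(diff) (left = 0, right = 1)
If total motion is below a threshold, both positions snap to neutral (0.5), which avoids jitter when the performer is still.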
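The formulas above translate directly into numpy. A minimal sketch, assuming diff is an absolute frame difference scaled to 0..1 (background subtraction is omitted) and a hypothetical energy threshold of 0.005 for the snap-to-neutral rule. In a live worker, the two grayscale frames would typically come from cv2.cvtColor applied to consecutive webcam frames; the function names here are hypothetical.

```python
import numpy as np


def frame_difference(prev_gray, gray):
    """Absolute difference of two uint8 grayscale frames, scaled to 0..1."""
    # Convert to float first so uint8 subtraction cannot wrap around
    return np.abs(gray.astype(float) - prev_gray.astype(float)) / 255.0


def motion_features(diff, energy_threshold=0.005):
    """Return (energy, vertical, horizontal) for one frame of motion."""
    diff = np.asarray(diff, dtype=float)
    rows, cols = diff.shape
    energy = float(diff.mean())
    if energy < energy_threshold:
        return energy, 0.5, 0.5  # too little motion: snap to neutral
    total = diff.sum()
    y = np.linspace(0.0, 1.0, rows)[:, None]  # 0 at top row, 1 at bottom row
    x = np.linspace(0.0, 1.0, cols)[None, :]  # 0 at left column, 1 at right
    vertical = 1.0 - float((diff * y).sum() / total)  # invert so top = 1
    horizontal = float((diff * x).sum() / total)
    return energy, vertical, horizontal
```

Weighting by diff means large moving regions dominate the centroid, so a whole-arm gesture outweighs small flickers elsewhere in the frame.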
Parameters & Presets
Common Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| Preset | optionmenu | Expressive performer | Subtle, Expressive, Wild, Meditative, or Custom |
| Pitch_range_st | real | 6.0 | Semitone shift range (±) |
| Amplitude_min | real | 0.20 | Minimum amplitude (energy=0) |
| Amplitude_max | real | 1.00 | Maximum amplitude (energy=1) |
| Brightness_range | real | 0.80 | Max HPF gain at extremes |
| Smooth_frames | integer | 5 | EMA window for control smoothing |
| Control_fps | integer | 25 | Output control rate (frames/sec) |
| Draw_visualization | boolean | yes | Generate Praat picture with curves |
| Play_result | boolean | yes | Auto-play after processing |
Built-in presets
| Preset | Pitch range (st) | Amp min/max | Brightness range | Smooth frames | Character |
|---|---|---|---|---|---|
| Subtle gesture | 3.0 | 0.50–1.00 | 0.40 | 9 | Gentle, refined movements, narrow pitch range |
| Expressive performer | 6.0 | 0.20–1.00 | 0.80 | 5 | Balanced, musical — recommended default |
| Wild motion | 12.0 | 0.08–1.00 | 1.20 | 3 | Dramatic, snappy, extreme modulation |
| Meditative | 2.0 | 0.55–0.90 | 0.30 | 18 | Slow inertia, narrow range — drones & sustained tones |
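The amplitude and pitch columns translate into more familiar units: an amplitude range from min to max corresponds to a modulation depth of 20·log10(max/min) dB, and a shift of ±N semitones to a frequency ratio of 2^(N/12) at the extremes. A small sketch that encodes the table above and computes both; the dictionary and function names are hypothetical.

```python
import math

# Values copied from the built-in presets table
PRESETS = {
    "Subtle gesture":       dict(pitch_range_st=3.0,  amp_min=0.50, amp_max=1.00),
    "Expressive performer": dict(pitch_range_st=6.0,  amp_min=0.20, amp_max=1.00),
    "Wild motion":          dict(pitch_range_st=12.0, amp_min=0.08, amp_max=1.00),
    "Meditative":           dict(pitch_range_st=2.0,  amp_min=0.55, amp_max=0.90),
}


def preset_summary(name):
    """Return (amplitude depth in dB, maximum pitch ratio) for a preset."""
    p = PRESETS[name]
    depth_db = 20.0 * math.log10(p["amp_max"] / p["amp_min"])
    max_ratio = 2.0 ** (p["pitch_range_st"] / 12.0)
    return depth_db, max_ratio


for preset in PRESETS:
    db, ratio = preset_summary(preset)
    print(f"{preset:<22} depth {db:5.1f} dB, pitch ratio up to x{ratio:.3f}")
```

For example, Expressive performer spans about 14 dB with a maximum ratio of √2 (a tritone) in either direction, Wild motion spans about 22 dB and a full octave each way, and Meditative stays within roughly 4.3 dB.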
Capture & processing pipeline
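The capture side has to turn irregularly timed webcam frames into the fixed-rate stream set by Control_fps. One plausible way to do that, shown here as an assumption rather than the worker's documented method, is linear interpolation onto a uniform time grid with numpy.interp; the function name is hypothetical.

```python
import numpy as np


def resample_controls(frame_times, values, control_fps=25, duration=None):
    """Resample per-frame control values onto a uniform Control_fps grid."""
    frame_times = np.asarray(frame_times, dtype=float)  # must be increasing
    values = np.asarray(values, dtype=float)
    if duration is None:
        duration = frame_times[-1]
    # Small epsilon guards against floating-point error in duration * fps
    n = int(np.floor(duration * control_fps + 1e-9)) + 1
    grid = np.arange(n) / control_fps
    # np.interp holds the first/last value outside the captured time span
    return grid, np.interp(grid, frame_times, values)
```

With the default 10-second capture at 25 fps this yields 251 rows, one every 40 ms.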
Applications
Expressive performance capture
Use case: Record a gestural performance and apply it to any audio source — transform a simple pad into an expressive lead, or add human nuance to synthesized textures.
Technique: Use the "Expressive performer" preset. Move your whole arm: vertical gestures affect pitch, horizontal gestures affect brightness, and motion energy controls dynamics.
Experimental composition & sound design
Use case: Create evolving, gesture-driven soundscapes where amplitude, pitch, and timbre are coupled to physical motion.
Technique: Use the "Wild motion" preset, with its wide pitch range and high brightness range. Record chaotic gestures, then apply them to granular textures or field recordings.
Psychoacoustics research
Use case: Reproducible gesture-to-sound mappings for perception studies.
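Advantages: every parameter is specified exactly by the preset values, a recorded control CSV can be reapplied identically across trials, and the transformation chain is documented, so the mapping itself introduces no subjective variability.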
Pedagogical tool
Use case: Demonstrate embodied music interaction and sensorimotor mapping in classrooms.
Learning outcomes: understand the relationship between gesture and sound parameters, explore feature extraction from live video, and learn about offline audio processing pipelines.
Practical workflow example: Swell + pitch rise + brightening
Gesture: Start with your hand low and to the left, then move diagonally up and to the right while increasing gesture speed.
Resulting transformation: amplitude swells (energy increases), pitch rises (hand moves up), and brightness increases (hand moves right). Together these create a dramatic "sweep" effect.
Settings: the Wild motion preset, or Custom with Pitch_range_st = 12, Brightness_range = 1.2, Amplitude_min = 0.1, Amplitude_max = 1.0.
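To see what the diagonal gesture does numerically, the sketch below feeds idealized control ramps through linear mappings with the custom settings above. The ramps (all three channels rising linearly from 0 to 1 over a 10-second capture at 25 fps) are an assumption; real gestures are noisier, and the names are hypothetical.

```python
import numpy as np

# Custom settings from the workflow example
PITCH_RANGE_ST = 12.0
BRIGHTNESS_RANGE = 1.2
AMP_MIN, AMP_MAX = 0.1, 1.0


def diagonal_sweep(n_frames=251):
    """Idealized controls: low-left and slow -> high-right and fast."""
    ramp = np.linspace(0.0, 1.0, n_frames)
    return ramp, ramp, ramp  # energy, vertical, horizontal (hypothetical)


def map_controls(energy, vertical, horizontal):
    """Assumed linear mappings to amplitude, semitones, ratio, brightness gain."""
    amp = AMP_MIN + energy * (AMP_MAX - AMP_MIN)
    semitones = (2.0 * vertical - 1.0) * PITCH_RANGE_ST
    ratio = 2.0 ** (semitones / 12.0)
    bright = (2.0 * horizontal - 1.0) * BRIGHTNESS_RANGE
    return amp, semitones, ratio, bright


amp, st, ratio, bright = map_controls(*diagonal_sweep())
for label, i in (("start", 0), ("middle", len(st) // 2), ("end", -1)):
    print(f"{label:>6}: amp={amp[i]:.2f} shift={st[i]:+.1f} st "
          f"ratio={ratio[i]:.2f} brightness={bright[i]:+.2f}")
```

Because vertical position 0.5 maps to zero shift, the sweep starts an octave below the original pitch, passes through it at mid-gesture, and ends an octave above.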
Troubleshooting
• Python not found or missing packages: Install numpy and opencv-python with pip install numpy opencv-python, then verify with python -c "import numpy, cv2".
• Webcam preview black / no motion detected: Improve the lighting, avoid a moving background, and wear contrasting clothing. Calibration requires 2 seconds of stillness.
• Low tracking confidence warning (<30%): Increase gesture amplitude, move closer to the camera, or reduce the distance to the background.
• Output clipping: The script auto-scales the output peak to 0.97. If the result still clips, reduce Brightness_range or Amplitude_max.
• Unexpected pitch changes: The original sound must have clear pitch content (voiced sounds, tonal instruments). For noisy sounds, consider using a different source.
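The clipping entry mentions that the output peak is scaled to 0.97. A minimal sketch of that step, assuming it is a plain linear peak normalization (comparable to Praat's Scale peak command) applied unconditionally; the script may instead scale only when the peak exceeds 0.97, and the function name is hypothetical.

```python
import numpy as np


def normalize_peak(x, target=0.97):
    """Scale a signal linearly so that its largest absolute sample equals target."""
    x = np.asarray(x, dtype=float)
    peak = float(np.max(np.abs(x)))
    if peak == 0.0:  # silent input: nothing to scale
        return x.copy()
    return x * (target / peak)
```

Because the gain is a single constant, it changes the overall level only and leaves the shape of the amplitude envelope intact.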