Photo Sonification — User Guide

Praat script for sound synthesis from an image.

Author: Shai Cohen
Affiliation: Department of Music, Bar-Ilan University, Israel
Version: 0.1 (2025)
License: MIT License
Repo: https://github.com/ShaiCohen-ops/Praat-plugin_AudioTools

What this does

This Praat script is a generative algorithm for image sonification.

It creates a stereo noise texture whose characteristics evolve over time, based on the column-averaged RGB values of a selected Photo object. The image's columns are mapped onto the duration of the output sound.

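The mapping is as follows:

  - Red channel: level of the low-frequency noise band (default 100–800 Hz).
  - Green channel: level of the mid-frequency noise band (default 800–3000 Hz).
  - Blue channel: level of the high-frequency noise band (default 3000–9000 Hz).
  - Mean of the three channels: overall amplitude.
  - Difference between red and blue: stereo pan position.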

Runtime grows with the number of image columns, the duration, and the sampling frequency, so wide images or long, high-rate outputs can take a long time to process.
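To make the column-to-time mapping and the runtime note more concrete, here is a minimal sketch in Python (not Praat). The function names `total_samples` and `column_at_time` are hypothetical, and the even, non-interpolated spread of columns over the duration is an assumption; the script itself may interpolate between columns.

```python
def total_samples(duration, fs):
    """Number of samples per channel for a sound of `duration` seconds at `fs` Hz."""
    return int(round(duration * fs))


def column_at_time(t, duration, n_columns):
    """Return the 0-based index of the image column that covers time t.

    Assumption: columns are spread evenly over the duration, with no
    interpolation between neighbouring columns.
    """
    if duration <= 0 or n_columns <= 0:
        raise ValueError("duration and n_columns must be positive")
    # Clamp t so that t == duration still maps to the last column.
    t = min(max(t, 0.0), duration)
    index = int(t / duration * n_columns)
    return min(index, n_columns - 1)
```

With the default duration (3.0 s) and sampling frequency (44100 Hz), `total_samples` gives 132300 samples per channel, which is why longer or higher-rate outputs take more time to compute.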

Technical details: This Praat script generates colorized noise textures from photographic imagery by decomposing the image into RGB channels, extracting per-column color statistics, and using these values to modulate filtered noise bands across time and stereo space. The processing runs in several stages.

Color analysis. The script extracts red, green, and blue channel matrices from the selected Photo object, then computes the global minimum and maximum across all three channels to establish a single normalization range. For each column of the image, it averages the pixel values over all rows, independently for each color channel, and normalizes these averages against the global range. The result is three envelopes, one per channel, each spanning 0 to 1. These column-wise statistics are precomputed and stored in arrays, which avoids repeated lookups during synthesis.

Synthesis. Three noise bands are derived from a single random noise base by filtering it into adjacent frequency regions: low frequencies (100–800 Hz) controlled by the red channel, mid frequencies (800–3000 Hz) controlled by the green channel, and high frequencies (3000–9000 Hz) controlled by the blue channel. Each band is modulated by its color envelope, so color intensity maps directly to spectral energy distribution. The overall amplitude envelope is the arithmetic mean of the three normalized color values, so perceived loudness follows the average color level.

Stereo panning. The pan position is calculated as 0.5 plus 0.5 times the difference between the normalized red and blue values, which creates smooth left-right movement that follows color gradients. Panning uses a constant-power crossfade: the channel gains are square roots of the pan position (gain = √pan for right, gain = √(1 − pan) for left), so the squared gains always sum to 1 and the total energy stays constant regardless of pan position.

Output. The final stereo signal combines the modulated, amplitude-controlled noise with the panning gains and is then peak-normalized to 0.99 to use the available headroom without digital clipping. Intermediate objects are removed after synthesis to keep the workspace tidy.
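The per-column control values described above can be sketched as follows. This is an illustrative Python version, not the Praat code itself: the function names (`column_means`, `normalize`, `control_curves`), the use of nested lists for the channel matrices, and the clamping of the pan position to guard against rounding are all assumptions made for the example.

```python
import math


def column_means(channel):
    """Average each column of a 2D list laid out as rows x columns."""
    n_rows = len(channel)
    n_cols = len(channel[0])
    return [sum(row[c] for row in channel) / n_rows for c in range(n_cols)]


def normalize(values, lo, hi):
    """Scale values from the range [lo, hi] to [0, 1]; a flat image maps to 0."""
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def control_curves(red, green, blue):
    """Return one dict of control values per image column."""
    # Global range shared by all three channels.
    pixels = [p for ch in (red, green, blue) for row in ch for p in row]
    lo, hi = min(pixels), max(pixels)
    r = normalize(column_means(red), lo, hi)
    g = normalize(column_means(green), lo, hi)
    b = normalize(column_means(blue), lo, hi)

    curves = []
    for rn, gn, bn in zip(r, g, b):
        amp = (rn + gn + bn) / 3.0          # overall amplitude envelope
        pan = 0.5 + 0.5 * (rn - bn)         # 0 = left, 1 = right
        pan = min(1.0, max(0.0, pan))       # guard against rounding (assumption)
        curves.append({
            "red": rn, "green": gn, "blue": bn,
            "amp": amp, "pan": pan,
            "left": math.sqrt(1.0 - pan),   # constant-power gains:
            "right": math.sqrt(pan),        # left**2 + right**2 == 1
        })
    return curves
```

For a column that is pure red, the pan position is 1 and the sound sits fully in the right channel; a pure blue column pans fully left. In both cases the squared gains sum to 1.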

Quick start

  1. In Praat, select a Photo object (you may need to Read an image file first).
  2. Choose Run script… and select Photo _sonification.praat.
  3. Adjust parameters (especially duration and frequency bands) and click OK.
  4. The output object imageSonification appears and plays automatically.

Parameters (form fields)

Name (GUI)          Type     Default  Description
duration (seconds)  real     3.0      The length of the resulting audio file.
fs (Hz)             integer  44100    The sampling frequency of the output sound.

Low-frequency band (Red channel)
lowF1 (Hz)          integer  100      Lower frequency limit for the Red channel noise band.
lowF2 (Hz)          integer  800      Upper frequency limit for the Red channel noise band.

Mid-frequency band (Green channel)
midF1 (Hz)          integer  800      Lower frequency limit for the Green channel noise band.
midF2 (Hz)          integer  3000     Upper frequency limit for the Green channel noise band.

High-frequency band (Blue channel)
highF1 (Hz)         integer  3000     Lower frequency limit for the Blue channel noise band.
highF2 (Hz)         integer  9000     Upper frequency limit for the Blue channel noise band.
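
When changing the band limits or the sampling frequency, it can help to sanity-check the values first. The Python sketch below is a hypothetical helper, not part of the script: the name `check_bands` and the specific checks (lower limit below upper limit, upper limit not above the Nyquist frequency fs/2) are assumptions about what a reasonable configuration looks like. Whether the script performs similar checks itself is not documented here.

```python
# Default band limits in Hz, taken from the parameter table.
DEFAULT_BANDS = {
    "low (Red)": (100, 800),
    "mid (Green)": (800, 3000),
    "high (Blue)": (3000, 9000),
}


def check_bands(fs, bands=DEFAULT_BANDS):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    nyquist = fs / 2.0  # highest frequency representable at sampling rate fs
    for name, (f1, f2) in bands.items():
        if f1 >= f2:
            problems.append(
                f"{name}: lower limit {f1} Hz is not below upper limit {f2} Hz"
            )
        if f2 > nyquist:
            problems.append(
                f"{name}: upper limit {f2} Hz exceeds Nyquist ({nyquist:g} Hz)"
            )
    return problems
```

With the defaults and fs = 44100 Hz the list is empty. Lowering fs to 16000 Hz would flag the high band, because its 9000 Hz upper limit lies above the 8000 Hz Nyquist frequency.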

Outputs