Spectral Image Sonification — User Guide

Creates an evolving harmonic spectrum (a vowel-like sound) from the colors of an image.

Author: Shai Cohen
Affiliation: Department of Music, Bar-Ilan University, Israel
Version: 0.1 (2025)
License: MIT License
Repo: https://github.com/ShaiCohen-ops/Praat-plugin_AudioTools

What this does

This Praat script generates a continuous, evolving harmonic sound (a synthetic vowel-like or drone texture) from a selected image. The image's columns are mapped onto the duration of the output sound, causing the timbre to shift over time.
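As a rough illustration of how columns map onto the duration, the Python sketch below splits the output into one time segment per image column. It is not the plugin's Praat code: the function name `column_time_spans` is hypothetical, and equal-width segments are an assumption, since the guide does not state how segment boundaries are computed.

```python
def column_time_spans(n_columns, duration):
    """Return (start, end) times in seconds for each image column.

    Assumption: every column gets an equal share of the total duration.
    """
    segment = duration / n_columns
    return [(i * segment, (i + 1) * segment) for i in range(n_columns)]

# A 4-column image over the default 5.0 s duration.
spans = column_time_spans(4, 5.0)
```

Under this assumption, a wider image gives shorter segments and therefore faster timbral change for the same duration.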

The sonification uses the following mapping to control the harmonic content:

This Praat script translates photographic imagery into complex harmonic spectra by decomposing images into RGB color channels and mapping normalized color intensities to individual harmonic amplitudes within a fixed fundamental frequency structure. The architecture extracts red, green, and blue channels as separate Matrix objects, enabling per-channel pixel analysis and establishing a normalized amplitude range across all three channels simultaneously.

The normalization process computes global minimum and maximum values by iterating through all pixels in all three channels, establishing a unified range that scales all color intensities consistently. For each vertical column in the image, the script averages pixel values across all rows within that column for each color channel independently, then normalizes these averages against the global range to produce three separate amplitude envelopes (one per color channel) spanning 0 to 1. This column-wise decomposition converts spatial image data into temporal evolution, where each column represents a discrete time segment of the final audio.

Harmonic synthesis implements a cyclical harmonic distribution where consecutive harmonics are assigned red, green, and blue control respectively through modulo arithmetic. Each harmonic's frequency is calculated as an integer multiple of the fundamental frequency (harmonic number × fundamental), while its amplitude derives from the corresponding color channel's normalized value, scaled by 1/harmonic to approximate natural spectral decay characteristics. The script conditionally generates spectral content only when total amplitude exceeds a 0.1 threshold, avoiding synthesis of silent or near-silent image regions.

Stereo panning modulation derives from color balance by calculating the difference between normalized red and blue channel averages, scaled by 0.3, then offset to 0.5 to produce pan positions spanning approximately 0.2 to 0.8. The left channel receives amplitude (1 - panPos) applied to the complete harmonic spectrum, while the right channel receives panPos applied to identical harmonic content, creating frequency-coherent stereo movement corresponding to image color gradients.

Peak normalization to 0.8 prevents digital clipping, and intermediate matrix objects are removed post-processing to optimize workspace efficiency.
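The per-column mapping described above can be sketched in Python. This illustrates the arithmetic only and is not the plugin's Praat code. All function names are hypothetical, and three details are assumptions the guide does not spell out: that harmonic 1 is the one assigned to red, that the pan difference is red minus blue, and that "total amplitude" means the sum of the three normalized channel values.

```python
# Illustrative Python sketch, not the plugin's Praat code.
# All function names are hypothetical.

def column_mean(channel, col):
    """Average one column of a channel stored as a list of rows."""
    return sum(row[col] for row in channel) / len(channel)

def normalize(value, lo, hi):
    """Scale value into 0..1 against the global range (0.0 if the range is flat)."""
    return (value - lo) / (hi - lo) if hi > lo else 0.0

def harmonic_amplitudes(r, g, b, max_harmonics):
    """Cycle R, G, B over harmonics 1..N and apply the 1/harmonic decay.

    Assumption: harmonic 1 takes red, 2 green, 3 blue, 4 red again, and so on.
    """
    colors = (r, g, b)
    return [colors[(h - 1) % 3] / h for h in range(1, max_harmonics + 1)]

def pan_position(r, b):
    """0.5 + 0.3 * (red - blue); the sign convention is an assumption."""
    return 0.5 + 0.3 * (r - b)

# Tiny 2-row, 2-column test image: red rises left to right, blue falls.
red = [[0.0, 1.0], [0.0, 1.0]]
green = [[0.5, 0.5], [0.5, 0.5]]
blue = [[1.0, 0.0], [1.0, 0.0]]

# Global range over every pixel of all three channels.
pixels = [v for ch in (red, green, blue) for row in ch for v in row]
lo, hi = min(pixels), max(pixels)

columns = []
for col in range(2):
    r, g, b = (normalize(column_mean(ch, col), lo, hi) for ch in (red, green, blue))
    pan = pan_position(r, b)
    columns.append({
        "amps": harmonic_amplitudes(r, g, b, 4),
        "left_gain": 1 - pan,
        "right_gain": pan,
        # Assumption: "total amplitude" is the sum of the three channel values.
        "audible": (r + g + b) > 0.1,
    })
```

In this toy image the blue-heavy first column lands at a pan position of about 0.2 and the red-heavy second column at about 0.8, the two ends of the range given above. Which side of the stereo field that corresponds to depends on the assumed sign of the red/blue difference.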

Quick start

  1. In Praat, select a Photo object.
  2. Run script…Spectral Image Sonification.praat.
  3. Adjust parameters and click OK.
  4. The output object `image_spectral_sonification` appears and plays automatically.

Parameters (form fields)

Name (GUI)            Type     Default  Description
duration (seconds)    real     5.0      The length of the resulting audio file.
fs (Hz)               integer  44100    The sampling frequency of the output sound.
fundamentalFreq (Hz)  integer  110      The base frequency (F0) for the harmonic series.
maxHarmonics          integer  16       The number of harmonics to include in the synthesized sound.
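To see what the defaults imply, the short Python sketch below computes the harmonic frequencies, the Nyquist limit, and the sample count from the four form values. It is plain arithmetic on the defaults listed above, not the plugin's code, and the variable names are chosen for illustration.

```python
# Default form values from the parameter list.
duration = 5.0          # seconds
fs = 44100              # Hz
fundamental_freq = 110  # Hz
max_harmonics = 16

# Each harmonic is an integer multiple of the fundamental.
freqs = [h * fundamental_freq for h in range(1, max_harmonics + 1)]

nyquist = fs / 2                 # highest frequency the sampling rate can represent
n_samples = int(duration * fs)   # samples per channel in the output

print(freqs[0], freqs[-1], nyquist, n_samples)  # 110 1760 22050.0 220500
```

With the defaults, the top harmonic at 1760 Hz sits far below the 22050 Hz Nyquist limit. If fundamentalFreq × maxHarmonics is raised above fs / 2, the upper harmonics would alias; the guide does not say whether the script guards against this.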

Outputs