22.2 Synthetic Stem Renderer — v1.0 User Guide

Psychoacoustic approximation of a 22.2‑style synthetic stem renderer. Takes a mono or stereo input and produces a 24‑channel synthetic surround array (Middle, Upper, Lower layers + LFE) and an optional headphone output (clean fold‑down or true binaural using CNMAT KEMAR HRIRs).

Author: Shai Cohen
Affiliation: Department of Music, Bar‑Ilan University, Israel
Version: 1.0 (2025)
License: MIT License
Repo: GitHub

What this does

22.2 Synthetic Stem Renderer upmixes a mono or stereo source into a 24‑channel surround array inspired by the 22.2 multichannel format (NHK / cinema). The engine creates a psychoacoustic approximation using subtractive side extraction, band‑limited delays, and matrix‑derived stems – no external plug‑ins or binaries required.

Key features:
  • 24‑channel output – Middle Layer (10 ch), Upper Layer (9 ch), Lower Layer (3 ch), LFE (2 ch).
  • Three presets – Cinematic Film (aggressive), Subtle Music (gentle), Wide Mono (optimised for mono sources).
  • Two headphone modes – Clean fold‑down (transparent) or True Binaural (CNMAT KEMAR HRIRs).
  • Stereo input: front L/R preserved; surround stems derived from side signals (L − 0.5×R and R − 0.5×L).
  • Mono input: ITD (interaural time difference) widening (L = original, R = delayed by front_width_ms).
  • Surround and height channels are band‑limited, delayed, and gain‑scaled for diffuse placement.
  • LFE channels carry the low‑passed mid signal.

Quick start

  1. In Praat, select exactly one Sound object (mono or stereo).
  2. Choose Run script… and open 22.2_Synthetic_Stem_Renderer.praat.
  3. Choose a Preset:
    • Cinematic Film (aggressive surround), Subtle Music (gentle), Wide Mono (optimised for mono).
  4. For custom mode (preset = Custom), adjust Front_width_ms (mono widening) and other parameters via the form.
  5. Select Render_22_2_output and/or Render_headphone_output.
  6. If using headphone output, choose Headphone_mode:
    • Headphone Fold‑down (Clean) – transparent FL/FR + a touch of FC.
    • True Binaural (CNMAT KEMAR) – requires HRIR folder with the specific CNMAT KEMAR files.
  7. Click OK. The script builds 24 channels, optionally renders the headphone mix, and creates Sound objects named originalname_22_2_Array and originalname_Headphone_Preview or originalname_True_Binaural.

Tip: For stereo material, Cinematic Film gives a wide, immersive surround field. For mono recordings, Wide Mono creates a convincing stereo front stage plus surround ambience.

Important: This effect is implemented entirely in Praat; no Python is required. The 24‑channel output can be saved as a multichannel WAV and imported into a DAW or multichannel player. The binaural mode requires the CNMAT KEMAR HRIR files to be present in the specified folder; otherwise it aborts cleanly.

The 3 presets (+ Custom)

Preset          FC reinforce  Wide gain  Amb gain  Upper gain  Lower gain  Side delay
Subtle Music    0.30          -10 dB     -8 dB     -12 dB      -18 dB      25 ms
Wide Mono       0.40          -9 dB      -6 dB     -9 dB       -12 dB      35 ms
Cinematic Film  0.50          -6 dB      -4 dB     -6 dB       -9 dB       40 ms

Additional fixed parameters: decorr_ms = 7.0, hp_rear = 120 Hz, lp_rear = 6000 Hz, hp_top = 250 Hz, lp_top = 4000 Hz, lfe_hz = 120 Hz.
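
As an illustration of how the dB values listed above map to linear gain factors, here is a minimal Python sketch. It is not part of the Praat script: the dictionary layout, the key names, and the helper functions are hypothetical, and only the numbers are taken from the preset list.

```python
# Preset values copied from the preset list in this guide.
# The dict layout and key names are illustrative assumptions,
# not the variable names used inside the Praat script.
PRESETS = {
    "Subtle Music": {"fc_reinforce": 0.30, "wide_db": -10.0, "amb_db": -8.0,
                     "upper_db": -12.0, "lower_db": -18.0, "side_delay_ms": 25.0},
    "Wide Mono": {"fc_reinforce": 0.40, "wide_db": -9.0, "amb_db": -6.0,
                  "upper_db": -9.0, "lower_db": -12.0, "side_delay_ms": 35.0},
    "Cinematic Film": {"fc_reinforce": 0.50, "wide_db": -6.0, "amb_db": -4.0,
                       "upper_db": -6.0, "lower_db": -9.0, "side_delay_ms": 40.0},
}


def db_to_linear(db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def linear_gains(preset_name):
    """Return the dB-valued entries of a preset as linear factors."""
    preset = PRESETS[preset_name]
    return {key[:-3]: db_to_linear(value)
            for key, value in preset.items() if key.endswith("_db")}
```

For example, the -6 dB wide gain of Cinematic Film corresponds to a linear factor of roughly 0.5, i.e. about half the amplitude.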

24‑channel layout

Middle Layer (10 channels)
1. FL – Front Left
2. FR – Front Right
3. FC – Front Center
4. FWL – Front Wide Left
5. FWR – Front Wide Right
6. SiL – Side Left
7. SiR – Side Right
8. BL – Back Left
9. BR – Back Right
10. BC – Back Center

Upper Layer (9 channels)
11. TpFL – Top Front Left
12. TpFR – Top Front Right
13. TpFC – Top Front Center
14. TpSiL – Top Side Left
15. TpSiR – Top Side Right
16. TpBL – Top Back Left
17. TpBR – Top Back Right
18. TpBC – Top Back Center
19. TpC – Top Center

Lower Layer (3 channels)
20. BFL – Bottom Front Left
21. BFR – Bottom Front Right
22. BFC – Bottom Front Center

LFE (2 channels)
23. LFE1
24. LFE2
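
When the 24‑channel WAV is imported elsewhere, it can help to have the channel order available as data. The following Python sketch simply restates the list above; the names CHANNELS and channel_number are hypothetical helpers, not something the Praat script provides.

```python
# Channel labels in the order given in this guide (1-based channel numbers).
CHANNELS = [
    # Middle Layer (1-10)
    "FL", "FR", "FC", "FWL", "FWR", "SiL", "SiR", "BL", "BR", "BC",
    # Upper Layer (11-19)
    "TpFL", "TpFR", "TpFC", "TpSiL", "TpSiR", "TpBL", "TpBR", "TpBC", "TpC",
    # Lower Layer (20-22)
    "BFL", "BFR", "BFC",
    # LFE (23-24)
    "LFE1", "LFE2",
]


def channel_number(label):
    """Return the 1-based channel number for a label such as 'TpC'."""
    return CHANNELS.index(label) + 1
```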

Processing pipeline

Base stems (from input):
  • FL, FR – original (or mono→ITD widened).
  • FC – Mid (L+R) × c_reinforce.
  • Amb_L, Amb_R – side signals: L − R×0.5 and R − L×0.5.
  • FW_L, FW_R – blend of FL/FR and Amb (0.6 each).
  • Mid – (L+R)/2 (used for BC, TpBC, TpC, LFE).
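
The base stems above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the script's code: signals are plain lists of samples, FC is taken as Mid × the FC reinforce value, and the "0.6 each" blend is read as 0.6 × front + 0.6 × ambience. The function name base_stems and the dictionary keys are hypothetical.

```python
def base_stems(left, right, fc_reinforce=0.50):
    """Derive the base stems from a stereo pair (lists of equal length).

    Assumptions: FC = Mid * fc_reinforce, and the front-wide blend is
    0.6 * front + 0.6 * ambience. Both are readings of the guide's text.
    """
    pairs = list(zip(left, right))
    mid = [(l + r) / 2.0 for l, r in pairs]        # Mid = (L+R)/2
    fc = [fc_reinforce * m for m in mid]           # centre reinforcement
    amb_l = [l - 0.5 * r for l, r in pairs]        # side signal, left
    amb_r = [r - 0.5 * l for l, r in pairs]        # side signal, right
    fw_l = [0.6 * l + 0.6 * a for l, a in zip(left, amb_l)]
    fw_r = [0.6 * r + 0.6 * a for r, a in zip(right, amb_r)]
    return {"FL": list(left), "FR": list(right), "FC": fc, "Mid": mid,
            "Amb_L": amb_l, "Amb_R": amb_r, "FW_L": fw_l, "FW_R": fw_r}
```

Note that for identical left and right channels the side signals do not vanish: L − 0.5×R leaves half of the common signal, so some centre content still reaches the surround stems.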
Channel derivation:
  • Each channel is built from a base stem, optionally filtered (HP/LP), delayed, and gain‑scaled.
  • Surround/height channels use band‑limiting (HP 120 Hz / 250 Hz, LP 6000 Hz / 4000 Hz) to reduce directness.
  • LFE channels are low‑passed at 120 Hz (no delay).
  • All channels are peak‑normalised to the specified headroom.
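
The delay, gain, and normalisation steps above can be sketched in Python as follows (filtering is left out here). The function names are hypothetical, and one point is an assumption: the sketch scales all channels by a single common factor so that inter‑channel balance is kept, whereas the guide does not say whether the script normalises jointly or per channel.

```python
def delay_ms(signal, ms, sample_rate):
    """Delay a signal by prepending zeros; the output keeps the input length."""
    n = int(round(ms * sample_rate / 1000.0))
    return ([0.0] * n + list(signal))[:len(signal)]


def apply_gain_db(signal, db):
    """Scale a signal by a gain given in dB."""
    g = 10.0 ** (db / 20.0)
    return [g * x for x in signal]


def peak_normalise(channels, headroom_dbfs=-1.0):
    """Scale all channels by one factor so the largest peak sits at headroom_dbfs.

    Assumption: joint scaling across channels (not per-channel).
    """
    peak = max((abs(x) for ch in channels for x in ch), default=0.0)
    if peak == 0.0:
        return [list(ch) for ch in channels]   # silence stays silence
    scale = 10.0 ** (headroom_dbfs / 20.0) / peak
    return [[scale * x for x in ch] for ch in channels]
```

A channel would then be built roughly as apply_gain_db(delay_ms(stem, 40.0, fs), -6.0), with peak_normalise applied once to the complete set of channels at the end.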

The decorrelation offset (decorr_ms = 7 ms) is applied to many left/right pairs (e.g., FWL/FWR, SiL/SiR, TpFL/TpFR) to create a natural stereo width in the surround field.

Parameters & defaults

Preset

Select one of the three presets to load pre‑configured gains and delays. For custom mode (preset = Custom), the script uses the fixed internal parameters (as in Cinematic Film) but allows manual adjustment of the form fields.

Front widening (mono input only)

Parameter       Range   Default
Front_width_ms  0–2 ms  0.35 ms

Output options

Parameter                Options  Default
Render_22_2_output       yes/no   yes
Render_headphone_output  yes/no   yes

Headphone mix

Parameter       Options                    Default
Headphone_mode  Fold‑down / True Binaural  Fold‑down
Hrir_folder     folder path                Kemar_HRIR/

Master

Parameter              Range   Default
Headroom_dBFS          -       -1.0
Play_headphone_result  yes/no  yes

Headphone modes

🎧 Headphone Fold‑down (Clean)

A deliberately EQ‑transparent reference consisting of:

  • FL + FR (full gain).
  • FC at 0.10 gain (subtle centre reinforcement).
  • No delayed, filtered, or subtractive surround feeds – preserves original timbre.

Use this mode for a clean, neutral headphone preview of the front image.
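
The fold‑down described above amounts to a two‑line mix. A minimal Python sketch, assuming the stems are equal‑length lists of samples (the function name fold_down is hypothetical):

```python
def fold_down(fl, fr, fc, fc_gain=0.10):
    """Clean headphone fold-down: FL/FR at full gain plus a little FC in both ears."""
    left = [l + fc_gain * c for l, c in zip(fl, fc)]
    right = [r + fc_gain * c for r, c in zip(fr, fc)]
    return left, right
```

Because no delayed or filtered stems are mixed in, the result differs from the front pair only by the small centre contribution.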

🎧 True Binaural (CNMAT KEMAR)

Experimental binaural render using the CNMAT KEMAR HRIR dataset. Only speaker positions with verified HRIR pairs are used:

  • FL, FR, FC, SiL, SiR, BL, BR, TpFL, TpFR, TpSiL, TpSiR, TpBL, TpBR, TpC.
  • Each stem is convolved with the corresponding left/right HRIR, then summed into the binaural ear signals.
  • Gain scaling per role: FC (0.12), SiL/SiR (0.10), BL/BR (0.08), TpFL/TpFR (0.06), TpSiL/TpSiR (0.05), TpBL/TpBR (0.04), TpC (0.03).

If any required HRIR file is missing, the binaural render aborts cleanly rather than producing a corrupted output. The required files are listed in the script (see source).
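
The convolve‑and‑sum step can be illustrated with the Python sketch below. It is a simplified model, not the script's implementation: HRIRs are passed in as plain lists rather than loaded from the CNMAT KEMAR files, FL and FR are assumed to use a gain of 1.0 because the guide lists no value for them, and the function and variable names are hypothetical.

```python
# Per-role gains as listed in this guide. FL/FR are not listed there;
# the lookup below falls back to 1.0 for them (an assumption).
ROLE_GAIN = {
    "FC": 0.12, "SiL": 0.10, "SiR": 0.10, "BL": 0.08, "BR": 0.08,
    "TpFL": 0.06, "TpFR": 0.06, "TpSiL": 0.05, "TpSiR": 0.05,
    "TpBL": 0.04, "TpBR": 0.04, "TpC": 0.03,
}


def convolve(signal, ir):
    """Direct-form linear convolution of two sample lists."""
    out = [0.0] * max(len(signal) + len(ir) - 1, 0)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out


def binaural_mix(stems, hrirs, role_gain=ROLE_GAIN):
    """Sum gain-scaled, HRIR-convolved stems into (left_ear, right_ear).

    stems: label -> samples; hrirs: label -> (left_ir, right_ir).
    Raises ValueError before rendering if any HRIR pair is missing.
    """
    missing = sorted(label for label in stems if label not in hrirs)
    if missing:
        raise ValueError("missing HRIR pair(s): " + ", ".join(missing))
    ears = ([], [])
    for label, samples in stems.items():
        gain = role_gain.get(label, 1.0)
        for ear, ir in zip(ears, hrirs[label]):
            wet = convolve(samples, ir)
            if len(wet) > len(ear):
                ear.extend([0.0] * (len(wet) - len(ear)))
            for i, value in enumerate(wet):
                ear[i] += gain * value
    return ears
```

Checking for missing pairs before any convolution mirrors the abort‑cleanly behaviour described above: either every stem is rendered or nothing is.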

FAQ / troubleshooting

Output is silent / 24‑channel array has no sound

Check that Render_22_2_output is enabled. Also make sure the source sound is not silent or extremely quiet: the script peak‑normalises the output to headroom_dBFS, but it cannot recover a signal that is effectively absent from the input.

True binaural mode aborts with “missing HRIR pair(s)”

Download the CNMAT KEMAR HRIR set (available from CNMAT or via the AudioTools distribution). Place the required WAV files in the folder specified in Hrir_folder. The required filenames are hardcoded in the script – ensure they match exactly.

Mono input still sounds mono in the front

The script applies an ITD delay to the right channel (front_width_ms). For convincing widening, values between 0.2 ms and 1 ms work well. The Wide Mono preset uses 0.35 ms. Increase this value for a wider image, but be aware that delays above about 1 ms exceed natural interaural time differences, so the image tends to shift toward the earlier (left) channel rather than widen further.
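
To see what front_width_ms means in samples, here is a small Python sketch of the widening step. It assumes the delay is rounded to a whole number of samples and that the output keeps the input length; the function name widen_mono is hypothetical.

```python
def widen_mono(mono, front_width_ms=0.35, sample_rate=44100):
    """Return (left, right, delay_samples): left is the original, right is delayed."""
    n = int(round(front_width_ms * sample_rate / 1000.0))
    left = list(mono)
    right = ([0.0] * n + list(mono))[:len(mono)]
    return left, right, n
```

At 44.1 kHz the default 0.35 ms corresponds to 15 samples; at 48 kHz it is 17 samples.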

Why 24 channels?

The 22.2 format (22 full‑range channels + 2 LFE) is a standard for immersive audio (NHK, cinema). This synthetic renderer is a psychoacoustic approximation – it does not claim physical accuracy, but it provides a practical way to upmix any source into a multichannel array for experimentation or playback on 24‑channel systems.

Decorrelation offset

Many channel pairs (e.g., FWL/FWR, SiL/SiR, TpFL/TpFR) receive a small extra delay offset (decorr_ms = 7 ms). This creates a natural stereo width in the surround field without phase cancellation. The offset is applied to one side of each pair: left channel = base delay, right channel = base delay + decorr_ms.
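
As a worked example of the offset, the Python sketch below computes the delays for one pair. The base delay of 40 ms is borrowed from the Cinematic Film side delay purely for illustration; the guide does not state which base delay each pair uses, and the helper names are hypothetical.

```python
def pair_delays_ms(base_delay_ms, decorr_ms=7.0):
    """Return (left_delay, right_delay) in ms for one left/right pair."""
    return base_delay_ms, base_delay_ms + decorr_ms


def ms_to_samples(ms, sample_rate):
    """Convert a delay in milliseconds to a whole number of samples."""
    return int(round(ms * sample_rate / 1000.0))
```

With a 40 ms base delay, the left channel is delayed by 40 ms and the right by 47 ms, which is 1920 and 2256 samples at 48 kHz.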

Band‑limiting of surround/height channels

Surround and height channels are high‑passed (120 Hz / 250 Hz) to remove low‑frequency buildup, and low‑passed (6000 Hz / 4000 Hz) to reduce directness and prevent tonal clutter. This makes them sound diffuse and “behind” the listener, while the front channels retain full frequency response.
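
The effect of band‑limiting can be demonstrated with a very simple filter pair in Python. This sketch uses first‑order (one‑pole) filters as a stand‑in; the Praat script's actual filter type and slopes are not documented here and are likely steeper. All function names are hypothetical.

```python
import math


def one_pole_lowpass(signal, cutoff_hz, sample_rate):
    """First-order low-pass: the state moves a fraction `a` toward each input sample."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, state = [], 0.0
    for x in signal:
        state += a * (x - state)
        out.append(state)
    return out


def one_pole_highpass(signal, cutoff_hz, sample_rate):
    """First-order high-pass formed as input minus its low-passed version."""
    low = one_pole_lowpass(signal, cutoff_hz, sample_rate)
    return [x - l for x, l in zip(signal, low)]


def band_limit(signal, hp_hz, lp_hz, sample_rate):
    """High-pass at hp_hz, then low-pass at lp_hz (e.g. 120 Hz and 6000 Hz)."""
    return one_pole_lowpass(one_pole_highpass(signal, hp_hz, sample_rate),
                            lp_hz, sample_rate)
```

With band_limit(x, 120.0, 6000.0, fs), content around 1 kHz passes almost unchanged, while very low and very high frequencies are attenuated, which is the behaviour the rear channels rely on.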