Microtonal Harmonic Field Engine — v0.2 User Guide

Analysis‑driven microtonal harmonisation for existing audio. Detects stable (S), unstable (U), and noisy (N) zones from pitch, harmonicity, and intensity. Derives local harmonic anchors from the source itself, then generates companion voices via phrase‑granular resampling (playback‑rate pitch shifting). Commas and wolf intervals are tracked and used as compositional events — not corrected.

Author: Shai Cohen
Affiliation: Department of Music, Bar‑Ilan University, Israel
Version: 0.2 (2025)
License: MIT License
Repo: GitHub

Contents:

- What it does
- Quick start
- Presets (7 archetypes)
- Analysis: zones & phrases
- Harmony generation
- Pitch behaviors
- Parameters
- Visualization
- FAQ / troubleshooting

What this does

Microtonal Harmonic Field Engine creates microtonal companion voices from a monophonic (single‑line) source, which may be stored as a mono or stereo Sound. It does not impose a fixed tuning system on the audio. Instead, it analyses the source’s own pitch, harmonicity, and intensity to segment it into phrases and classify frames into three zones:

S (Stable) – reliable pitch, used as primary harmonic anchor.
U (Unstable) – pitched but less stable, used with lower weight.
N (Noisy/Unvoiced) – used as raw material, pitch‑shifted without anchor.

For each phrase, a harmonic anchor (in cents) is computed from S and U frames. Two tuning fields (A and B) provide microtonal intervals relative to the anchor. A field mix per phrase blends between the two fields, creating a gradual harmonic drift. An accumulated comma tracks deviations from perfect fifths/fourths, triggering wolf events when it exceeds a threshold — these become structural markers.

Resampling engine: Companion voices are generated by extracting the source phrase, resampling it (playback‑rate pitch shift) to achieve the target interval, and placing it back with crossfades. This keeps the spectral detail of all zone types, including noise and breath, although the whole spectrum (formants included) is transposed together with the pitch. No pitch tracking is used on the shifted material: it sounds like the source itself, transposed.

Quick start

In Praat, select exactly one Sound object (mono or stereo).
Run script… → Microtonal_Harmonic_Field_Engine.praat.
Choose a Preset_mode:
- Just Bloom, Septimal Shadow, Wolf Corridor, Harmonic Gravity, Temperament Collapse, Dual Root Conflict, Comma Drift, Custom
For Custom, select Field_A and Field_B tuning systems (12‑TET, Just, Pythagorean, 19‑TET, 31‑TET, Harmonic Series, Septimal, User‑Defined).
Set N_voices (1–5), Companion_level, Original_level.
Choose Primary_behavior (Snap, Lean, Drift, Oscillate, Smear, Refuse).
Select Placement_mode (Align start, centre, or stretch‑to‑fit).
Click OK. Praat runs pitch/harmonicity/intensity analysis, segments phrases, computes anchors, generates voices via resampling, mixes with stereo spread, and imports result as originalname_MHFE_result.

Quick tip: Start with Just Bloom (Field A = Just, Field B = Harmonic Series, behavior = Lean) for a warm, natural microtonal chorus. For dramatic wolf‑interval effects, try Wolf Corridor (Pythagorean both fields, behavior = Drift).

Important: This script uses only Praat’s built‑in analysis and resampling – no Python required. It works on mono or stereo; stereo sources are mixed to mono for analysis but the final output is stereo (original centre, voices panned according to Stereo_spread). Very short sounds (<0.5 s) may produce unreliable analysis.

The seven presets (+ Custom)

Preset	Field A	Field B	Behavior	Voices	Description
Just Bloom	Just Intonation	Harmonic Series	Lean	≤4	Warm, natural chorus; gentle harmonic drift.
Septimal Shadow	Septimal	12‑TET	Lean	2	Septimal intervals shadowed by equal temperament.
Wolf Corridor	Pythagorean	Pythagorean	Drift	≤3	Both fields Pythagorean, drift behavior → accumulates wolf fifths.
Harmonic Gravity	Harmonic Series	Harmonic Series	Lean	≤4	Two voices lean toward the harmonic series from different directions.
Temperament Collapse	31‑TET	12‑TET	Lean	2	High‑resolution 31‑TET collapses into familiar 12‑TET.
Dual Root Conflict	Just	Pythagorean	Oscillate	2	Just vs. Pythagorean oscillating slowly (2.5 Hz, 15 cents depth).
Comma Drift	Just	Just	Drift	≤3	Pure just intonation drifting internally – comma accumulation only.

Custom: allows independent selection of Field A, Field B, Primary_behavior, N_voices, and all other parameters.

Analysis: zones, phrases, anchors, comma

Frame classification:
- Stable (S): voiced, harmonicity ≥ 0.60, intensity ≥ silence_floor (40 dB)
- Unstable (U): voiced, intensity ≥ silence_floor, but stability threshold not met
- Noisy (N): unvoiced or below silence floor
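
The three rules above can be sketched as a small decision function. This is only an illustration of the logic as described, not the script's code: the function name and the frame tuples are hypothetical, and it assumes harmonicity is expressed on the same 0–1 scale as the 0.60 threshold.

```python
def classify_frame(voiced, harmonicity, intensity_db,
                   stability_threshold=0.60, silence_floor_db=40.0):
    """Return 'S', 'U', or 'N' for one analysis frame.

    Assumption: harmonicity is on the same scale as stability_threshold.
    """
    # Unvoiced frames and frames below the silence floor are Noisy.
    if not voiced or intensity_db < silence_floor_db:
        return "N"
    # Voiced, loud enough, and harmonic enough: Stable.
    if harmonicity >= stability_threshold:
        return "S"
    # Voiced and loud enough, but the stability threshold is not met.
    return "U"


# Hypothetical frames: (voiced, harmonicity, intensity in dB)
frames = [
    (True, 0.82, 62.0),   # clear pitched tone
    (True, 0.41, 55.0),   # pitched but rough
    (False, 0.10, 58.0),  # unvoiced (breath, consonant)
    (True, 0.90, 31.0),   # pitched but below the silence floor
]
zones = "".join(classify_frame(*f) for f in frames)
print(zones)
```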

Phrase segmentation: contiguous non‑N frames separated by ≥0.18 s of N.
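
As a rough sketch of this rule, the function below walks a string of zone labels and closes a phrase only once a run of N frames has lasted at least 0.18 s. The function name, the zone string, and the 0.05 s frame step are illustrative assumptions; the script's actual frame step is not stated here.

```python
def segment_phrases(zones, frame_step, min_gap_sec=0.18):
    """Return (first_frame, last_frame) index pairs, one per phrase.

    Runs of N shorter than min_gap_sec stay inside the surrounding phrase.
    """
    phrases = []
    start = None        # first frame of the phrase currently open
    last_pitched = None  # most recent non-N frame
    for i, zone in enumerate(zones):
        if zone != "N":
            if start is None:
                start = i
            last_pitched = i
        elif start is not None:
            gap = (i - last_pitched) * frame_step
            if gap >= min_gap_sec:
                phrases.append((start, last_pitched))
                start = None
    if start is not None:  # close a phrase that runs to the end
        phrases.append((start, last_pitched))
    return phrases


# Two N frames (0.10 s) do not split a phrase; five N frames (0.25 s) do.
zones = "SSUNNSSNNNNNUUSN"
phrases = segment_phrases(zones, frame_step=0.05)
print(phrases)
```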

Anchor computation (cents): weighted mean of f0 in cents, using S frames (full weight) and U frames (half weight). If a phrase yields no usable S or U frames, the first voiced frame in the phrase is used instead. Each anchor is smoothed toward the previous phrase’s anchor with a memory weight of 0.35, unless the jump between them exceeds anchor_inertia_cents (60 cents).
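
A minimal sketch of the anchor arithmetic follows. Several details are assumptions rather than documented behavior: the cents reference (1 Hz), the exact form of the memory blend, and the choice to skip smoothing entirely when the jump exceeds the inertia limit (the script may instead only reduce the memory weight). All function names are hypothetical.

```python
import math


def hz_to_cents(hz, ref_hz=1.0):
    """Convert Hz to cents above ref_hz (the reference is an assumption)."""
    return 1200.0 * math.log2(hz / ref_hz)


def phrase_anchor(f0_hz, zones):
    """Weighted mean of f0 in cents: S frames weight 1.0, U frames 0.5."""
    weights = {"S": 1.0, "U": 0.5}
    num = den = 0.0
    for hz, zone in zip(f0_hz, zones):
        if zone not in weights:
            continue  # N frames carry no anchor information
        num += weights[zone] * hz_to_cents(hz)
        den += weights[zone]
    return num / den if den > 0 else None


def smooth_anchor(new, prev, memory=0.35, inertia_cents=60.0):
    """Blend with the previous anchor unless the jump is too large.

    Assumption: a jump above inertia_cents disables the memory entirely.
    """
    if prev is None or abs(new - prev) > inertia_cents:
        return new
    return (1.0 - memory) * new + memory * prev


# Two stable frames at 220 Hz and one unstable frame an octave higher.
anchor = phrase_anchor([220.0, 220.0, 440.0], "SSU")
print(round(anchor - hz_to_cents(220.0), 2))  # offset from 220 Hz in cents
```

In this example the U frame pulls the anchor up by 240 cents rather than the 400 cents an unweighted mean would give, which shows how the half weight limits the influence of less stable frames.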

Comma tracking: For each phrase transition, the interval (in cents) is compared to the nearest perfect fifth (701.96 cents) or fourth (498.04 cents). Deviations within ±120 cents are accumulated into phrase_comma. When cumulative comma exceeds comma_climax_threshold (38 cents), a wolf event is triggered – these are used structurally (e.g., emphasised in the designated wolf voice).
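
The accumulation can be sketched as below. The guide does not say how the transition interval is measured, so this sketch assumes the absolute difference between consecutive phrase anchors, folded into one octave; it also assumes the threshold is compared against the magnitude of the comma and that the comma is not reset after a wolf event. The function name is hypothetical.

```python
FIFTH = 701.96   # perfect fifth in cents
FOURTH = 498.04  # perfect fourth in cents


def track_comma(anchors_cents, window=120.0, threshold=38.0):
    """Return a list of (cumulative_comma, wolf_event) per phrase transition."""
    comma = 0.0
    history = []
    for prev, cur in zip(anchors_cents, anchors_cents[1:]):
        # Assumption: interval = anchor difference folded into one octave.
        interval = abs(cur - prev) % 1200.0
        nearest = min((FIFTH, FOURTH), key=lambda ref: abs(interval - ref))
        deviation = interval - nearest
        # Only deviations within the +/-120 cent window are accumulated.
        if abs(deviation) <= window:
            comma += deviation
        history.append((comma, abs(comma) > threshold))
    return history


# Hypothetical anchors: a wide fifth, a wide fourth, another wide fifth.
history = track_comma([0.0, 720.0, 1240.0, 1960.0])
for comma, wolf in history:
    print(round(comma, 2), wolf)
```

Here the first transition contributes about 18 cents, the second pushes the total to about 40 cents, and the wolf flag turns on once the 38‑cent threshold is passed.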

Field mix: blends Field A and Field B intervals per phrase. Depending on preset, mix may follow progress, oscillate, or be triggered by comma events.

Harmony generation (resampling engine)

For each phrase and voice:

Get base interval from Field A and Field B (getInterval procedure, see table below).
Blend intervals using phrase’s field mix → eff_int (cents above anchor).
If this voice is the designated wolf voice (voice 2 if ≥2 voices) and comma exceeds threshold, add a wolf offset: (comma / threshold) * 5.0 cents.
Compute target Hz: anchor_hz * 2^(eff_int/1200).
Apply pitch behavior (Snap, Lean, Drift, Oscillate, Smear, Refuse) to obtain final Hz. (Behaviors are applied in log space; see next section.)
Calculate playback speed factor = final_hz / anchor_hz.
Extract source segment of length phrase_dur * speed_factor (so that after resampling it lasts exactly phrase_dur).
Override sampling frequency by source_sr * speed_factor, then resample back to original SR – this pitch‑shifts the entire segment, preserving all spectral detail including noise.
Apply fades, apply time correction if requested (Elastic uses PSOLA), and place the segment into a full‑length silence buffer at the computed start time (align‑start, centre, or stretch‑to‑fit).
Accumulate into voice buffer with appropriate gain.
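
The numeric part of the steps above (blend, target Hz, speed factor, extraction length, overridden sampling rate) can be sketched as follows. This is not the script's code: the function name is hypothetical, the field mix is assumed to be a linear crossfade, the speed limits are assumed to be a simple clamp, and only the Snap behavior is modelled.

```python
def plan_resample(anchor_hz, int_a, int_b, mix, phrase_dur, source_sr,
                  min_speed=0.25, max_speed=3.0):
    """Work out the resampling numbers for one phrase and one voice."""
    # Assumption: mix = 0 gives Field A, mix = 1 gives Field B, linear between.
    eff_int = (1.0 - mix) * int_a + mix * int_b
    target_hz = anchor_hz * 2.0 ** (eff_int / 1200.0)
    final_hz = target_hz  # Snap; other behaviors would adjust this value
    speed = final_hz / anchor_hz
    # Assumption: max/min_speed_factor are applied as a clamp.
    speed = min(max_speed, max(min_speed, speed))
    return {
        "eff_int": eff_int,
        "target_hz": target_hz,
        "speed_factor": speed,
        # Longer source segment, so the result lasts phrase_dur after the shift.
        "extract_dur": phrase_dur * speed,
        # Rate to declare before resampling back to source_sr.
        "override_sr": source_sr * speed,
    }


# Voice 1, Field A = Just (701.96), Field B = 12-TET (700.00), mix fully on A.
plan = plan_resample(220.0, 701.96, 700.00, 0.0, 2.0, 44100)
for key, value in plan.items():
    print(key, round(value, 3))
```

For a just fifth the speed factor is close to 1.5, so a 2 s phrase needs roughly 3 s of source material, which is why phrases near the end of the file may have less material available than the shift requires.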

Interval tables (cents)

System	Voice 1	Voice 2	Voice 3	Voice 4	Voice 5
12‑TET	700.00	400.00	300.00	500.00	200.00
Just Intonation	701.96	386.31	315.64	498.04	203.91
Pythagorean	701.96	407.82	294.13	498.04	203.91
19‑TET	694.74	378.95	315.79	505.26	189.47
31‑TET	696.77	387.10	309.68	503.23	193.55
Harmonic Series	701.96	386.31	968.83	203.91	1088.27
Septimal	968.83	266.87	582.51	435.08	231.17
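
The table values follow from two standard formulas: 1200·log2(ratio) for just ratios, and 1200·steps/divisions for equal temperaments. The sketch below reproduces a few rows. The ratios and step counts are the standard ones that yield the listed cents (for example 3/2, 5/4, 6/5, 4/3, 9/8 for the Just row); they are inferred from the numbers rather than quoted from the script.

```python
import math


def ratio_to_cents(num, den):
    """Size of the frequency ratio num/den in cents."""
    return 1200.0 * math.log2(num / den)


def edo_cents(steps, divisions):
    """Size of `steps` steps of an equal division of the octave, in cents."""
    return 1200.0 * steps / divisions


# Ratios inferred from the table values, voices 1-5.
just = [(3, 2), (5, 4), (6, 5), (4, 3), (9, 8)]
septimal = [(7, 4), (7, 6), (7, 5), (9, 7), (8, 7)]
# Step counts inferred from the table values, voices 1-5.
tet19_steps = [11, 6, 5, 8, 3]
tet31_steps = [18, 10, 8, 13, 5]

just_cents = [ratio_to_cents(n, d) for n, d in just]
septimal_cents = [ratio_to_cents(n, d) for n, d in septimal]
tet19_cents = [edo_cents(s, 19) for s in tet19_steps]
tet31_cents = [edo_cents(s, 31) for s in tet31_steps]

for name, row in [("Just", just_cents), ("Septimal", septimal_cents),
                  ("19-TET", tet19_cents), ("31-TET", tet31_cents)]:
    print(name, [round(c, 2) for c in row])
```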

Pitch behaviors (applied per phrase)

Behavior	Description	Formula
Snap	Exact target interval.	out = tgt
Lean	Blend source and target in log space (cents).	out = src + lean * (tgt - src)
Drift	One‑pole IIR approach to target over time.	out_{t} = out_{t-1} + α·(tgt - out_{t-1}), α = 1 - exp(-dt/τ)
Oscillate	Sinusoidal modulation around target.	out = tgt + depth·sin(phase)
Smear	Same as Snap (spatial smear applied at mixing stage).	out = tgt
Refuse	Target + syntonic comma (21.5 cents).	out = tgt + 21.5 cents

Lean is often the most musically useful behavior: it creates a “magnet” effect in which the voice is attracted to the harmonic interval but retains some of the source’s original pitch character.
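
The formulas in the table can be written out directly, with all pitches in cents. This is a sketch under assumptions: the function names are hypothetical, the default lean amount of 0.5 is a placeholder (the guide does not give it), and the oscillation phase is assumed to be 2π·rate·t.

```python
import math

SYNTONIC_COMMA = 21.5  # cents, as used by Refuse


def snap(src, tgt):
    return tgt  # Smear uses the same pitch; its effect is spatial


def lean(src, tgt, amount=0.5):
    # amount = 0 keeps the source pitch, amount = 1 equals Snap.
    return src + amount * (tgt - src)


def drift_step(prev_out, tgt, dt, tau=0.80):
    # One-pole approach: covers about 63 % of the distance after tau seconds.
    alpha = 1.0 - math.exp(-dt / tau)
    return prev_out + alpha * (tgt - prev_out)


def oscillate(tgt, t, rate_hz=3.5, depth=10.0):
    # Assumption: phase = 2*pi*rate*t.
    return tgt + depth * math.sin(2.0 * math.pi * rate_hz * t)


def refuse(tgt):
    return tgt + SYNTONIC_COMMA


src, tgt = 0.0, 701.96  # source at the anchor, target a just fifth above
print("lean   ", round(lean(src, tgt), 2))
print("refuse ", round(refuse(tgt), 2))

# Drift over successive 0.4 s phrases, starting from the source pitch.
out = src
for _ in range(4):
    out = drift_step(out, tgt, dt=0.4)
    print("drift  ", round(out, 2))
```

With a 0.80 s time constant, a Drift voice needs several phrases to settle on its target, which is consistent with its use in the presets that accumulate comma over time.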

Parameters & defaults

Voices and levels

Parameter	Default	Description
N_voices	2	Number of companion voices (1–5).
Companion_level	0.55	Gain multiplier for all companion voices.
Original_level	0.80	Gain for original source (centre).

Placement & time

Parameter	Default	Description
Placement_mode	Align start	Align start, centre, or stretch‑to‑fit (duration correction).
Time_correction	None	None, Snap to phrase (if drift >20 ms), or Elastic (PSOLA).

Spatial

Parameter	Default	Description
Stereo_spread	0.8	0 = mono (all centre), 1 = full L‑R spread.

Advanced (fixed in script; can be edited)

Parameter	Default	Description
pitch_floor/ceiling	60 / 800 Hz	Analysis range
voicing_threshold	0.45	Voicing probability threshold
stability_threshold	0.60	Zone S threshold
silence_floor_dB	40 dB	Intensity silence floor
anchor_inertia_cents	60 cents	Max jump without memory reduction
drift_tau_sec	0.80 s	Drift time constant
oscillate_rate/depth	3.5 Hz / 10 cents	Oscillation parameters
comma_climax_threshold	38 cents	Wolf event threshold
fade_time_sec	0.015 s	Crossfade length
max/min_speed_factor	3.0 / 0.25	Resampling ratio limits

Visualization (Praat picture)

When Visualize = 1, the script draws a multi‑panel plot:

Zone map – colour‑coded by frame type: Stable (green), Unstable (orange), Noisy (grey). Vertical dotted lines show phrase boundaries.
Pitch curves – grey line = source f0 (where voiced). Coloured horizontal segments = phrase anchors (dotted). Coloured curves = target pitches for each companion voice (using same colours as in stereo spread).
Comma graph – accumulated comma in cents over time. Red horizontal lines mark ±comma_climax_threshold (wolf event boundaries).

Tip: The zone map gives immediate insight into which parts of the sound are stable enough to act as harmonic anchors. The comma graph reveals structural “tension” building toward wolf events.

FAQ / troubleshooting

Output is silent or very quiet

Check the analysis summary: if the share of Zone S frames is very low (<5 %), the engine had few stable anchors. Companion voices may still be generated from Unstable zones, but at lower confidence. Increase Companion_level or reduce Original_level to hear them better.

Voices sound glitchy / have artefacts

The resampling engine (SR override + resample) is robust, but large speed factors (>2×) can introduce high‑frequency artefacts. Reduce max_speed_factor (default 3.0) in the script if needed, and make sure fade_time_sec (default 0.015 s) is not too short for the material.

Companion voices are out of sync with original

Try Time_correction = Snap to phrase or Elastic. Align centre placement can also help synchronise attacks. For speech, Align start is usually best.

User‑Defined intervals

When Field A or B = User‑Defined, the script parses user_ratios_cents$ (default: “0, 150, 350, 702, 968”), a string of up to five comma‑separated values. Despite the variable name, the values are in cents, not frequency ratios. They are assigned to voices 1–5 in order. Edit the string in the script to change them.
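
To check a custom interval string before pasting it into the script, the parsing can be mimicked as below. The function name is hypothetical, and dropping values beyond the fifth is an assumption about how extra entries are handled.

```python
def parse_user_cents(text, max_voices=5):
    """Split a comma-separated cents string into per-voice intervals."""
    values = [float(tok) for tok in text.split(",") if tok.strip()]
    # Assumption: entries beyond the fifth are ignored.
    return values[:max_voices]


intervals = parse_user_cents("0, 150, 350, 702, 968")
for voice, cents in enumerate(intervals, start=1):
    print(f"voice {voice}: {cents} cents")
```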

Wolf voice designation

If N_voices ≥ 2, voice 2 is the designated wolf voice. When cumulative comma exceeds comma_climax_threshold, this voice receives an extra offset (comma / threshold) * 5.0 cents, and its gain is multiplied by wolf_emphasis (1.4).
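
As a worked example of this rule, the sketch below applies the offset and gain boost to one voice. The function name is hypothetical, and comparing the magnitude of the comma against the threshold is an assumption based on the ±threshold lines in the comma graph.

```python
def wolf_adjust(eff_int, gain, comma, voice, n_voices,
                threshold=38.0, wolf_emphasis=1.4):
    """Return (interval_cents, gain) after any wolf-voice adjustment."""
    is_wolf_voice = n_voices >= 2 and voice == 2
    # Assumption: the threshold applies to the magnitude of the comma.
    if not is_wolf_voice or abs(comma) <= threshold:
        return eff_int, gain
    offset = (comma / threshold) * 5.0
    return eff_int + offset, gain * wolf_emphasis


# Voice 2 on a just major third (386.31 cents) with 57 cents of comma.
interval, gain = wolf_adjust(386.31, 0.55, comma=57.0, voice=2, n_voices=3)
print(round(interval, 2), round(gain, 2))
```

With a comma of 57 cents (1.5 times the threshold) the offset is 7.5 cents, so the wolf voice moves only slightly in pitch; most of the emphasis comes from the 1.4× gain.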

Modifying script parameters: All advanced parameters (analysis thresholds, drift tau, comma thresholds, etc.) are defined at the top of the Praat script. You can safely edit them to suit your material.