Microtonal Harmonic Field Engine — v0.2 User Guide

Analysis‑driven microtonal harmonisation for existing audio. Detects stable (S), unstable (U), and noisy (N) zones from pitch, harmonicity, and intensity. Derives local harmonic anchors from the source itself, then generates companion voices via phrase‑granular resampling (playback‑rate pitch shifting). Commas and wolf intervals are tracked and used as compositional events — not corrected.

Author: Shai Cohen
Affiliation: Department of Music, Bar‑Ilan University, Israel
Version: 0.2 (2025)
License: MIT License
Repo: GitHub

What this does

Microtonal Harmonic Field Engine creates microtonal companion voices from a monophonic source. It does not impose a fixed tuning system on the audio. Instead, it analyses the source's own pitch, harmonicity, and intensity to segment it into phrases and classify frames into three zones: stable (S), unstable (U), and noisy (N).

For each phrase, a harmonic anchor (in cents) is computed from its S and U frames. Two tuning fields (A and B) supply microtonal intervals relative to that anchor, and a per‑phrase field mix blends between them, creating a gradual harmonic drift. An accumulated comma tracks how far phrase‑to‑phrase intervals deviate from perfect fifths and fourths; when it exceeds a threshold, a wolf event is triggered and used as a structural marker.

Resampling engine: Companion voices are generated by extracting the source phrase, resampling it (a playback‑rate pitch shift) to reach the target interval, and placing it back with crossfades. Because the whole waveform is transposed, all zone types keep their spectral detail, including noise and breath. No pitch tracking is applied to the shifted material: it is the source itself, transposed. As with any playback‑rate shift, formants move together with the pitch, so large shifts also change the apparent timbre.
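The resampling idea can be illustrated outside Praat. The pure-Python sketch below is not the script's code: it reads a signal `speed_factor` times faster using linear interpolation, whereas Praat's Sound: Resample... uses sinc interpolation, and it applies no anti-alias filtering. The helper names (`playback_rate_shift`, `estimate_freq`) are hypothetical. It shows both effects of a playback-rate shift: the pitch rises by the speed factor and the duration shrinks by the same factor.

```python
import math


def playback_rate_shift(samples, speed_factor):
    """Read `samples` speed_factor times faster, using linear interpolation.

    Mimics "override sampling frequency, then resample back": pitch is
    multiplied by speed_factor and duration is divided by it.
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    n_out = int(len(samples) / speed_factor)
    out = []
    for i in range(n_out):
        pos = i * speed_factor              # fractional read position in the source
        j = int(pos)
        frac = pos - j
        nxt = samples[j + 1] if j + 1 < len(samples) else samples[j]
        out.append((1.0 - frac) * samples[j] + frac * nxt)
    return out


def estimate_freq(samples, sr):
    """Crude f0 estimate: upward zero crossings per second."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0.0 <= b)
    return crossings * sr / len(samples)


sr = 8000
source = [math.sin(2 * math.pi * 100 * n / sr + 0.1) for n in range(sr)]  # 1 s at 100 Hz
shifted = playback_rate_shift(source, 1.5)
print(len(shifted) / sr, estimate_freq(shifted, sr))  # roughly 0.67 s and roughly 150 Hz
```

The shortening is why step 7 of the harmony generation extracts phrase_dur * speed_factor seconds of source: after the shift, the segment lasts exactly phrase_dur.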

Quick start

  1. In Praat, select exactly one Sound object (mono or stereo).
  2. Run script… and select Microtonal_Harmonic_Field_Engine.praat.
  3. Choose a Preset_mode:
    • Just Bloom, Septimal Shadow, Wolf Corridor, Harmonic Gravity, Temperament Collapse, Dual Root Conflict, Comma Drift, Custom
  4. For Custom, select Field_A and Field_B tuning systems (12‑TET, Just, Pythagorean, 19‑TET, 31‑TET, Harmonic Series, Septimal, User‑Defined).
  5. Set N_voices (1–5), Companion_level, Original_level.
  6. Choose Primary_behavior (Snap, Lean, Drift, Oscillate, Smear, Refuse).
  7. Select Placement_mode (Align start, centre, or stretch‑to‑fit).
  8. Click OK. Praat runs pitch, harmonicity, and intensity analysis, segments phrases, computes anchors, generates voices via resampling, mixes them with stereo spread, and imports the result as originalname_MHFE_result.
Quick tip: Start with Just Bloom (Field A = Just, Field B = Harmonic Series, behavior = Lean) for a warm, natural microtonal chorus. For dramatic wolf‑interval effects, try Wolf Corridor (Pythagorean in both fields, behavior = Drift).
Important: This script uses only Praat’s built‑in analysis and resampling – no Python required. It works on mono or stereo; stereo sources are mixed to mono for analysis but the final output is stereo (original centre, voices panned according to Stereo_spread). Very short sounds (<0.5 s) may produce unreliable analysis.

The seven presets (+ Custom)

Preset | Field A | Field B | Behavior | Voices | Description
--- | --- | --- | --- | --- | ---
Just Bloom | Just Intonation | Harmonic Series | Lean | ≤4 | Warm, natural chorus; gentle harmonic drift.
Septimal Shadow | Septimal | 12‑TET | Lean | 2 | Septimal intervals shadowed by equal temperament.
Wolf Corridor | Pythagorean | Pythagorean | Drift | ≤3 | Both fields Pythagorean; Drift behavior accumulates wolf fifths.
Harmonic Gravity | Harmonic Series | Harmonic Series | Lean | ≤4 | Two voices lean toward the harmonic series from different directions.
Temperament Collapse | 31‑TET | 12‑TET | Lean | 2 | High‑resolution 31‑TET collapses into familiar 12‑TET.
Dual Root Conflict | Just | Pythagorean | Oscillate | 2 | Just vs. Pythagorean, oscillating slowly (2.5 Hz, 15 cents depth).
Comma Drift | Just | Just | Drift | ≤3 | Pure just intonation drifting internally; comma accumulation only.

Custom: allows independent selection of Field A, Field B, Primary_behavior, N_voices, and all other parameters.

Analysis: zones, phrases, anchors, comma

Frame classification:
- Stable (S): voiced, harmonicity ≥ 0.60, intensity ≥ silence_floor (40 dB)
- Unstable (U): voiced, intensity ≥ silence_floor, but stability threshold not met
- Noisy (N): unvoiced or below silence floor
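As a minimal sketch, the three rules can be written as one function. It assumes the voicing decision (made with voicing_threshold in the script) and a harmonicity value already on the 0..1 scale used above are available per frame; the guide does not say how harmonicity is scaled, and `classify_frame` is a hypothetical name.

```python
def classify_frame(voiced, harmonicity, intensity_db,
                   stability_threshold=0.60, silence_floor_db=40.0):
    """Return 'S', 'U' or 'N' for one analysis frame."""
    if not voiced or intensity_db < silence_floor_db:
        return "N"                      # unvoiced, or below the silence floor
    if harmonicity >= stability_threshold:
        return "S"                      # voiced, loud enough, and harmonically stable
    return "U"                          # voiced and loud enough, but not stable


frames = [(True, 0.8, 65.0), (True, 0.3, 65.0), (True, 0.9, 32.0), (False, 0.9, 70.0)]
print([classify_frame(*f) for f in frames])
# ['S', 'U', 'N', 'N']
```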

Phrase segmentation: contiguous non‑N frames separated by ≥0.18 s of N.
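A sketch of the segmentation rule, assuming a fixed analysis time step `dt` and reading the rule to mean that N runs shorter than 0.18 s do not split a phrase (the guide does not state this explicitly). `segment_phrases` is a hypothetical helper, not part of the Praat script.

```python
def segment_phrases(zones, dt, min_gap_sec=0.18):
    """Group per-frame labels ('S', 'U', 'N') into phrases.

    Returns (first_frame, last_frame) index pairs. Assumption: a run of N
    frames shorter than min_gap_sec stays inside the current phrase.
    """
    min_gap_frames = max(1, round(min_gap_sec / dt))
    phrases = []
    start = last = None
    gap = 0                                   # length of the current run of N frames
    for i, zone in enumerate(zones):
        if zone == "N":
            gap += 1
            if start is not None and gap >= min_gap_frames:
                phrases.append((start, last))  # gap long enough: close the phrase
                start = None
            continue
        if start is None:
            start = i                         # first non-N frame opens a phrase
        last = i
        gap = 0
    if start is not None:
        phrases.append((start, last))
    return phrases


zones = ["N"] * 5 + ["S"] * 10 + ["N"] * 5 + ["U"] * 10 + ["N"] * 20 + ["S"] * 5
print(segment_phrases(zones, dt=0.01))
# [(5, 29), (50, 54)]
```

With dt = 0.01 s, the 5-frame N gap (0.05 s) stays inside the first phrase, while the 20-frame gap (0.20 s) ends it.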

Anchor computation (cents): the anchor is the weighted mean of f0 in cents over S frames (full weight) and U frames (half weight). If the phrase contains no S or U frames, the f0 of its first voiced frame is used instead. A memory weight (0.35) smooths anchors across phrases, unless the jump from the previous anchor exceeds anchor_inertia_cents (60 cents).
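The anchor rule and the memory smoothing can be sketched as follows. Several details are assumptions rather than documented behavior: cents are measured from a hypothetical 440 Hz reference (any fixed reference gives the same anchor in Hz), the smoothing is taken to be a linear blend with weight 0.35 on the previous anchor, and a jump larger than anchor_inertia_cents is taken to disable smoothing entirely. All function names are hypothetical.

```python
import math

REF_HZ = 440.0  # hypothetical reference for the cents scale


def hz_to_cents(hz, ref_hz=REF_HZ):
    return 1200.0 * math.log2(hz / ref_hz)


def cents_to_hz(cents, ref_hz=REF_HZ):
    return ref_hz * 2.0 ** (cents / 1200.0)


def phrase_anchor(frames):
    """frames: (zone, f0_hz) pairs; S weighs 1.0, U weighs 0.5, f0 <= 0 means unvoiced."""
    weights = {"S": 1.0, "U": 0.5}
    num = den = 0.0
    for zone, f0 in frames:
        w = weights.get(zone, 0.0)
        if w > 0.0 and f0 > 0.0:
            num += w * hz_to_cents(f0)
            den += w
    if den > 0.0:
        return num / den
    for _, f0 in frames:          # fallback: first voiced frame in the phrase
        if f0 > 0.0:
            return hz_to_cents(f0)
    return None                   # nothing voiced at all


def smooth_anchor(raw, previous, memory_weight=0.35, inertia_cents=60.0):
    """Blend toward the previous anchor unless the jump exceeds inertia_cents (assumed rule)."""
    if previous is None or abs(raw - previous) > inertia_cents:
        return raw
    return (1.0 - memory_weight) * raw + memory_weight * previous


raw = phrase_anchor([("S", 440.0), ("S", 445.0), ("U", 460.0)])
print(raw, cents_to_hz(smooth_anchor(raw, previous=0.0)))
```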

Comma tracking: For each phrase transition, the interval between the two phrase anchors (in cents) is compared to the nearest perfect fifth (701.96 cents) or perfect fourth (498.04 cents). Deviations within ±120 cents are accumulated into phrase_comma. When the accumulated comma exceeds comma_climax_threshold (38 cents), a wolf event is triggered; wolf events are used structurally (e.g., emphasised in the designated wolf voice).
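A sketch of comma tracking under stated assumptions: the transition interval is the difference between consecutive phrase anchors, folded into one octave with its direction ignored; "exceeds" is read as comparing the magnitude of the accumulated comma with the threshold; and no reset after a wolf event is modeled. The function names are hypothetical.

```python
FIFTH = 701.96
FOURTH = 498.04


def comma_deviation(interval_cents, window=120.0):
    """Signed deviation from the nearer of a fifth or fourth, or None if outside the window."""
    folded = abs(interval_cents) % 1200.0   # assumption: octave-folded, direction ignored
    target = min((FIFTH, FOURTH), key=lambda t: abs(folded - t))
    deviation = folded - target
    return deviation if abs(deviation) <= window else None


def track_comma(anchors, threshold=38.0):
    """For each phrase transition, return (accumulated comma, wolf event?)."""
    comma = 0.0
    events = []
    for prev, cur in zip(anchors, anchors[1:]):
        deviation = comma_deviation(cur - prev)
        if deviation is not None:
            comma += deviation
        events.append((comma, abs(comma) > threshold))
    return events


# Anchors (cents) moving by slightly wide fifths and a near-pure fourth.
for comma, wolf in track_comma([0.0, 720.0, 1220.0, 1940.0]):
    print(f"{comma:6.2f}  wolf={wolf}")
```

Here the third transition pushes the accumulated comma to about 38.04 cents, just past the 38-cent threshold.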

Field mix: blends the Field A and Field B intervals for each phrase. Depending on the preset, the mix may follow the progress through the sound, oscillate, or be triggered by comma events.
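The field mix can be sketched as a linear interpolation between the two fields' intervals. The two mix curves below are hypothetical illustrations of "follow progress" and "oscillate"; the actual curves (and the comma-triggered mode, not shown) are defined in the Praat script and may differ.

```python
import math


def progress_mix(phrase_index, n_phrases):
    """Hypothetical 'follow progress' curve: 0 on the first phrase, 1 on the last."""
    if n_phrases <= 1:
        return 0.0
    return phrase_index / (n_phrases - 1)


def oscillating_mix(phrase_index, period_phrases=4.0):
    """Hypothetical curve completing one 0 -> 1 -> 0 cycle every period_phrases phrases."""
    return 0.5 - 0.5 * math.cos(2.0 * math.pi * phrase_index / period_phrases)


def blend_interval(field_a_cents, field_b_cents, mix):
    """mix = 0 gives Field A, mix = 1 gives Field B; values between interpolate in cents."""
    return (1.0 - mix) * field_a_cents + mix * field_b_cents


# Voice 2 with Field A = Just (386.31) and Field B = Pythagorean (407.82).
for i in range(5):
    print(i, round(blend_interval(386.31, 407.82, progress_mix(i, 5)), 2))
```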

Harmony generation (resampling engine)

For each phrase and voice:

  1. Get base interval from Field A and Field B (getInterval procedure, see table below).
  2. Blend intervals using phrase’s field mix → eff_int (cents above anchor).
  3. If this voice is the designated wolf voice (voice 2 if ≥2 voices) and comma exceeds threshold, add a wolf offset: (comma / threshold) * 5.0 cents.
  4. Compute target Hz: anchor_hz * 2^(eff_int/1200).
  5. Apply pitch behavior (Snap, Lean, Drift, Oscillate, Smear, Refuse) to obtain final Hz. Behaviors are applied in log space; see Pitch behaviors below.
  6. Calculate playback speed factor = final_hz / anchor_hz.
  7. Extract source segment of length phrase_dur * speed_factor (so that after resampling it lasts exactly phrase_dur).
  8. Override the sampling frequency to source_sr * speed_factor, then resample back to the original sampling rate. This pitch‑shifts the entire segment while preserving all spectral detail, including noise.
  9. Apply fades, time‑correct (PSOLA if requested), and place into a full‑length silence buffer at the computed start time (align‑start, centre, or stretch‑to‑fit).
  10. Accumulate into voice buffer with appropriate gain.
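The arithmetic of steps 4, 6, 7, and 8 can be checked with a few lines of Python. This sketch skips the behavior stage (step 5), so final_hz equals the target; applying the min/max speed-factor limits at step 6 is an assumption, and the helper names are hypothetical.

```python
def target_hz(anchor_hz, eff_int_cents):
    """Step 4: the frequency eff_int_cents above the anchor."""
    return anchor_hz * 2.0 ** (eff_int_cents / 1200.0)


def speed_factor(final_hz, anchor_hz, min_speed=0.25, max_speed=3.0):
    """Step 6, clamped to min/max_speed_factor (where the clamp happens is an assumption)."""
    return min(max(final_hz / anchor_hz, min_speed), max_speed)


def extract_duration(phrase_dur, speed):
    """Step 7: seconds of source to extract so the shifted copy lasts phrase_dur."""
    return phrase_dur * speed


def shifted_duration(extracted_dur, speed):
    """Step 8: relabelling the samples at sr * speed plays them speed times faster."""
    return extracted_dur / speed


anchor = 220.0
final = target_hz(anchor, 701.96)   # a just fifth above the anchor, about 330 Hz
k = speed_factor(final, anchor)     # about 1.5
src_len = extract_duration(2.0, k)  # about 3.0 s of source for a 2.0 s phrase
print(final, k, src_len, shifted_duration(src_len, k))
```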

Interval tables (cents)

System | Voice 1 | Voice 2 | Voice 3 | Voice 4 | Voice 5
--- | --- | --- | --- | --- | ---
12‑TET | 700.00 | 400.00 | 300.00 | 500.00 | 200.00
Just Intonation | 701.96 | 386.31 | 315.64 | 498.04 | 203.91
Pythagorean | 701.96 | 407.82 | 294.13 | 498.04 | 203.91
19‑TET | 694.74 | 378.95 | 315.79 | 505.26 | 189.47
31‑TET | 696.77 | 387.10 | 309.68 | 503.23 | 193.55
Harmonic Series | 701.96 | 386.31 | 968.83 | 203.91 | 1088.27
Septimal | 968.83 | 266.87 | 582.51 | 435.08 | 231.17
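The cent values in this table can be reproduced from frequency ratios and equal-division step counts. The ratios and step counts in the sketch were inferred from the cent values (for example, 701.96 cents ≈ 3:2 and 694.74 cents = 11 steps of 19‑TET); they are not read from the script.

```python
import math

# Inferred from the cent values above, not taken from the Praat script.
RATIOS = {
    "Just Intonation": [(3, 2), (5, 4), (6, 5), (4, 3), (9, 8)],
    "Pythagorean": [(3, 2), (81, 64), (32, 27), (4, 3), (9, 8)],
    "Harmonic Series": [(3, 2), (5, 4), (7, 4), (9, 8), (15, 8)],
    "Septimal": [(7, 4), (7, 6), (7, 5), (9, 7), (8, 7)],
}
EDO_STEPS = {
    "12-TET": (12, [7, 4, 3, 5, 2]),
    "19-TET": (19, [11, 6, 5, 8, 3]),
    "31-TET": (31, [18, 10, 8, 13, 5]),
}


def ratio_cents(num, den):
    """Size of the frequency ratio num/den in cents."""
    return 1200.0 * math.log2(num / den)


def system_intervals(name):
    """Voice 1..5 intervals (cents) for one tuning system."""
    if name in RATIOS:
        return [ratio_cents(n, d) for n, d in RATIOS[name]]
    divisions, steps = EDO_STEPS[name]
    return [1200.0 * s / divisions for s in steps]


for system in list(EDO_STEPS) + list(RATIOS):
    print(f"{system:16s}", " ".join(f"{c:8.2f}" for c in system_intervals(system)))
```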

Pitch behaviors (applied per phrase)

Behavior | Description | Formula
--- | --- | ---
Snap | Exact target interval. | out = tgt
Lean | Blend source and target in log space (cents). | out = src + lean · (tgt - src)
Drift | One‑pole IIR approach to target over time. | out_t = out_{t-1} + α · (tgt - out_{t-1}), α = 1 - exp(-dt/τ)
Oscillate | Sinusoidal modulation around target. | out = tgt + depth · sin(phase)
Smear | Same as Snap (spatial smear applied at mixing stage). | out = tgt
Refuse | Target plus a syntonic comma (21.5 cents). | out = tgt + 21.5 cents

Lean is usually the most musically useful behavior: it creates a "magnet" effect, attracting the voice toward the harmonic interval while retaining some of the source's original pitch character.
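A minimal sketch of the six behaviors, working in cents as the table specifies. Assumptions: Drift starts from the source pitch and is stepped at a fixed interval `dt`; Oscillate uses phase = 2π · oscillate_rate · t with zero starting phase; `lean_amount` is a hypothetical parameter with no documented default. Smear simply returns the target because its spreading happens at the mixing stage.

```python
import math

SYNTONIC_COMMA = 21.5  # cents, as in the table


def snap(tgt):
    return tgt


def lean(src, tgt, lean_amount):
    """Move lean_amount of the way from the source pitch toward the target (cents)."""
    return src + lean_amount * (tgt - src)


def drift_trajectory(src, tgt, dt, n_steps, tau=0.80):
    """One-pole approach from src toward tgt; returns the value after each step."""
    alpha = 1.0 - math.exp(-dt / tau)
    out = src
    trajectory = []
    for _ in range(n_steps):
        out = out + alpha * (tgt - out)
        trajectory.append(out)
    return trajectory


def oscillate(tgt, t, rate_hz=3.5, depth=10.0):
    return tgt + depth * math.sin(2.0 * math.pi * rate_hz * t)


def smear(tgt):
    return tgt  # spatial smear is applied later, at the mixing stage


def refuse(tgt):
    return tgt + SYNTONIC_COMMA


# Source at 690 cents above the anchor, target a just fifth (701.96 cents).
print(lean(690.0, 701.96, 0.5))
print(drift_trajectory(690.0, 701.96, dt=0.01, n_steps=80)[-1])  # ~63% of the way after tau = 0.8 s
```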

Parameters & defaults

Voices and levels

Parameter | Default | Description
--- | --- | ---
N_voices | 2 | Number of companion voices (1–5).
Companion_level | 0.55 | Gain multiplier for all companion voices.
Original_level | 0.80 | Gain for the original source (centre).

Placement & time

Parameter | Default | Description
--- | --- | ---
Placement_mode | Align start | Align start, centre, or stretch‑to‑fit (duration correction).
Time_correction | None | None, Snap to phrase (if drift > 20 ms), or Elastic (PSOLA).

Spatial

Parameter | Default | Description
--- | --- | ---
Stereo_spread | 0.8 | 0 = mono (all centre), 1 = full L‑R spread.
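How the voices are distributed across the stereo field is not documented. One plausible reading, sketched here purely as an assumption, spaces the voices evenly between -Stereo_spread and +Stereo_spread and applies a constant-power pan law; `voice_pan_positions` and `constant_power_gains` are hypothetical names.

```python
import math


def voice_pan_positions(n_voices, spread):
    """Hypothetical layout: pans in [-spread, +spread], evenly spaced; one voice sits centre."""
    if n_voices == 1:
        return [0.0]
    return [spread * (-1.0 + 2.0 * i / (n_voices - 1)) for i in range(n_voices)]


def constant_power_gains(pan):
    """Constant-power pan law: -1 = hard left, 0 = centre, +1 = hard right."""
    angle = (pan + 1.0) * math.pi / 4.0
    return math.cos(angle), math.sin(angle)


for pan in voice_pan_positions(3, 0.8):
    left, right = constant_power_gains(pan)
    print(f"pan {pan:+.2f}: L {left:.3f}  R {right:.3f}")
```

With Stereo_spread = 0 every voice collapses to the centre, matching the "0 = mono" entry in the table.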

Advanced (fixed in script; can be edited)

Parameter | Default | Description
--- | --- | ---
pitch_floor/ceiling | 60 / 800 Hz | Analysis range
voicing_threshold | 0.45 | Voicing probability threshold
stability_threshold | 0.60 | Zone S threshold
silence_floor_dB | 40 dB | Intensity silence floor
anchor_inertia_cents | 60 cents | Max jump without memory reduction
drift_tau_sec | 0.80 s | Drift time constant
oscillate_rate/depth | 3.5 Hz / 10 cents | Oscillation parameters
comma_climax_threshold | 38 cents | Wolf event threshold
fade_time_sec | 0.015 s | Crossfade length
max/min_speed_factor | 3.0 / 0.25 | Resampling ratio limits
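fade_time_sec sets the length of the ramps applied at the edges of each shifted segment. The sketch below assumes simple linear ramps at both ends; the script may use a different fade shape, and `apply_fades` is a hypothetical name.

```python
def apply_fades(samples, sr, fade_time_sec=0.015):
    """Return a copy with a linear fade-in and fade-out of fade_time_sec at each end."""
    out = list(samples)
    n = min(int(round(fade_time_sec * sr)), len(out) // 2)  # never let the two ramps overlap
    for i in range(n):
        gain = i / n
        out[i] *= gain          # fade in
        out[-1 - i] *= gain     # fade out (mirror image)
    return out


faded = apply_fades([1.0] * 2000, sr=48000)
print(faded[:3], faded[-3:])
```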

Visualization (Praat picture)

When Visualize = 1, the script draws a multi‑panel plot in the Praat picture window, including a zone map and a comma graph.

Tip: The zone map gives immediate insight into which parts of the sound are stable enough to act as harmonic anchors. The comma graph reveals structural “tension” building toward wolf events.

FAQ / troubleshooting

Output is silent or very quiet

Check the analysis summary: if the share of Zone S frames is very low (<5 %), the engine had few stable anchors. Companion voices may still be generated from Unstable zones, but their anchors then rest on less stable pitch material. Increase Companion_level or reduce Original_level to hear them better.

Voices sound glitchy / have artefacts

The resampling engine (SR override + resample) is robust, but very large speed factors (>2×) can introduce high‑frequency artefacts. Reduce max_speed_factor in the script if needed. Also make sure fade_time_sec is not too short for the material; very short fades can produce clicks at segment boundaries.

Companion voices are out of sync with original

Try Time_correction = Snap to phrase or Elastic. Align centre placement can also help synchronise attacks. For speech, Align start is usually best.

User‑Defined intervals

When Field A or B = User‑Defined, the script parses user_ratios_cents$ (default: “0, 150, 350, 702, 968”) – up to five comma‑separated cents values. These are assigned to voices 1–5 in order. Edit the string in the script to change them.
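A sketch of the parsing step in Python (the Praat script does its own string handling; `parse_user_cents` is a hypothetical name). What happens when fewer than N_voices values are given is not documented, so the sketch simply returns the values it finds.

```python
def parse_user_cents(text, n_voices=5):
    """Split a string like user_ratios_cents$ into floats for voices 1..n_voices."""
    values = [float(token) for token in text.split(",") if token.strip()]
    return values[:n_voices]


print(parse_user_cents("0, 150, 350, 702, 968"))               # [0.0, 150.0, 350.0, 702.0, 968.0]
print(parse_user_cents("0, 150, 350, 702, 968", n_voices=2))   # [0.0, 150.0]
```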

Wolf voice designation

If N_voices ≥ 2, voice 2 is the designated wolf voice. When cumulative comma exceeds comma_climax_threshold, this voice receives an extra offset (comma / threshold) * 5.0 cents, and its gain is multiplied by wolf_emphasis (1.4).
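The wolf-voice rule reduces to a small function. Reading "exceeds" as a comparison of magnitudes, so that a negative accumulated comma can also trigger the event, is an assumption; `wolf_adjust` is a hypothetical name.

```python
def wolf_adjust(voice_index, n_voices, comma, threshold=38.0, wolf_emphasis=1.4):
    """Return (extra offset in cents, gain multiplier) for a 1-based voice index."""
    is_wolf_voice = n_voices >= 2 and voice_index == 2
    if is_wolf_voice and abs(comma) > threshold:   # assumption: magnitude is compared
        return (comma / threshold) * 5.0, wolf_emphasis
    return 0.0, 1.0


print(wolf_adjust(2, 3, comma=45.6))   # wolf voice past the threshold: about +6 cents, gain 1.4
print(wolf_adjust(1, 3, comma=45.6))   # other voices are unaffected
```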

Modifying script parameters: All advanced parameters (analysis thresholds, drift tau, comma thresholds, etc.) are defined at the top of the Praat script. You can safely edit them to suit your material.