Microtonal Harmonic Field Engine — v0.2 User Guide
Analysis‑driven microtonal harmonisation for existing audio. Detects stable (S), unstable (U), and noisy (N) zones from pitch, harmonicity, and intensity. Derives local harmonic anchors from the source itself, then generates companion voices via phrase‑granular resampling (playback‑rate pitch shifting). Commas and wolf intervals are tracked and used as compositional events — not corrected.
What this does
Microtonal Harmonic Field Engine creates microtonal companion voices from a monophonic source. It does not impose a fixed tuning system on the audio. Instead, it analyses the source’s own pitch, harmonicity, and intensity to segment it into phrases and classify frames into three zones:
- S (Stable) – reliable pitch, used as primary harmonic anchor.
- U (Unstable) – pitched but less stable, used with lower weight.
- N (Noisy/Unvoiced) – used as raw material, pitch‑shifted without anchor.
For each phrase, a harmonic anchor (in cents) is computed from S and U frames. Two tuning fields (A and B) provide microtonal intervals relative to the anchor. A field mix per phrase blends between the two fields, creating a gradual harmonic drift. An accumulated comma tracks deviations from perfect fifths/fourths, triggering wolf events when it exceeds a threshold — these become structural markers.
Quick start
- In Praat, select exactly one Sound object (mono or stereo).
- Run script… → Microtonal_Harmonic_Field_Engine.praat.
- Choose a Preset_mode:
  - Just Bloom, Septimal Shadow, Wolf Corridor, Harmonic Gravity, Temperament Collapse, Dual Root Conflict, Comma Drift, Custom
- For Custom, select Field_A and Field_B tuning systems (12‑TET, Just, Pythagorean, 19‑TET, 31‑TET, Harmonic Series, Septimal, User‑Defined).
- Set N_voices (1–5), Companion_level, Original_level.
- Choose Primary_behavior (Snap, Lean, Drift, Oscillate, Smear, Refuse).
- Select Placement_mode (Align start, centre, or stretch‑to‑fit).
- Click OK. Praat runs pitch/harmonicity/intensity analysis, segments phrases, computes anchors, generates voices via resampling, mixes with stereo spread, and imports the result as originalname_MHFE_result.
The seven presets (+ Custom)
| Preset | Field A | Field B | Behavior | Voices | Description |
|---|---|---|---|---|---|
| Just Bloom | Just Intonation | Harmonic Series | Lean | ≤4 | Warm, natural chorus; gentle harmonic drift. |
| Septimal Shadow | Septimal | 12‑TET | Lean | 2 | Septimal intervals shadowed by equal temperament. |
| Wolf Corridor | Pythagorean | Pythagorean | Drift | ≤3 | Both fields Pythagorean, drift behavior → accumulates wolf fifths. |
| Harmonic Gravity | Harmonic Series | Harmonic Series | Lean | ≤4 | Two voices lean toward the harmonic series from different directions. |
| Temperament Collapse | 31‑TET | 12‑TET | Lean | 2 | High‑resolution 31‑TET collapses into familiar 12‑TET. |
| Dual Root Conflict | Just | Pythagorean | Oscillate | 2 | Just vs. Pythagorean oscillating slowly (2.5 Hz, 15 cents depth). |
| Comma Drift | Just | Just | Drift | ≤3 | Pure just intonation drifting internally – comma accumulation only. |
Custom: allows independent selection of Field A, Field B, Primary_behavior, N_voices, and all other parameters.
Analysis: zones, phrases, anchors, comma
- Stable (S): voiced, harmonicity ≥ 0.60, intensity ≥ silence_floor (40 dB)
- Unstable (U): voiced, intensity ≥ silence_floor, but stability threshold not met
- Noisy (N): unvoiced or below silence floor
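The three rules above can be sketched as a small per-frame classifier. This is an illustrative Python sketch, not the Praat script itself: the function name and argument names are hypothetical, and the harmonicity value is assumed to be on the same scale as the 0.60 threshold quoted above.

```python
def classify_frame(voiced, harmonicity, intensity_db,
                   stability_threshold=0.60, silence_floor_db=40.0):
    """Return 'S', 'U', or 'N' for one analysis frame."""
    # Unvoiced frames and frames below the silence floor are Noisy.
    if not voiced or intensity_db < silence_floor_db:
        return "N"
    # Voiced, loud enough, and harmonic enough: Stable.
    if harmonicity >= stability_threshold:
        return "S"
    # Voiced and loud enough, but the stability threshold is not met.
    return "U"


zones = [classify_frame(True, 0.82, 55.0),   # stable
         classify_frame(True, 0.41, 55.0),   # unstable
         classify_frame(False, 0.90, 55.0)]  # noisy (unvoiced)
```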
Phrase segmentation: contiguous non‑N frames separated by ≥0.18 s of N.
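As a rough illustration of this rule, the sketch below groups per-frame zone labels into phrases, bridging N gaps shorter than 0.18 s. The function name, the list-of-labels input, and the fixed frame step are assumptions made for the example; the script's own bookkeeping may differ.

```python
def segment_phrases(zones, frame_step, min_gap_sec=0.18):
    """Return (first_frame, last_frame) index pairs, both inclusive."""
    phrases = []
    start = None        # first non-N frame of the phrase being built
    last_voiced = None  # most recent non-N frame seen so far
    for i, zone in enumerate(zones):
        if zone == "N":
            continue
        if start is None:
            start = i
        elif (i - last_voiced - 1) * frame_step >= min_gap_sec:
            # The N gap was long enough: close the phrase, open a new one.
            phrases.append((start, last_voiced))
            start = i
        last_voiced = i
    if start is not None:
        phrases.append((start, last_voiced))
    return phrases


# 10 ms frames: a 50 ms gap is bridged, a 200 ms gap splits the phrases.
labels = ["S"] * 10 + ["N"] * 5 + ["U"] * 10 + ["N"] * 20 + ["S"] * 5
phrases = segment_phrases(labels, frame_step=0.01)
```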
Anchor computation (cents): weighted mean of f0 in cents, using S frames (full weight) and U frames (half weight). If a phrase contains no S or U frames, the first voiced frame in the phrase is used instead.
Memory weight (0.35) smooths anchors across phrases unless a large jump exceeds anchor_inertia_cents (60 cents).
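The anchor calculation can be sketched as follows. Several details are assumptions rather than facts from the script: the cents reference (1 Hz here), the exact smoothing formula (a simple weighted blend with the previous anchor), and the choice to drop memory entirely when the jump exceeds anchor_inertia_cents. All function names are hypothetical.

```python
import math

REF_HZ = 1.0  # assumed cents reference; the script may use another value


def hz_to_cents(hz):
    return 1200.0 * math.log2(hz / REF_HZ)


def phrase_anchor_cents(frames):
    """frames: list of (zone, f0_hz). S frames weigh 1.0, U frames 0.5."""
    weights = {"S": 1.0, "U": 0.5}
    num = den = 0.0
    for zone, f0_hz in frames:
        w = weights.get(zone, 0.0)
        if w > 0.0 and f0_hz > 0.0:
            num += w * hz_to_cents(f0_hz)
            den += w
    return num / den if den > 0.0 else None


def smooth_anchor(prev_cents, raw_cents,
                  memory_weight=0.35, inertia_cents=60.0):
    # Assumed rule: a jump larger than the inertia window bypasses memory.
    if prev_cents is None or abs(raw_cents - prev_cents) > inertia_cents:
        return raw_cents
    return memory_weight * prev_cents + (1.0 - memory_weight) * raw_cents


raw = phrase_anchor_cents([("S", 220.0), ("U", 440.0)])
```

With one S frame at 220 Hz and one U frame an octave higher, the anchor lands 400 cents above 220 Hz, because the U frame carries a third of the total weight.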
Comma tracking: For each phrase transition, the interval (in cents) is compared to the nearest perfect fifth (701.96 cents) or fourth (498.04 cents). Deviations within ±120 cents are accumulated into phrase_comma. When cumulative comma exceeds comma_climax_threshold (38 cents), a wolf event is triggered – these are used structurally (e.g., emphasised in the designated wolf voice).
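A minimal sketch of this accumulation is given below. It assumes that the interval between consecutive anchors is taken as an absolute value folded into one octave, that the deviation is signed, and that a wolf event is flagged whenever the absolute cumulative comma is above the threshold; whether the script resets the comma after an event is not stated here, so the sketch does not reset it. Function names are hypothetical.

```python
FIFTH_CENTS = 701.96
FOURTH_CENTS = 498.04


def comma_step(prev_anchor, next_anchor, window=120.0):
    """Signed deviation from the nearest fifth or fourth, or 0 if too far."""
    interval = abs(next_anchor - prev_anchor) % 1200.0  # octave folding assumed
    target = min((FIFTH_CENTS, FOURTH_CENTS), key=lambda t: abs(interval - t))
    deviation = interval - target
    return deviation if abs(deviation) <= window else 0.0


def track_comma(anchors_cents, threshold=38.0):
    """Return a list of (cumulative_comma, wolf_flag) per phrase transition."""
    comma = 0.0
    history = []
    for prev, nxt in zip(anchors_cents, anchors_cents[1:]):
        comma += comma_step(prev, nxt)
        history.append((comma, abs(comma) > threshold))
    return history


# Three transitions of 720 cents each: every one is 18.04 cents sharp of a fifth.
history = track_comma([0.0, 720.0, 1440.0, 2160.0])
```

In this example the cumulative comma reaches about 54 cents on the third transition, which is the first point above the 38-cent threshold.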
Field mix: blends Field A and Field B intervals per phrase. Depending on preset, mix may follow progress, oscillate, or be triggered by comma events.
Harmony generation (resampling engine)
For each phrase and voice:
- Get base interval from Field A and Field B (getInterval procedure; see the interval tables below).
- Blend intervals using phrase’s field mix → eff_int (cents above anchor).
- If this voice is the designated wolf voice (voice 2 if ≥2 voices) and comma exceeds threshold, add a wolf offset: (comma / threshold) * 5.0 cents.
- Compute target Hz: anchor_hz * 2^(eff_int/1200).
- Apply pitch behavior (Snap, Lean, Drift, Oscillate, Smear, Refuse) to obtain final Hz. (Behaviors are applied in log space; see Pitch behaviors below.)
- Calculate playback speed factor = final_hz / anchor_hz.
- Extract source segment of length phrase_dur * speed_factor (so that after resampling it lasts exactly phrase_dur).
- Override sampling frequency to source_sr * speed_factor, then resample back to original SR – this pitch‑shifts the entire segment, preserving all spectral detail including noise.
- Apply fades, time‑correct (PSOLA if requested), and place into a full‑length silence buffer at the computed start time (align‑start, centre, or stretch‑to‑fit).
- Accumulate into voice buffer with appropriate gain.
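The arithmetic in the steps above can be worked through in a few lines. This sketch covers only the numbers (interval, target Hz, speed factor, segment length), not the audio resampling. It assumes a linear blend where mix = 0 gives Field A and mix = 1 gives Field B, uses the Snap behavior, treats "exceeds threshold" as an absolute-value test, and clamps the speed factor to the min/max limits listed under Advanced. The function name and the returned dictionary keys are hypothetical.

```python
def plan_voice(anchor_hz, interval_a, interval_b, mix, phrase_dur,
               comma=0.0, threshold=38.0, is_wolf_voice=False,
               min_speed=0.25, max_speed=3.0):
    # Assumed linear blend between the two fields (cents above anchor).
    eff_int = (1.0 - mix) * interval_a + mix * interval_b
    # Wolf offset for the designated wolf voice once the comma is large.
    if is_wolf_voice and abs(comma) > threshold:
        eff_int += (comma / threshold) * 5.0
    target_hz = anchor_hz * 2.0 ** (eff_int / 1200.0)
    final_hz = target_hz  # Snap behavior: no further pitch shaping
    speed = max(min_speed, min(max_speed, final_hz / anchor_hz))
    return {
        "eff_int": eff_int,
        "final_hz": final_hz,
        "speed_factor": speed,
        # Source length to extract so the resampled result lasts phrase_dur.
        "segment_dur": phrase_dur * speed,
    }


# A just fifth above a 220 Hz anchor for a 2 s phrase.
plan = plan_voice(220.0, 701.96, 700.0, mix=0.0, phrase_dur=2.0)
```

For this fifth, the speed factor is about 1.5, so roughly 3 s of source is read and played back in 2 s at about 330 Hz.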
Interval tables (cents)
| System | Voice 1 | Voice 2 | Voice 3 | Voice 4 | Voice 5 |
|---|---|---|---|---|---|
| 12‑TET | 700.00 | 400.00 | 300.00 | 500.00 | 200.00 |
| Just Intonation | 701.96 | 386.31 | 315.64 | 498.04 | 203.91 |
| Pythagorean | 701.96 | 407.82 | 294.13 | 498.04 | 203.91 |
| 19‑TET | 694.74 | 378.95 | 315.79 | 505.26 | 189.47 |
| 31‑TET | 696.77 | 387.10 | 309.68 | 503.23 | 193.55 |
| Harmonic Series | 701.96 | 386.31 | 968.83 | 203.91 | 1088.27 |
| Septimal | 968.83 | 266.87 | 582.51 | 435.08 | 231.17 |
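The table values can be reproduced from frequency ratios and equal-division step counts. The ratio and step assignments below are inferred from the numbers in the table, not quoted from the script, so treat them as a cross-check rather than a description of getInterval.

```python
import math


def ratio_to_cents(numerator, denominator):
    """Size of a frequency ratio in cents."""
    return 1200.0 * math.log2(numerator / denominator)


def edo_cents(steps, divisions):
    """Size of `steps` steps in an equal division of the octave."""
    return 1200.0 * steps / divisions


just_fifth = round(ratio_to_cents(3, 2), 2)         # Just / Pythagorean voice 1
just_third = round(ratio_to_cents(5, 4), 2)         # Just voice 2
pyth_third = round(ratio_to_cents(81, 64), 2)       # Pythagorean voice 2
harmonic_seventh = round(ratio_to_cents(7, 4), 2)   # Septimal voice 1
fifth_19 = round(edo_cents(11, 19), 2)              # 19-TET voice 1
fifth_31 = round(edo_cents(18, 31), 2)              # 31-TET voice 1
```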
Pitch behaviors (applied per phrase)
| Behavior | Description | Formula |
|---|---|---|
| Snap | Exact target interval. | out = tgt |
| Lean | Blend source and target in log space (cents). | out = src + lean * (tgt - src) |
| Drift | One‑pole IIR approach to target over time. | out_{t} = out_{t-1} + α·(tgt - out_{t-1}), α = 1 - exp(-dt/τ) |
| Oscillate | Sinusoidal modulation around target. | out = tgt + depth·sin(phase) |
| Smear | Same as Snap (spatial smear applied at mixing stage). | out = tgt |
| Refuse | Target + syntonic comma (21.5 cents). | out = tgt + 21.5 cents |
Lean is the most musically useful – it creates a “magnet” effect where the voice is attracted to the harmonic interval but retains some of the source’s original pitch character.
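The formulas in the table translate directly into code. In the sketch below all pitches are in cents; the function names are hypothetical, the Oscillate phase is assumed to be 2π · rate · t, and the lean amount is passed explicitly because its default is not given in this guide.

```python
import math

SYNTONIC_COMMA_CENTS = 21.5


def snap(src, tgt):
    return tgt


def lean(src, tgt, amount):
    # amount = 0 keeps the source pitch, amount = 1 reaches the target.
    return src + amount * (tgt - src)


def drift(prev_out, tgt, dt, tau=0.80):
    # One-pole approach: alpha is the fraction of the gap closed in dt seconds.
    alpha = 1.0 - math.exp(-dt / tau)
    return prev_out + alpha * (tgt - prev_out)


def oscillate(tgt, t, rate_hz=3.5, depth_cents=10.0):
    return tgt + depth_cents * math.sin(2.0 * math.pi * rate_hz * t)


def refuse(tgt):
    return tgt + SYNTONIC_COMMA_CENTS


halfway = lean(0.0, 700.0, 0.5)           # 350 cents
after_one_tau = drift(0.0, 100.0, 0.80)   # about 63 % of the way to the target
```

Drift closes about 63 % of the remaining distance per time constant, so with the default 0.80 s it takes a few seconds of material to settle on the target interval.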
Parameters & defaults
Voices and levels
| Parameter | Default | Description |
|---|---|---|
| N_voices | 2 | Number of companion voices (1–5). |
| Companion_level | 0.55 | Gain multiplier for all companion voices. |
| Original_level | 0.80 | Gain for original source (centre). |
Placement & time
| Parameter | Default | Description |
|---|---|---|
| Placement_mode | Align start | Align start, centre, or stretch‑to‑fit (duration correction). |
| Time_correction | None | None, Snap to phrase (if drift >20 ms), or Elastic (PSOLA). |
Spatial
| Parameter | Default | Description |
|---|---|---|
| Stereo_spread | 0.8 | 0 = mono (all centre), 1 = full L‑R spread. |
Advanced (fixed in script; can be edited)
| Parameter | Default | Description |
|---|---|---|
| pitch_floor/ceiling | 60 / 800 Hz | Analysis range |
| voicing_threshold | 0.45 | Voicing probability threshold |
| stability_threshold | 0.60 | Zone S threshold |
| silence_floor_dB | 40 dB | Intensity silence floor |
| anchor_inertia_cents | 60 cents | Max jump without memory reduction |
| drift_tau_sec | 0.80 s | Drift time constant |
| oscillate_rate/depth | 3.5 Hz / 10 cents | Oscillation parameters |
| comma_climax_threshold | 38 cents | Wolf event threshold |
| fade_time_sec | 0.015 s | Crossfade length |
| max/min_speed_factor | 3.0 / 0.25 | Resampling ratio limits |
Visualization (Praat picture)
When Visualize = 1, the script draws a multi‑panel plot:
- Zone map – colour‑coded by frame type: Stable (green), Unstable (orange), Noisy (grey). Vertical dotted lines show phrase boundaries.
- Pitch curves – grey line = source f0 (where voiced). Coloured horizontal segments = phrase anchors (dotted). Coloured curves = target pitches for each companion voice (using same colours as in stereo spread).
- Comma graph – accumulated comma in cents over time. Red horizontal lines mark ± comma_climax_threshold (wolf event boundaries).
FAQ / troubleshooting
Companion voices are hard to hear: Check the analysis summary. If Zone S is very low (<5 %), the engine had few stable anchors. Companion voices may still be generated from Unstable zones, but at lower confidence. Increase Companion_level or reduce Original_level to hear them better.
Artefacts or clicks in the output: The resampling engine (SR override + resample) is robust, but very large speed factors (>2×) can introduce high‑frequency artefacts. Reduce max_speed_factor in the script if needed. Also ensure fade_time_sec is not too short for the material.
Companion voices sound out of sync: Try Time_correction = Snap to phrase or Elastic. Align centre placement can also help synchronise attacks. For speech, Align start is usually best.
When Field A or B = User‑Defined, the script parses user_ratios_cents$ (default: “0, 150, 350, 702, 968”) – up to five comma‑separated cents values. These are assigned to voices 1–5 in order. Edit the string in the script to change them.
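For illustration, the parsing described above amounts to splitting the string on commas and keeping at most five values. The Python function below is a hypothetical stand-in for whatever the Praat script does internally; how the script handles malformed or missing entries is not covered here.

```python
def parse_user_cents(text, max_voices=5):
    """Split a comma-separated cents string into floats, one per voice."""
    values = [float(token) for token in text.split(",") if token.strip()]
    return values[:max_voices]


voice_intervals = parse_user_cents("0, 150, 350, 702, 968")
# voice_intervals[0] is voice 1, voice_intervals[4] is voice 5.
```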
If N_voices ≥ 2, voice 2 is the designated wolf voice. When cumulative comma exceeds comma_climax_threshold, this voice receives an extra offset of (comma / threshold) * 5.0 cents, and its gain is multiplied by wolf_emphasis (1.4).
Modifying script parameters: All advanced parameters (analysis thresholds, drift tau, comma thresholds, etc.) are defined at the top of the Praat script. You can safely edit them to suit your material.