Granular Attention Re-synthesis — User Guide

ReLU + Softmax applied to grain selection. At high temperature this becomes motif extraction / crystallized stutter. At low temperature it becomes a textural self-remix.

Author: Shai Cohen
Version: 1.0 (2025)
Technique: Attention-Based Grain Selection
Category: Granular / Composition / Experimental
Citation: Cohen, S. (2025). Praat AudioTools

What this does

This script implements a Granular Attention Re-synthesis engine, a novel approach in which grains compete to be chosen (selection probability) rather than for gain. The source re-synthesizes itself from its own most energetic (or most transient) moments using a ReLU + Softmax attention mechanism applied to grain selection.

🎯 What is Attention-Based Grain Selection?

Unlike traditional granular synthesis where grains are selected uniformly or randomly, this engine uses an attention mechanism:

  • Candidate grains are extracted from the source at regular intervals
  • Each grain is scored by RMS energy, transient slope, or a mixed measure
  • ReLU gate: grains scoring below the floor (the mean score scaled by floor_dB) are zeroed out
  • Softmax with temperature α: converts gated scores into a probability distribution
  • Sampling: grains are drawn from this distribution for each output hop
  • Recency penalty: recently chosen grains are less likely to be chosen again

The result: the source "attends" to its own salient moments, repeating, stuttering, or crystallizing them into new textures.

Key Features:

Technical Implementation:

  1. Candidate Extraction: Extract N grains at candidate_hop intervals.
  2. Scoring: Compute RMS, transient slope, or mixed score per grain.
  3. ReLU + Softmax: Apply floor, then softmax with temperature.
  4. Sampling: For each output hop, draw grain from distribution with recency penalty.
  5. Grain Processing: Apply Hanning window, optional pitch jitter (resample), store.
  6. Concatenation: All grains concatenated with overlap in one O(n) operation.
  7. Wet/Dry Mix & Output.

Quick start

  1. In Praat, select exactly one Sound object (any duration, minimum 50 ms).
  2. Run script… → select Granular_Attention_Resynth.praat.
  3. Choose Preset (2-8 for specific strategies, 1 for custom).
  4. Set grain parameters (size, synthesis hop, candidate hop).
  5. Configure competition parameters (temperature, floor dB).
  6. Select score type and adjust variation parameters (jitter, recency penalty).
  7. Set wet/dry mix and enable visualization.
  8. Click OK — engine extracts candidates, computes attention, synthesizes output.
Quick tip: Start with Self-Remix preset on a 5-10 second recording with varied dynamics. Enable visualization — you'll see the grain scores (grey bars), probability curve (blue), and usage circles (orange). Listen to how the output is a textural remix of the source, with louder moments recurring more often. The output appears as "source_GAR_SelfRemix" in the Objects window.
Important:

  • SYNTHESIS HOP must be ≤ grain_size/2 for stable overlap-add (the script auto-clamps).
  • CANDIDATE HOP determines grain density: smaller = more candidates, slower processing.
  • TEMPERATURE α controls competition: 1-3 = gentle (all grains used), 5-10 = crystallized (dominant grains repeat), 15+ = extreme stutter (few grains dominate).
  • RECENCY PENALTY (0-1) reduces repetition: higher = more variety.
  • PITCH JITTER uses resampling, which may cause slight duration changes (grains are trimmed/padded to exact length).
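The hop constraints can be checked by hand before running the script. The Python sketch below is illustrative only and is not the script's own code. `plan_hops` is a hypothetical helper, the hop-count formula is the one quoted under Troubleshooting, and the candidate-count formula is an assumption (every candidate grain must fit entirely inside the source).

```python
import math

def plan_hops(src_dur_s, grain_ms, synth_hop_ms, cand_hop_ms):
    """Estimate hop and candidate counts for a source of src_dur_s seconds."""
    src_ms = src_dur_s * 1000.0
    # Clamp the synthesis hop to at most half a grain, as the guide says the script does.
    synth_hop_ms = min(synth_hop_ms, grain_ms / 2.0)
    # Hop count as quoted under Troubleshooting: floor(srcDur / synthHop) + 1.
    n_hops = math.floor(src_ms / synth_hop_ms) + 1
    # Assumption: a candidate grain must fit entirely inside the source.
    n_candidates = max(0, math.floor((src_ms - grain_ms) / cand_hop_ms) + 1)
    return {"synth_hop_ms": synth_hop_ms, "n_hops": n_hops, "n_candidates": n_candidates}

# A 40 ms hop requested with 50 ms grains is clamped to 25 ms.
plan = plan_hops(5.0, grain_ms=50, synth_hop_ms=40, cand_hop_ms=20)
print(plan)
```

Halving candidate_hop roughly doubles the candidate pool, which is where most of the extra processing time comes from.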

Attention Selection Theory

The Attention Mechanism

🧮 ReLU + Softmax for Grain Selection

For each candidate grain i with raw score sᵢ:

meanScore = (1/N) Σ sᵢ
floorValue = meanScore × 10^(floor_dB / 10)
gatedᵢ = max(0, sᵢ - floorValue)

Softmax with temperature α:

maxGated = max(gated)
wᵢ = exp( α × (gatedᵢ - maxGated) / meanScore )
pᵢ = wᵢ / Σ wⱼ

Interpretation:

  • ReLU gate: Grains below floor are eliminated from competition
  • meanScore normalization: Makes temperature α independent of score scale
  • α → 0: uniform distribution (all active grains equally likely)
  • α → ∞: winner-take-most (highest-scoring grain dominates)
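The gate and softmax can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the script's implementation: `attention_probabilities` is a hypothetical name, gated-out grains are given zero weight (matching "eliminated from competition"), and a uniform fallback is used when no grain passes the gate.

```python
import math

def attention_probabilities(scores, alpha, floor_db):
    """ReLU gate followed by a temperature-scaled softmax over grain scores."""
    n = len(scores)
    mean_score = sum(scores) / n
    if mean_score <= 0.0:
        return [1.0 / n] * n  # assumption: silent input falls back to uniform
    floor_value = mean_score * 10 ** (floor_db / 10)
    gated = [max(0.0, s - floor_value) for s in scores]
    max_gated = max(gated)
    if max_gated == 0.0:
        return [1.0 / n] * n  # assumption: nothing passed the gate, use uniform
    # Subtracting max_gated keeps exp() from overflowing at large alpha.
    # Assumption: grains with gated == 0 receive zero weight.
    weights = [math.exp(alpha * (g - max_gated) / mean_score) if g > 0.0 else 0.0
               for g in gated]
    total = sum(weights)
    return [w / total for w in weights]

scores = [1.0, 2.0, 9.0, 4.0]
gentle = attention_probabilities(scores, alpha=0.0, floor_db=-10.0)
sharp = attention_probabilities(scores, alpha=8.0, floor_db=-10.0)
gated_hard = attention_probabilities(scores, alpha=8.0, floor_db=0.0)
```

With α = 0 every active grain is equally likely. Raising α concentrates probability on the loudest grain, and a 0 dB floor here leaves only the grain that scores above the mean.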

Score Types

📊 Three Scoring Methods

Score Type | Formula | Musical Effect
RMS (Energy) | RMS² per grain | Loud moments repeat: emphasizes dynamics, creates rhythmic loops
Transient (Slope) | abs(RMS²[i] - RMS²[i-1]) | Attacks/onsets repeat: creates stutter, glitch, percussive textures
Mixed | (1-w)×RMS + w×Transient | Blend of both: transient_weight controls balance

All scores are normalized to [0,1] before mixing.
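The three score types can be sketched as follows. This is an illustrative Python version, not the script's code. `grain_scores` and `normalize` are hypothetical names, normalization is assumed to be division by the maximum, and the first grain is assumed to have a transient score of zero because it has no predecessor.

```python
def normalize(values):
    """Scale values into [0, 1] by dividing by the maximum (assumed normalization)."""
    peak = max(values, default=0.0)
    return [v / peak for v in values] if peak > 0.0 else [0.0] * len(values)

def grain_scores(samples, grain_len, cand_hop, score_type="rms", transient_weight=0.5):
    """Score candidate grains taken every cand_hop samples from a list of samples."""
    starts = range(0, len(samples) - grain_len + 1, cand_hop)
    # Mean power (RMS squared) of each candidate grain.
    power = [sum(x * x for x in samples[s:s + grain_len]) / grain_len for s in starts]
    # Absolute change in power from the previous grain; the first grain gets 0.
    slope = [abs(power[i] - power[i - 1]) if i > 0 else 0.0 for i in range(len(power))]
    rms_n, slope_n = normalize(power), normalize(slope)
    if score_type == "rms":
        return rms_n
    if score_type == "transient":
        return slope_n
    if score_type == "mixed":
        w = transient_weight
        return [(1.0 - w) * r + w * t for r, t in zip(rms_n, slope_n)]
    raise ValueError("score_type must be 'rms', 'transient', or 'mixed'")

# Silence followed by a sustained loud section: one onset at the third grain.
signal = [0.0] * 100 + [1.0] * 100
rms = grain_scores(signal, 50, 50, "rms")
transient = grain_scores(signal, 50, 50, "transient")
mixed = grain_scores(signal, 50, 50, "mixed", transient_weight=0.5)
```

On this test signal the RMS score favors both loud grains equally, while the transient score singles out the grain where the level jumps.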

Recency Penalty

During sampling, probabilities are modified based on the last three chosen grains:

  if candidate == lastGrain1: penalty factor = (1 - recency_penalty)
  if candidate == lastGrain2: penalty factor = (1 - recency_penalty × 0.6)
  if candidate == lastGrain3: penalty factor = (1 - recency_penalty × 0.3)

Penalized probabilities are renormalized to sum to 1.

Effect: Prevents the same grain from repeating too frequently, encouraging variety even in winner-take-all regimes.
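A small Python sketch of the penalty and the draw is given below. It is an illustration under assumptions, not the script's code: the function names are hypothetical, the factors multiply if the same grain appears more than once in the recent history, and the unpenalized distribution is used as a fallback if the penalty removes all probability mass.

```python
import random

# Penalty strength for the most recent, second and third most recent grain.
RECENCY_DEPTH_FACTORS = (1.0, 0.6, 0.3)

def penalized_probabilities(probs, recent, recency_penalty):
    """Apply the recency penalty; `recent` lists grain indices, most recent first."""
    adjusted = list(probs)
    for idx, depth in zip(recent, RECENCY_DEPTH_FACTORS):
        adjusted[idx] *= 1.0 - recency_penalty * depth
    total = sum(adjusted)
    if total <= 0.0:
        return list(probs)  # assumption: fall back to the unpenalized distribution
    return [p / total for p in adjusted]

def draw_grain(probs, recent, recency_penalty, rng=random):
    """Draw one grain index from the penalized distribution."""
    adjusted = penalized_probabilities(probs, recent, recency_penalty)
    return rng.choices(range(len(adjusted)), weights=adjusted, k=1)[0]

# With recency_penalty = 1.0 the grain chosen last cannot repeat immediately.
probs = [0.5, 0.3, 0.2]
after_penalty = penalized_probabilities(probs, recent=[0], recency_penalty=1.0)
rng = random.Random(0)
draws = [draw_grain(probs, [0], 1.0, rng) for _ in range(100)]
```

In a synthesis loop the caller would prepend each drawn index to `recent` and keep only the first three entries.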

Musical Effects by Temperature α

α Range | Behavior | Musical Result | Example Preset
1-3 | Gentle, near-uniform | Self-remix, all grains roughly equally used | Self-Remix (α=3)
5-10 | Moderate competition | Energetic moments dominate, texture crystallizes | Crystallize (α=12)
15+ | Winner-take-most | Few grains repeat obsessively: motif extraction, stutter loops | MotifExtract (α=25)

Grain Processing Pipeline

For each output hop (position = hop × synthHop):

  1. Draw grain index i from probability distribution (with recency penalty)
  2. Extract grain from source at time grainStartTime[i] with Hanning window
  3. Optional pitch jitter: resample by factor 2^(pitchShift/12), then back to original SR
  4. Store grain in array

After all hops: Concatenate all grains with overlap = grainDur - synthHop using Praat's built-in Concatenate with overlap (single O(n) operation)
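The windowing and overlap stage can be approximated outside Praat with a plain overlap-add loop. The sketch below is a stand-in for illustration only: it uses explicit overlap-add instead of Praat's Concatenate with overlap, a periodic Hann window, and hypothetical names (`hann_window`, `overlap_add`). Grain starts and lengths are in samples, and pitch jitter is omitted.

```python
import math

def hann_window(length):
    """Periodic Hann window; copies offset by length/2 sum to exactly 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]

def overlap_add(source, grain_starts, grain_len, synth_hop):
    """Window each chosen grain and add it into the output at hop * synth_hop."""
    window = hann_window(grain_len)
    out = [0.0] * (synth_hop * (len(grain_starts) - 1) + grain_len)
    for hop, start in enumerate(grain_starts):
        pos = hop * synth_hop
        for n in range(grain_len):
            out[pos + n] += window[n] * source[start + n]
    return out

# A constant source makes the overlap behavior easy to inspect:
# with synth_hop = grain_len / 2 the interior of the output stays at 1.0.
source = [1.0] * 64
out = overlap_add(source, grain_starts=[0, 8, 16, 4], grain_len=16, synth_hop=8)
```

The flat interior is the reason the guide asks for a synthesis hop of at most half a grain: larger hops leave amplitude dips between grains.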

Preset Strategies

Preset 2: Self-Remix (gentle, textural)

🌱 Gentle Textural Remix

Grain: 60 ms | Hop: 30 ms | Cand: 20 ms

α: 3.0 | Floor: -3 dB (open) | Score: RMS

Jitter: ±8 ms | Recency: 0.3

Character: Gentle competition — most grains used, textural self-remix

Use on: Ambient, pads, general purpose

Preset 3: Crystallize (high α, dense repeats)

💎 Crystallized Texture

Grain: 40 ms | Hop: 20 ms | Cand: 15 ms

α: 12.0 | Floor: +3 dB | Score: RMS

Jitter: ±3 ms | Recency: 0.4

Character: Energetic moments dominate, texture crystallizes around loud events

Use on: Rhythmic material, percussive sources

Preset 4: Motif Extract (very high α, stutter)

🔁 Stutter / Motif Extraction

Grain: 30 ms | Hop: 15 ms | Cand: 10 ms

α: 25.0 | Floor: +6 dB | Score: RMS

Jitter: ±1 ms | Recency: 0.6

Character: Winner-take-most — few grains repeat obsessively, stutter effect

Use on: Creating loops, stutter effects, motif extraction

Preset 5: Onset Harvest (transient score)

⚡ Attack-Focused

Grain: 50 ms | Hop: 25 ms | Cand: 15 ms

α: 8.0 | Floor: +2 dB | Score: Transient

Jitter: ±5 ms | Recency: 0.4

Character: Attacks and onsets repeat — percussive, glitchy textures

Use on: Drums, percussion, plosives

Preset 6: Shimmer (small grains, light jitter)

✨ Shimmering Texture

Grain: 20 ms | Hop: 10 ms | Cand: 10 ms

α: 2.0 | Floor: -6 dB | Score: RMS

Jitter: ±4 ms | Pitch: ±0.3 st | Recency: 0.2

Mix: 85%

Character: Small grains, light pitch jitter, gentle competition — shimmering, ethereal

Use on: Pads, sustained tones, ambient

Preset 7: Slabs (large grains, slow mosaic)

🧱 Large-Grain Mosaic

Grain: 300 ms | Hop: 150 ms | Cand: 50 ms

α: 10.0 | Floor: +2 dB | Score: RMS

Jitter: ±15 ms | Recency: 0.5

Character: Large slabs of sound, moderate competition — mosaic-like texture

Use on: Speech, instrumental phrases, longer gestures

Preset 8: Cloud (very large grains, drifting)

☁️ Drifting Cloud

Grain: 600 ms | Hop: 300 ms | Cand: 80 ms

α: 4.0 | Floor: -3 dB | Score: Mixed (w=0.4)

Jitter: ±30 ms | Recency: 0.3

Mix: 90%

Character: Very large grains, gentle competition, drifting layers

Use on: Drone, ambient, long-form textures

Parameters & Controls

Grain Parameters

Parameter | Default | Description
Grain_size_ms | 50.0 | Duration of each grain (milliseconds)
Synthesis_hop_ms | 25.0 | Hop between output grain starts (auto-clamped to ≤ grain/2)
Candidate_hop_ms | 20.0 | Hop between candidate grain starts (density of pool)

Competition Parameters

Parameter | Default | Description
Temperature | 8.0 | Softmax temperature α (1-3 = gentle, 5-10 = crystallized, 15+ = extreme)
Floor_dB | 0.0 | ReLU gate threshold relative to mean score (dB)

Score Type

Parameter | Default | Description
Score_type | RMS | RMS (energy), Transient (slope), or Mixed
Transient_weight | 0.5 | Weight of transient score in Mixed mode (0-1)

Variation Parameters

Parameter | Default | Description
Time_jitter_ms | 5.0 | Random offset in grain start time (± ms)
Pitch_jitter_semitones | 0.0 | Random pitch shift per grain (± semitones)
Recency_penalty | 0.5 | Penalty for recently chosen grains (0-1, higher = less repetition)

Mix & Output

Parameter | Default | Description
Wet_percent | 100.0 | Wet/dry mix (0 = dry, 100 = full wet)
Draw_visualization | 1 | Generate 6-panel analysis display
Play_result | 1 | Audition after processing
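For reference, a wet/dry control of this kind is commonly a linear crossfade between the source and the processed signal. The sketch below assumes that behavior. The guide does not state the exact mixing law, and `wet_dry_mix` is a hypothetical name.

```python
def wet_dry_mix(dry, wet, wet_percent):
    """Linear crossfade: 0 returns the dry signal, 100 returns the wet signal."""
    w = min(max(wet_percent, 0.0), 100.0) / 100.0
    n = min(len(dry), len(wet))  # mix only the region both signals cover
    return [(1.0 - w) * dry[i] + w * wet[i] for i in range(n)]

mixed = wet_dry_mix([1.0, 1.0], [0.0, 0.0], wet_percent=25.0)
```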

Visualization & Analysis

6-Panel Display

Granular Attention Re-synthesis Visualization:

Panel 1: TITLE
  • Script name, preset, source name, α, floor, grain/hop, unique grain count

Panel 2: INPUT WAVEFORM
  • X-axis: Time, Y-axis: Amplitude
  • Grey waveform
  • Title: "Original waveform"

Panel 3: OUTPUT WAVEFORM
  • Same axes as input
  • Blue waveform = synthesized output
  • Title: "Output filename"

Panel 4: GRAIN SCORE / PROBABILITY / USAGE
  • X-axis: Time (candidate grain positions)
  • Y-axis: Normalized score (0-1.3)
  • Grey bars = raw scores (height = score)
  • Orange dotted line = ReLU floor
  • Blue line = probability curve (scaled to same axis)
  • Orange circles = grain usage (size = frequency of selection)
  • Legend: score (grey), probability (blue), usage (orange)
  • Title: "Grain score / probability / usage (orange dot = chosen, size = frequency)"

Panel 5: GRAIN SELECTION SEQUENCE
  • X-axis: Output hop (1 to N, limited to 200)
  • Y-axis: Grain number (1 to nCandGrains)
  • Color-coded cells: blue = low score, orange = high score
  • Shows which grains are chosen at each hop
  • Title: "Grain selection sequence (blue=low score orange=high score)"

Panel 6: USAGE HISTOGRAM
  • X-axis: Grain number
  • Y-axis: Usage count
  • Color-coded bars: blue to orange gradient by score
  • Dashed line = uniform mean usage
  • Title: "Usage histogram (dashed = uniform mean)"

Panel 7: STATS PANEL
  • Preset, α, floor, grain/hop sizes
  • Candidate count, active grains, unique used
  • Top grain info (index, time, usage count)

Reading the Visualization

What to look for:
  • Grey bars (scores): Shows the raw score of each candidate grain — tall bars = loud/transient moments
  • Orange dotted line (floor): Grains below this line are eliminated from competition
  • Blue line (probability): Follows score peaks but is sharpened by temperature α
  • Orange circles (usage): Size indicates how often each grain was chosen — should correlate with probability
  • Selection sequence: Scan horizontally to see patterns — repeating colors indicate grain stutter
  • Usage histogram: Compare to uniform mean (dashed) — tall bars = dominant grains

Applications

Generative Composition

Use case: Creating evolving textures from any source material

Technique: Self-Remix or Cloud presets on varied sources

Workflow:

Rhythmic & Glitch Effects

Use case: Creating stutter, glitch, or rhythmic loops

Technique: Motif Extract or Crystallize presets on percussive material

Settings:

Sound Design for Media

Use case: Creating evolving backgrounds, transitions, impacts

Technique: Slabs or Shimmer presets on appropriate sources

Applications:

Research & Education

Use case: Demonstrating attention mechanisms, granular synthesis, probability

Technique: Enable visualization, compare presets on simple test signals

Learning outcomes:

Practical Workflow Examples

🎬 Film Scene: Tension Buildup

Goal: Create a 30-second tension cue from a low drone

Settings:

  • Source: 30-second low drone (the output is trimmed to the source duration, so loop or extend a shorter drone to length first)
  • Preset: Cloud (large grains, gentle competition)
  • Custom: grain=800 ms, hop=400 ms, α=3.0
  • Transient_weight=0.3 (some attack sensitivity)

Result: 30-second evolving drone with subtle grain repetition, creating tension

🎚️ Electronic Music: Stutter Loop

Goal: Create stuttering vocal effect

Settings:

  • Source: 3-second vocal phrase
  • Preset: Motif Extract
  • Custom: α=30 (extreme), recency_penalty=0.8 (variety)
  • Score: Mixed (transient_weight=0.7) — emphasize attacks

Result: Vocal stutter where consonants repeat obsessively

🎙️ Voice Processing: Textural Voice

Goal: Transform speech into abstract texture

Settings:

  • Source: 10-second spoken phrase
  • Preset: Self-Remix
  • Custom: α=2.0 (gentle), pitch_jitter=0.2 st (subtle shimmer)

Result: Speech becomes an abstract, textural cloud while retaining intelligibility

Troubleshooting Common Issues

Problem: Output has clicks/pops
Cause: Hop too large relative to grain size, or grains not properly windowed
Solution: Ensure synthesis_hop ≤ grain_size/2 (the script auto-clamps) and that the Hanning window is applied

Problem: Output much shorter/longer than source
Cause: Number of hops = floor(srcDur / synthHop) + 1, so the raw output length may not exactly match the source
Solution: Output is trimmed to source duration; for longer output, increase source length

Problem: Only a few grains used (stutter extreme)
Cause: Temperature too high, floor too high, or recency penalty too low
Solution: Reduce α, lower floor, increase recency_penalty

Problem: Processing very slow
Cause: Many candidate grains (small candidate_hop) and many output hops (small synthesis_hop)
Solution: Increase candidate_hop, increase synthesis_hop (up to grain_size/2), or use a shorter source

Problem: Pitch jitter causes duration mismatch
Cause: Resampling changes grain duration slightly
Solution: Script trims/pads grains back to exact grainDur; for large pitch shifts, consider disabling jitter
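The duration change from pitch jitter, and the trim/pad step that corrects it, can be seen in a small sketch. The Python below is illustrative only. It uses linear interpolation in place of Praat's resampler and assumes zero padding, and `pitch_shift_grain` is a hypothetical name.

```python
def pitch_shift_grain(grain, semitones):
    """Resample a grain by 2^(semitones/12), then trim or zero-pad to its original length."""
    n = len(grain)
    factor = 2.0 ** (semitones / 12.0)
    # Shifting up reads the grain faster, so the resampled grain is shorter.
    shifted_len = max(1, round(n / factor))
    shifted = []
    for i in range(shifted_len):
        pos = i * factor
        lo = int(pos)
        if lo >= n - 1:
            shifted.append(grain[-1])  # hold the last sample past the end
        else:
            frac = pos - lo
            shifted.append((1.0 - frac) * grain[lo] + frac * grain[lo + 1])
    # Assumption: short grains are zero-padded, long grains are truncated.
    return (shifted + [0.0] * n)[:n]

ramp = [float(v) for v in range(8)]
octave_up = pitch_shift_grain(ramp, 12.0)
octave_down = pitch_shift_grain(ramp, -12.0)
```

An octave up leaves half the grain as padding, which is why large jitter values are worth avoiding. Jitter of a fraction of a semitone changes the length by only a few percent.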

Advanced Techniques

Custom score functions:

Edit the scoring section to use other features (spectral centroid, pitch, zero-crossing rate) as attention drivers.
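As one example of an alternative attention driver, a zero-crossing rate per grain could be computed as in the Python sketch below. This is a hypothetical illustration of the feature itself, not code from the script. The result would still need to be normalized to [0,1] like the built-in scores.

```python
def zero_crossing_rate(samples, start, length):
    """Fraction of adjacent sample pairs in the grain whose signs differ."""
    seg = samples[start:start + length]
    crossings = sum(1 for a, b in zip(seg, seg[1:]) if (a < 0.0) != (b < 0.0))
    return crossings / max(len(seg) - 1, 1)

# A noisy (alternating) grain scores high, a steady grain scores zero.
noisy = zero_crossing_rate([1.0, -1.0, 1.0, -1.0], 0, 4)
steady = zero_crossing_rate([1.0, 1.0, 1.0, 1.0], 0, 4)
```

Used as a score, this would make noisy or fricative grains win the competition instead of loud ones.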

Time-varying temperature:

Modify the script to make the temperature change over time: start with a low α for exploration (near-uniform selection) and end with a high α for crystallization.
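One simple way to express such a schedule is a per-hop list of α values. The sketch below assumes geometric interpolation between two positive endpoints. The interpolation law and the name `alpha_schedule` are illustrative choices, not part of the script.

```python
def alpha_schedule(n_hops, alpha_start, alpha_end):
    """Geometric interpolation of α from alpha_start to alpha_end over n_hops hops."""
    if alpha_start <= 0.0 or alpha_end <= 0.0:
        raise ValueError("both endpoints must be positive")
    if n_hops <= 1:
        return [alpha_start] * n_hops
    ratio = alpha_end / alpha_start
    return [alpha_start * ratio ** (h / (n_hops - 1)) for h in range(n_hops)]

schedule = alpha_schedule(3, 2.0, 8.0)
```

Each output hop would then recompute the softmax probabilities with its own α before drawing a grain.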

Grain position jitter:

Time_jitter_ms shifts grain start time; useful for de-correlating repeats, creating thicker textures.

Multi-channel output:

Process each channel separately, or convert the mono output to stereo afterwards; true multi-channel operation requires modifying the script to process each channel independently.