Per-role SmoothQuant-alpha sweep

proposed medium priority TODO-008
Description

Different tensor roles have different per-channel variance distributions (q/k/v see different upstream activations than gate/up/down), so a single global α may be suboptimal. Tuning α separately for each role should let every role use the value that best fits its own activation statistics.
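As a rough illustration of what a per-role α changes, the sketch below computes per-channel smoothing scales. It assumes the standard SmoothQuant formulation, s_j = max|X_j|^α / max|W_j|^(1-α). The helper name smooth_scales, the eps floor, and the example statistics and α values are all hypothetical and are not taken from EXP-0012 or EXP-0007.

```python
from typing import Dict, List


def smooth_scales(act_absmax: List[float], weight_absmax: List[float],
                  alpha: float, eps: float = 1e-5) -> List[float]:
    """Per-input-channel scale s_j = max|X_j|^alpha / max|W_j|^(1 - alpha).

    act_absmax[j] and weight_absmax[j] are the absolute maxima of input
    channel j in the calibration activations and in the weight matrix.
    eps is a hypothetical floor that avoids dividing by zero.
    """
    if len(act_absmax) != len(weight_absmax):
        raise ValueError("activation and weight statistics must have equal length")
    return [
        max(a, eps) ** alpha / max(w, eps) ** (1.0 - alpha)
        for a, w in zip(act_absmax, weight_absmax)
    ]


# Hypothetical per-role alphas: each role looks up its own value
# instead of sharing one global alpha.
alpha_by_role: Dict[str, float] = {"q_proj": 0.15, "down_proj": 0.25}

# Made-up calibration statistics for a 3-channel layer.
act_stats = [16.0, 1.0, 4.0]
weight_stats = [4.0, 1.0, 4.0]

scales_by_role = {
    role: smooth_scales(act_stats, weight_stats, alpha)
    for role, alpha in alpha_by_role.items()
}
```

With identical statistics, the two roles still end up with different scales because only α differs, which is the degree of freedom this sweep explores.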

Reference

EXP-0012, EXP-0007

Suggested Parameters
quant gptq_turbo_q4
group_size 256
alpha_per_role {'q_proj': 'grid', 'k_proj': 'grid', 'v_proj': 'grid', 'o_proj': 'grid', 'gate_proj': 'grid', 'up_proj': 'grid', 'down_proj': 'grid'}
alpha_grid [0.0, 0.1, 0.15, 0.2, 0.25]
eval_seq_len 2048
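
One possible way to run the sweep is sketched below. It assumes that 'grid' in alpha_per_role means "search this role over alpha_grid", and it uses a greedy role-by-role search rather than the full Cartesian product: 7 roles × 5 values is 35 evaluations, whereas the full product would be 5^7 = 78,125. The evaluate callable, the baseline_alpha default, and the lower-is-better metric (for example perplexity at eval_seq_len) are hypothetical placeholders, not part of the proposal.

```python
from typing import Callable, Dict, List

ROLES = ["q_proj", "k_proj", "v_proj", "o_proj",
         "gate_proj", "up_proj", "down_proj"]
ALPHA_GRID = [0.0, 0.1, 0.15, 0.2, 0.25]


def sweep_per_role(evaluate: Callable[[Dict[str, float]], float],
                   roles: List[str] = ROLES,
                   grid: List[float] = ALPHA_GRID,
                   baseline_alpha: float = 0.15) -> Dict[str, float]:
    """Greedy per-role search: tune one role at a time, keep its best alpha.

    evaluate(config) is a hypothetical callable that quantizes the model
    with the given role -> alpha mapping and returns a score where lower
    is better. Roles not yet tuned stay at baseline_alpha (an assumption).
    """
    best = {role: baseline_alpha for role in roles}
    for role in roles:
        scores = {}
        for alpha in grid:
            trial = dict(best)      # copy so other roles keep their current best
            trial[role] = alpha
            scores[alpha] = evaluate(trial)
        best[role] = min(scores, key=scores.get)
    return best


# Toy stand-in for a real evaluation: each role has a made-up optimum.
toy_optimum = {role: 0.2 for role in ROLES}
toy_optimum["q_proj"] = 0.0


def toy_evaluate(config: Dict[str, float]) -> float:
    return sum((config[r] - toy_optimum[r]) ** 2 for r in ROLES)


best_alphas = sweep_per_role(toy_evaluate)
```

The greedy search finds the joint optimum only when roles do not interact. If the per-role results turn out to depend on the order in which roles are tuned, that would be a sign the roles are coupled and a joint search over a subset may be needed.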
Provenance
Proposed by @buun via buun-openquant claude-opus-4-6