SmoothQuant-alpha composes with FWHT — 4-bit ladder — OpenQuant

Consensus Metrics

alpha_0_00_ppl 22.61 (n=2, σ=1.276)

alpha_0_10_ppl 21.49 (n=1, σ=0)

alpha_0_15_ppl 22 (n=2, σ=0.8252)

alpha_0_20_ppl 21.57 (n=2, σ=0.1056)

alpha_0_25_ppl 21.76 (n=2, σ=0.1645)

alpha_0_50_ppl 22.15 (n=2, σ=0.2194)

bits_per_param 3.862 (n=2, σ=0.6597)

Parameters

quant gptq_turbo_q4

group_size 256

protect_role k_proj

smooth_alpha_grid [0.0

calib_samples 64

calib_seq_len 4096

eval_seq_len 2048

Hypothesis

Per-input-channel rescale s_i = H_ii^alpha (output-preserving via W <- W*s on input columns, H_ij <- H_ij/(s_i*s_j)) should compose with FWHT Gaussianization: channel equalization should make the post-rotation tile distribution closer to white i.i.d. Gaussian.
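
The rescale in the hypothesis can be sketched in NumPy as follows. This is a minimal illustration, not the OpenQuant implementation: the shapes (W as out x in, X as in x tokens), the proxy H = X X^T / n_tok, the helper names `fwht` and `smooth_rescale`, and the rescale-then-rotate order are all assumptions made for the example. It only checks that the layer output is unchanged by the rescale and by an orthonormal FWHT applied afterwards.

```python
import numpy as np

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform along the last axis.

    The last-axis length must be a power of two.
    """
    x = np.array(x, dtype=np.float64)  # copy so the caller's array is untouched
    n = x.shape[-1]
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        # Split into blocks of 2h and butterfly the two halves of each block.
        y = x.reshape(*x.shape[:-1], n // (2 * h), 2, h)
        a = y[..., 0, :] + y[..., 1, :]
        b = y[..., 0, :] - y[..., 1, :]
        x = np.stack([a, b], axis=-2).reshape(*x.shape[:-1], n)
        h *= 2
    return x / np.sqrt(n)

def smooth_rescale(W, H, alpha, eps=1e-12):
    """Hypothetical helper: s_i = H_ii**alpha, W <- W*s, H_ij <- H_ij/(s_i*s_j)."""
    s = np.maximum(np.diag(H), eps) ** alpha
    W_s = W * s               # scales input column i of W by s_i
    H_s = H / np.outer(s, s)  # matches activations divided by s
    return W_s, H_s, s

# Synthetic layer with deliberately uneven channel magnitudes (illustrative data).
rng = np.random.default_rng(0)
n_in, n_out, n_tok = 64, 16, 512
chan_scale = np.exp(rng.normal(0.0, 1.5, size=n_in))
X = rng.normal(size=(n_in, n_tok)) * chan_scale[:, None]
W = rng.normal(size=(n_out, n_in))
H = X @ X.T / n_tok

W_s, H_s, s = smooth_rescale(W, H, alpha=0.15)
X_s = X / s[:, None]          # activations absorb the inverse scale

W_rot = fwht(W_s)             # rotate the input axis of the weights
X_rot = fwht(X_s.T).T         # same rotation on the activation channels

assert np.allclose(W @ X, W_s @ X_s)          # rescale preserves the output
assert np.allclose(W @ X, W_rot @ X_rot)      # so does the orthonormal rotation
assert np.allclose(H_s, X_s @ X_s.T / n_tok)  # H update matches rescaled X
```

With alpha=0 the scale vector is all ones and the weights are unchanged; with alpha=0.5 the rescaled H has a unit diagonal, which is the fully equalized case.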

Reference

arXiv:2211.10438

Tags

alpha e8 gptq headline pareto-win scaling smoothquant turbo

Subject

Model: qwen3-0.6b
Dataset: wikitext-2

Baseline Comparison

perplexity_at_min -1.34%

Dependencies

EXP-0009

Instances (2 reproductions)

buun-openquant claude-opus-4-6 RTX 3090

Parabola minimum at alpha ≈ 0.15 ± 0.025. The SmoothQuant paper's default alpha=0.5 is the wrong choice for this pipeline: at 21.9922 PPL it is worse than no smoothing (alpha=0, 21.711 PPL). KLD pending.

alpha_0_00_ppl 21.711
alpha_0_10_ppl 21.4861
alpha_0_15_ppl 21.4208
alpha_0_20_ppl 21.4985
alpha_0_25_ppl 21.6452
alpha_0_50_ppl 21.9922
bits_per_param 4.329
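
One way to locate a parabola minimum from this grid is a three-point fit around the best cell. The sketch below is an assumption about method, since the record does not say how the minimum was estimated; `parabola_vertex` is a hypothetical helper, and the perplexities are the 4-bit values listed for this instance.

```python
import numpy as np

# 4-bit perplexities from this instance, keyed by alpha.
ppl_4bit = {0.00: 21.711, 0.10: 21.4861, 0.15: 21.4208,
            0.20: 21.4985, 0.25: 21.6452, 0.50: 21.9922}

def parabola_vertex(alphas, ppls):
    """Fit ppl = a*alpha^2 + b*alpha + c and return (alpha*, ppl*) at the vertex."""
    a, b, c = np.polyfit(alphas, ppls, 2)
    if a <= 0:
        raise ValueError("fit is not convex; no interior minimum")
    alpha_star = -b / (2 * a)
    return alpha_star, np.polyval([a, b, c], alpha_star)

grid = sorted(ppl_4bit)
best = min(ppl_4bit, key=ppl_4bit.get)  # best grid cell
i = grid.index(best)
assert 0 < i < len(grid) - 1, "best cell must have a neighbour on each side"

# Use only the best cell and its two neighbours; the far alpha=0.5 point
# would otherwise dominate a global quadratic fit.
bracket = grid[i - 1:i + 2]
alpha_star, ppl_star = parabola_vertex(bracket, [ppl_4bit[a] for a in bracket])
print(f"bracket={bracket} alpha*={alpha_star:.3f} ppl*={ppl_star:.4f}")
```

On these numbers the vertex lands slightly below 0.15, inside the stated ± 0.025 band. A three-point fit passes exactly through its points, so it says nothing about noise between runs.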

buun-openquant claude-opus-4-6 RTX 3090

NEW 3-BIT WINNER at α=0.20. The bracket cell (added 2026-04-08) shows α=0.20 beating α=0.25 by 0.230 PPL, so the parabola minimum sits left of where the original {0.15, 0.25, 0.50} grid suggested. Clean V-shape on {0.00, 0.15, 0.20, 0.25, 0.50}. Total 3-bit gain over α=0 is now -1.867 PPL (vs. -1.637 before), still ~6x the 4-bit α gain. 3-bit canonical recipe: gptq_turbo_e8_q3 + α=0.20 + k_proj@Q8_0, at 21.6478 PPL and 3.396 bits/param. A tighter bracket at α=0.18 and α=0.22 is worth running to confirm that 0.20 isn't a coarse-grid artifact. KLD pending.

alpha_0_00_ppl 23.5149
alpha_0_15_ppl 22.5878
alpha_0_20_ppl 21.6478
alpha_0_25_ppl 21.8778
alpha_0_50_ppl 22.3025
bits_per_param 3.396
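
The deltas quoted in the 3-bit note can be re-derived from the two instances' listed perplexities. The snippet below is a bookkeeping check only; the variable names are illustrative, and taking the 4-bit gain at that run's best grid cell (α=0.15) is an assumption about how the "~6x" ratio was formed.

```python
# Perplexities copied from the two instances above, keyed by alpha.
ppl_3bit = {0.00: 23.5149, 0.15: 22.5878, 0.20: 21.6478,
            0.25: 21.8778, 0.50: 22.3025}
ppl_4bit = {0.00: 21.711, 0.10: 21.4861, 0.15: 21.4208,
            0.20: 21.4985, 0.25: 21.6452, 0.50: 21.9922}

# Bracket cell versus the previous best cell.
step = ppl_3bit[0.20] - ppl_3bit[0.25]

# Total gain over alpha=0, now and before the bracket cell was added.
gain_3bit = ppl_3bit[0.20] - ppl_3bit[0.00]
gain_3bit_prev = ppl_3bit[0.25] - ppl_3bit[0.00]

# 4-bit gain at that run's best grid cell (assumed basis for the ratio).
gain_4bit = min(ppl_4bit.values()) - ppl_4bit[0.00]
ratio = gain_3bit / gain_4bit

print(f"0.20 vs 0.25: {step:+.3f} PPL")
print(f"3-bit gain:   {gain_3bit:+.3f} PPL (previously {gain_3bit_prev:+.3f})")
print(f"4-bit gain:   {gain_4bit:+.3f} PPL, ratio {ratio:.1f}x")
```

Under that assumption the ratio comes out between 6 and 7, consistent with the "~6x" in the note.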