AWQ-style top-k salient channel scaling on top of SmoothQuant

Status: proposed | Priority: medium | ID: TODO-007
Description

AWQ identifies the top-k% most salient channels by activation magnitude and protects them with per-channel scaling. SmoothQuant equalizes all channels by H_ii^α. The two are complementary: SmoothQuant handles the bulk of the channels, while AWQ-style top-k scaling covers the high-impact tail.
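A minimal sketch of how the two scalings might compose, not the project's actual implementation. It assumes the weight matrix is laid out as (out_features, in_features), that h_diag holds the per-input-channel H_ii values, and that act_mag holds a per-channel activation magnitude statistic. The function names combined_channel_scales and apply_scales are hypothetical, as is the choice to multiply the weights by the scale and divide the activations by it.

```python
import numpy as np

def combined_channel_scales(h_diag, act_mag, smooth_alpha=0.15,
                            awq_top_k_pct=1.0, awq_scale=2.0):
    """Per-input-channel scales: H_ii^alpha everywhere, extra boost on the top-k%."""
    h_diag = np.asarray(h_diag, dtype=np.float64)
    act_mag = np.asarray(act_mag, dtype=np.float64)

    # SmoothQuant-style step (assumed form): every channel gets H_ii^alpha.
    scales = np.power(np.maximum(h_diag, 1e-12), smooth_alpha)

    # AWQ-style step: pick the top-k% channels by activation magnitude.
    n = act_mag.size
    k = max(1, int(round(n * awq_top_k_pct / 100.0)))
    top_idx = np.argpartition(act_mag, -k)[-k:]

    # Protect the salient channels with an additional fixed multiplier.
    scales[top_idx] *= awq_scale
    return scales, top_idx

def apply_scales(weight, x, scales):
    """Scale weight columns up and activation columns down by the same factor."""
    return weight * scales[None, :], x / scales[None, :]
```

Because each input channel of the weight is multiplied by the same factor its activation is divided by, x @ weight.T is unchanged in full precision; only the quantization error of the scaled weights differs. Quantization itself (gptq_turbo_q4) would run on the scaled weight and is out of scope for this sketch.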

Reference

arXiv:2306.00978 (AWQ), EXP-0012

Suggested Parameters
quant gptq_turbo_q4
group_size 256
smooth_alpha 0.15
awq_top_k_pct [0.5, 1.0, 2.0]
awq_scale 2.0
eval_seq_len 2048
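
The list value for awq_top_k_pct reads like a sweep over three settings with the other parameters held fixed; that interpretation is an assumption. Under it, the runs could be expanded as in the sketch below. The helper name expand_sweep and the dict layout are hypothetical and do not reflect any buun-openquant API.

```python
# Fixed parameters copied from the suggested values above.
base = {
    "quant": "gptq_turbo_q4",
    "group_size": 256,
    "smooth_alpha": 0.15,
    "awq_scale": 2.0,
    "eval_seq_len": 2048,
}

# Values to sweep, in percent of channels treated as salient.
top_k_grid = [0.5, 1.0, 2.0]

def expand_sweep(base_cfg, key, values):
    """Return one config dict per swept value, leaving base_cfg untouched."""
    return [{**base_cfg, key: v} for v in values]

runs = expand_sweep(base, "awq_top_k_pct", top_k_grid)
```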
Provenance
Proposed by @buun via buun-openquant claude-opus-4-6