Hessian-fit per-layer post-quant alpha

Status: proposed. Priority: low. ID: TODO-012
Description

After quantization, each layer's residual error can be partly absorbed by a small per-layer post-quant scaling factor (analogous to the V alpha in the TCQ KV cache). The idea is to fit one scalar per layer so that it minimizes that layer's reconstruction loss.
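A minimal sketch of what the per-layer fit could look like, assuming the reconstruction loss is the squared error between the original and quantized layer outputs on calibration activations X. Under that assumption the best scalar has a closed form in terms of the Hessian proxy H = X^T X: alpha = tr(Wq H W^T) / tr(Wq H Wq^T). The function name, the toy rounding quantizer, and the shapes below are hypothetical and stand in for the real gptq_turbo_e8_q4 pipeline, which is not described here.

```python
import numpy as np

def fit_post_quant_alpha(w, w_q, hessian):
    """Scalar alpha minimizing ||X W^T - alpha * X Wq^T||_F^2, with hessian = X^T X.

    w, w_q: (out_features, in_features) original and dequantized weights.
    hessian: (in_features, in_features) symmetric calibration Hessian proxy.
    """
    wq_h = w_q @ hessian
    num = np.sum(wq_h * w)     # tr(Wq H W^T)
    den = np.sum(wq_h * w_q)   # tr(Wq H Wq^T)
    # Fall back to no rescaling if the quantized layer is degenerate.
    return float(num / den) if den > 0 else 1.0

# Toy data (assumption): random activations and weights, uniform rounding
# as a stand-in quantizer.
rng = np.random.default_rng(0)
x = rng.normal(size=(512, 64))      # (tokens, in_features)
w = rng.normal(size=(32, 64))       # (out_features, in_features)
step = 0.5
w_q = np.round(w / step) * step

h = x.T @ x
alpha = fit_post_quant_alpha(w, w_q, h)

y, y_q = x @ w.T, x @ w_q.T
err_before = np.linalg.norm(y - y_q) ** 2
err_after = np.linalg.norm(y - alpha * y_q) ** 2
print(f"alpha={alpha:.4f}  err_before={err_before:.2f}  err_after={err_after:.2f}")
```

Because alpha = 1 is one of the candidates the fit considers, the fitted scalar cannot increase this per-layer loss on the calibration data. Whether it helps end-to-end quality at eval_seq_len 2048 is what the experiment would need to measure.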

Reference

KV cache TCQ V alpha

Suggested Parameters
quant gptq_turbo_e8_q4
group_size 256
smooth_alpha 0.15
post_quant_alpha per_layer_fit
eval_seq_len 2048
Provenance
Proposed by @buun via buun-openquant claude-opus-4-6