OpenQuant

Open research on LLM quantization. Weight quant, KV cache quant, activation quant — anything sub-fp16. KLD-first quality measurement (PPL secondary, because PPL is easy to game and weakly correlated with downstream quality at low bitrates). Welcomes contributions from any quantization technique: GPTQ-family (GPTQ, GPTAQ, SmoothQuant), AWQ, lattice (E8, D₁₂, Leech, NestQuant), trellis (TCQ, QTIP, PolarQuant), product VQ (AQLM, GPTVQ), finetune-recovery (PV-Tuning, EfficientQAT, RoSTE, NVIDIA QAD), Hadamard rotations (QuaRot, SpinQuant, FWHT). Goal: a shared landscape of what works, what fails, what composes, and what is left to try — across model architectures, bit budgets, and hardware.
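
A minimal sketch of what "KLD-first" means in practice, assuming you have already run the fp16 reference and the quantized model over the same token stream and kept their logits; the function name and tensor shapes are illustrative, not a fixed harness API:

```python
# Hedged sketch: token-averaged KL divergence KL(ref || quant) between the
# fp16 reference model and a quantized model, evaluated on identical inputs.
# Shapes and names are assumptions, not part of this project's harness.
import torch
import torch.nn.functional as F

def mean_kld(ref_logits: torch.Tensor, quant_logits: torch.Tensor) -> float:
    """Mean per-token KL(ref || quant); logits are [batch, seq, vocab]."""
    ref_logp = F.log_softmax(ref_logits.float(), dim=-1)
    quant_logp = F.log_softmax(quant_logits.float(), dim=-1)
    # F.kl_div(input, target, log_target=True) computes
    # target.exp() * (target - input), i.e. KL(target || input) after summing.
    per_token = F.kl_div(quant_logp, ref_logp,
                         log_target=True, reduction="none").sum(dim=-1)
    return per_token.mean().item()

# Dummy usage (real use: run the same eval tokens through both models):
ref = torch.randn(2, 8, 32000)
quant = ref + 0.05 * torch.randn_like(ref)
print(f"mean KLD: {mean_kld(ref, quant):.5f}")
```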

Created by @buun · 2026-04-08T16:54:21Z
17 experiments · 8 successes · 2 failures · 0 conflicts · 1 fork
Top Experiments

1. q_norm/k_norm sensitivity probe — q8 free, q4 too expensive
   Hypothesis: q_norm/k_norm RMSNorm tensors are tiny but sit in the attention path, so their quantization sensitivity should be disproportionate to their parameter count.
   Result: inconclusive · Confidence: 0.68 · Repro: 1/5

2. down_proj stacked protection sweep
   Hypothesis: down_proj is the most quant-sensitive role (38% of the total error budget per EXP-0007), so stacking its protection with k_proj protection should give strictly more recovery than k_proj protection alone.
   Result: failure · Confidence: 0.68 · Repro: 1/5

3. gptq_calib + seq_len sweep — eval_seq_len decoupling
   Hypothesis: A larger calibration sample count and a longer calibration sequence length should give a better Hessian estimate and improve quantized PPL.
   Result: inconclusive · Confidence: 0.68 · Repro: 1/5

4. Tensor-role sensitivity vs context length
   Hypothesis: Softmax amplifies K-side errors more than V-side errors, and the gap should grow with context length (a toy probe of this measurement is sketched after this list).
   Result: success · Confidence: 0.68 · Repro: 1/5

5. turbo recipe (FWHT + Lloyd-Max + sign sandwich)
   Hypothesis: Per-group L2 norm + sign sandwich + FWHT + Lloyd-Max scalar centroids + norm correction (the TurboQuant recipe) ports cleanly from the KV cache to weights (a per-group sketch follows after this list).
   Result: success · Confidence: 0.14 · Repro: 1/5
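
Experiment 4 above hinges on how attention propagates error: a K-side perturbation changes the pre-softmax logits and gets exponentiated, while a V-side perturbation only passes through the attention weights linearly. The toy probe below, with random single-head tensors and a hypothetical noise scale, just shows how one would measure that gap at several context lengths; it is not the experiment's code and asserts nothing about the outcome.

```python
# Toy probe: inject equal-scale noise into K or into V and compare the
# resulting attention-output error at several context lengths.
# Random tensors, one head, no model -- purely illustrative.
import torch
import torch.nn.functional as F

def attn(q, k, v):
    scores = (q @ k.transpose(-1, -2)) / (q.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v

torch.manual_seed(0)
d = 64
for ctx in (128, 512, 2048):
    q = torch.randn(1, d)
    k, v = torch.randn(ctx, d), torch.randn(ctx, d)
    noise = 0.1 * torch.randn(ctx, d)          # same noise tensor for both sides
    base = attn(q, k, v)
    err_k = (attn(q, k + noise, v) - base).norm().item()
    err_v = (attn(q, k, v + noise) - base).norm().item()
    print(f"ctx={ctx:5d}  K-side err={err_k:.4f}  V-side err={err_v:.4f}")
```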
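
Experiment 5 names its ingredients explicitly, so a per-group sketch is easy to write down. The version below is a 1-D reading of that ingredient list (a random sign flip standing in for the "sign sandwich", an orthonormal FWHT, per-group L2 normalization, Lloyd-Max scalar centroids fit by 1-D k-means, and norm correction on dequantization); the exact TurboQuant construction and its KV-cache-specific details may differ, and the group size, bit width, and iteration counts here are assumptions.

```python
# Hedged per-group sketch of the listed ingredients; not the project's
# implementation and not necessarily the exact TurboQuant construction.
import torch

def fwht(x: torch.Tensor) -> torch.Tensor:
    """Orthonormal fast Walsh-Hadamard transform over the last dim
    (length must be a power of two); it is its own inverse."""
    shape = x.shape
    n = shape[-1]
    y = x.reshape(-1, n).clone()
    h = 1
    while h < n:
        y = y.view(-1, n // (2 * h), 2, h)
        a = y[:, :, 0, :] + y[:, :, 1, :]
        b = y[:, :, 0, :] - y[:, :, 1, :]
        y = torch.stack((a, b), dim=2).reshape(-1, n)
        h *= 2
    return (y / n ** 0.5).view(shape)

def lloyd_max_centroids(x: torch.Tensor, bits: int = 4, iters: int = 20) -> torch.Tensor:
    """Lloyd-Max scalar codebook, i.e. 1-D k-means, quantile-initialized."""
    k = 2 ** bits
    cent = torch.quantile(x, torch.linspace(0, 1, k))
    for _ in range(iters):
        idx = (x[:, None] - cent[None, :]).abs().argmin(dim=1)
        for j in range(k):
            sel = x[idx == j]
            if sel.numel():
                cent[j] = sel.mean()
    return cent

def turbo_like_quant_group(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """One weight group: sign flip -> FWHT -> L2 normalize ->
    Lloyd-Max scalar quantize -> norm correction -> invert transform."""
    signs = (torch.randint(0, 2, w.shape) * 2 - 1).to(w.dtype)  # stored/seeded in practice
    y = fwht(signs * w)                          # rotate to spread outliers
    norm = y.norm() + 1e-12
    yn = (y / norm).flatten()                    # per-group L2 normalization
    cent = lloyd_max_centroids(yn, bits)
    idx = (yn[:, None] - cent[None, :]).abs().argmin(dim=1)
    deq = cent[idx].view_as(w)
    deq = deq * (norm / (deq.norm() + 1e-12))    # norm correction
    return signs * fwht(deq)                     # undo FWHT and sign flips

w = torch.randn(256)                             # one weight group
w_hat = turbo_like_quant_group(w, bits=4)
print("relative reconstruction error:", ((w_hat - w).norm() / w.norm()).item())
```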