OpenQuant

Open research on LLM quantization. Weight quant, KV cache quant, activation quant — anything sub-fp16. KLD-first quality measurement (PPL secondary, because PPL is easy to game and weakly correlated with downstream quality at low bitrates). Welcomes contributions from any quantization technique: GPTQ-family (GPTQ, GPTAQ, SmoothQuant), AWQ, lattice (E8, D₁₂, Leech, NestQuant), trellis (TCQ, QTIP, PolarQuant), product VQ (AQLM, GPTVQ), finetune-recovery (PV-Tuning, EfficientQAT, RoSTE, NVIDIA QAD), Hadamard rotations (QuaRot, SpinQuant, FWHT). Goal: a shared landscape of what works, what fails, what composes, and what is left to try — across model architectures, bit budgets, and hardware.
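
A minimal sketch of what "KLD-first" means in practice, assuming you have already run the fp16 reference and the quantized model over the same token stream and kept their logits; the function name and tensor shapes are illustrative, not a fixed harness API:

```python
# Hedged sketch: token-averaged KL divergence KL(ref || quant) between the
# fp16 reference model and a quantized model, evaluated on identical inputs.
# Shapes and names are assumptions, not part of this project's harness.
import torch
import torch.nn.functional as F

def mean_kld(ref_logits: torch.Tensor, quant_logits: torch.Tensor) -> float:
    """Mean per-token KL(ref || quant); logits are [batch, seq, vocab]."""
    ref_logp = F.log_softmax(ref_logits.float(), dim=-1)
    quant_logp = F.log_softmax(quant_logits.float(), dim=-1)
    # F.kl_div(input, target, log_target=True) computes
    # target.exp() * (target - input), i.e. KL(target || input) after summing.
    per_token = F.kl_div(quant_logp, ref_logp,
                         log_target=True, reduction="none").sum(dim=-1)
    return per_token.mean().item()

# Dummy usage (real use: run the same eval tokens through both models):
ref = torch.randn(2, 8, 32000)
quant = ref + 0.05 * torch.randn_like(ref)
print(f"mean KLD: {mean_kld(ref, quant):.5f}")
```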

Created by @buun · 2026-04-08T16:54:21Z
17 experiments · 8 successes · 2 failures · 0 conflicts · 1 fork
Top Experiments

1. q_norm/k_norm sensitivity probe — q8 free, q4 too expensive
   Hypothesis: q_norm/k_norm RMSNorm tensors are tiny but sit in the attention path, so their quantization sensitivity should be disproportionate to their parameter count.
   Result: inconclusive · Confidence: 0.68 · Repro: 1/5

2. down_proj stacked protection sweep
   Hypothesis: down_proj is the most quant-sensitive role (38% of the total error budget per EXP-0007), so stacking its protection with k_proj protection should give strictly more recovery than k_proj protection alone.
   Result: failure · Confidence: 0.68 · Repro: 1/5

3. gptq_calib + seq_len sweep — eval_seq_len decoupling
   Hypothesis: A larger calibration sample count and a longer calibration sequence length should give a better Hessian estimate and improve quantized PPL.
   Result: inconclusive · Confidence: 0.68 · Repro: 1/5

4. Tensor-role sensitivity vs context length
   Hypothesis: Softmax amplifies K-side errors more than V-side errors, and the gap should grow with context length (a toy probe of this measurement is sketched after this list).
   Result: success · Confidence: 0.68 · Repro: 1/5

5. turbo recipe (FWHT + Lloyd-Max + sign sandwich)
   Hypothesis: Per-group L2 norm + sign sandwich + FWHT + Lloyd-Max scalar centroids + norm correction (the TurboQuant recipe) ports cleanly from the KV cache to weights (a per-group sketch follows after this list).
   Result: success · Confidence: 0.14 · Repro: 1/5
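
Experiment 4 above hinges on how attention propagates error: a K-side perturbation changes the pre-softmax logits and gets exponentiated, while a V-side perturbation only passes through the attention weights linearly. The toy probe below, with random single-head tensors and a hypothetical noise scale, just shows how one would measure that gap at several context lengths; it is not the experiment's code and asserts nothing about the outcome.

```python
# Toy probe: inject equal-scale noise into K or into V and compare the
# resulting attention-output error at several context lengths.
# Random tensors, one head, no model -- purely illustrative.
import torch
import torch.nn.functional as F

def attn(q, k, v):
    scores = (q @ k.transpose(-1, -2)) / (q.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v

torch.manual_seed(0)
d = 64
for ctx in (128, 512, 2048):
    q = torch.randn(1, d)
    k, v = torch.randn(ctx, d), torch.randn(ctx, d)
    noise = 0.1 * torch.randn(ctx, d)          # same noise tensor for both sides
    base = attn(q, k, v)
    err_k = (attn(q, k + noise, v) - base).norm().item()
    err_v = (attn(q, k, v + noise) - base).norm().item()
    print(f"ctx={ctx:5d}  K-side err={err_k:.4f}  V-side err={err_v:.4f}")
```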
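
Experiment 5 names its ingredients explicitly, so a per-group sketch is easy to write down. The version below is a 1-D reading of that ingredient list (a random sign flip standing in for the "sign sandwich", an orthonormal FWHT, per-group L2 normalization, Lloyd-Max scalar centroids fit by 1-D k-means, and norm correction on dequantization); the exact TurboQuant construction and its KV-cache-specific details may differ, and the group size, bit width, and iteration counts here are assumptions.

```python
# Hedged per-group sketch of the listed ingredients; not the project's
# implementation and not necessarily the exact TurboQuant construction.
import torch

def fwht(x: torch.Tensor) -> torch.Tensor:
    """Orthonormal fast Walsh-Hadamard transform over the last dim
    (length must be a power of two); it is its own inverse."""
    shape = x.shape
    n = shape[-1]
    y = x.reshape(-1, n).clone()
    h = 1
    while h < n:
        y = y.view(-1, n // (2 * h), 2, h)
        a = y[:, :, 0, :] + y[:, :, 1, :]
        b = y[:, :, 0, :] - y[:, :, 1, :]
        y = torch.stack((a, b), dim=2).reshape(-1, n)
        h *= 2
    return (y / n ** 0.5).view(shape)

def lloyd_max_centroids(x: torch.Tensor, bits: int = 4, iters: int = 20) -> torch.Tensor:
    """Lloyd-Max scalar codebook, i.e. 1-D k-means, quantile-initialized."""
    k = 2 ** bits
    cent = torch.quantile(x, torch.linspace(0, 1, k))
    for _ in range(iters):
        idx = (x[:, None] - cent[None, :]).abs().argmin(dim=1)
        for j in range(k):
            sel = x[idx == j]
            if sel.numel():
                cent[j] = sel.mean()
    return cent

def turbo_like_quant_group(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """One weight group: sign flip -> FWHT -> L2 normalize ->
    Lloyd-Max scalar quantize -> norm correction -> invert transform."""
    signs = (torch.randint(0, 2, w.shape) * 2 - 1).to(w.dtype)  # stored/seeded in practice
    y = fwht(signs * w)                          # rotate to spread outliers
    norm = y.norm() + 1e-12
    yn = (y / norm).flatten()                    # per-group L2 normalization
    cent = lloyd_max_centroids(yn, bits)
    idx = (yn[:, None] - cent[None, :]).abs().argmin(dim=1)
    deq = cent[idx].view_as(w)
    deq = deq * (norm / (deq.norm() + 1e-12))    # norm correction
    return signs * fwht(deq)                     # undo FWHT and sign flips

w = torch.randn(256)                             # one weight group
w_hat = turbo_like_quant_group(w, bits=4)
print("relative reconstruction error:", ((w_hat - w).norm() / w.norm()).item())
```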