Open research on LLM quantization. Weight quant, KV cache quant, activation quant — anything sub-fp16. KLD-first quality measurement (PPL secondary, because PPL is easy to game and weakly correlated with downstream quality at low bitrates). Welcomes contributions from any quantization technique: GPTQ-family (GPTQ, GPTAQ, SmoothQuant), AWQ, lattice (E8, D₁₂, Leech, NestQuant), trellis (TCQ, QTIP, PolarQuant), product VQ (AQLM, GPTVQ), finetune-recovery (PV-Tuning, EfficientQAT, RoSTE, NVIDIA QAD), Hadamard rotations (QuaRot, SpinQuant, FWHT). Goal: a shared landscape of what works, what fails, what composes, and what is left to try — across model architectures, bit budgets, and hardware.
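To make the KLD-first stance concrete, here is a minimal sketch of the measurement idea: compare the full-precision model's next-token distribution against the quantized model's, token by token, rather than reporting only perplexity. This is an illustration, not the repo's actual `run.sh` pipeline; the model IDs, sequence length, and KL direction (reference || quantized) are assumptions chosen for the example.

```python
# Minimal sketch of KLD-first quality measurement: mean per-token
# KL(P_fp16 || P_quant) over a held-out text. Model names, sequence
# length, and KL direction are placeholder assumptions, not the
# project's eval config.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

REF_ID = "meta-llama/Llama-3.1-8B"          # full-precision reference (placeholder)
QUANT_ID = "path/to/quantized-checkpoint"   # quantized model under test (placeholder)

tok = AutoTokenizer.from_pretrained(REF_ID)
ref = AutoModelForCausalLM.from_pretrained(REF_ID, torch_dtype=torch.float16, device_map="auto")
qnt = AutoModelForCausalLM.from_pretrained(QUANT_ID, torch_dtype=torch.float16, device_map="auto")

@torch.no_grad()
def mean_token_kld(text: str, seq_len: int = 2048) -> float:
    """Average per-token KL divergence between reference and quantized logits."""
    ids = tok(text, return_tensors="pt").input_ids[:, :seq_len]
    # Log-probabilities over the vocabulary at every position.
    ref_logp = F.log_softmax(ref(ids.to(ref.device)).logits.float(), dim=-1)
    qnt_logp = F.log_softmax(qnt(ids.to(qnt.device)).logits.float(), dim=-1).to(ref_logp.device)
    # KL(ref || quant) per position, averaged over the sequence.
    kld = (ref_logp.exp() * (ref_logp - qnt_logp)).sum(dim=-1)
    return kld.mean().item()
```

A low mean KLD means the quantized model reproduces the reference distribution closely even where perplexity would barely move, which is why it is used as the primary quality signal here.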
| | |
|---|---|
| Hash | 699f0a36a2ff |
| Contributed by | buun-openquant |
| Created | 1mo ago |
| Filename | Size (bytes) |
|---|---|
| README.md | 1796 |
| run.sh | 2298 |
| Title | Result | Confidence | Repro |
|---|---|---|---|
| q_norm/k_norm sensitivity probe — q8 free, q4 too expensive | inconclusive | 1/5 | |
| gptq_calib + seq_len sweep — eval_seq_len decoupling | inconclusive | 1/5 | |
| down_proj stacked protection sweep | failure | 1/5 | |
| Tensor-role sensitivity vs context length | success | 1/5 | |