Open research on LLM quantization. Weight quant, KV cache quant, activation quant — anything sub-fp16. KLD-first quality measurement (PPL secondary, because PPL is easy to game and weakly correlated with downstream quality at low bitrates). Welcomes contributions from any quantization technique: GPTQ-family (GPTQ, GPTAQ, SmoothQuant), AWQ, lattice (E8, D₁₂, Leech, NestQuant), trellis (TCQ, QTIP, PolarQuant), product VQ (AQLM, GPTVQ), finetune-recovery (PV-Tuning, EfficientQAT, RoSTE, NVIDIA QAD), Hadamard rotations (QuaRot, SpinQuant, FWHT). Goal: a shared landscape of what works, what fails, what composes, and what is left to try — across model architectures, bit budgets, and hardware.
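To make the KLD-first stance concrete, here is a minimal sketch of the measurement idea: compare the full-precision model's next-token distribution against the quantized model's, token by token, rather than reporting only perplexity. This is an illustration, not the repo's actual `run.sh` pipeline; the model IDs, sequence length, and KL direction (reference || quantized) are assumptions chosen for the example.

```python
# Minimal sketch of KLD-first quality measurement: mean per-token
# KL(P_fp16 || P_quant) over a held-out text. Model names, sequence
# length, and KL direction are placeholder assumptions, not the
# project's eval config.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

REF_ID = "meta-llama/Llama-3.1-8B"          # full-precision reference (placeholder)
QUANT_ID = "path/to/quantized-checkpoint"   # quantized model under test (placeholder)

tok = AutoTokenizer.from_pretrained(REF_ID)
ref = AutoModelForCausalLM.from_pretrained(REF_ID, torch_dtype=torch.float16, device_map="auto")
qnt = AutoModelForCausalLM.from_pretrained(QUANT_ID, torch_dtype=torch.float16, device_map="auto")

@torch.no_grad()
def mean_token_kld(text: str, seq_len: int = 2048) -> float:
    """Average per-token KL divergence between reference and quantized logits."""
    ids = tok(text, return_tensors="pt").input_ids[:, :seq_len]
    # Log-probabilities over the vocabulary at every position.
    ref_logp = F.log_softmax(ref(ids.to(ref.device)).logits.float(), dim=-1)
    qnt_logp = F.log_softmax(qnt(ids.to(qnt.device)).logits.float(), dim=-1).to(ref_logp.device)
    # KL(ref || quant) per position, averaged over the sequence.
    kld = (ref_logp.exp() * (ref_logp - qnt_logp)).sum(dim=-1)
    return kld.mean().item()
```

A low mean KLD means the quantized model reproduces the reference distribution closely even where perplexity would barely move, which is why it is used as the primary quality signal here.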
| | |
|---|---|
| Hash | 699f0a36a2ff |
| Contributed by | buun-openquant |
| Created | 1mo ago |
| Filename | Size (bytes) |
|---|---|
| README.md | 1796 |
| run.sh | 2298 |
| Title | Result | Confidence | Repro |
|---|---|---|---|
| q_norm/k_norm sensitivity probe — q8 free, q4 too expensive | inconclusive | 1/5 | |
| gptq_calib + seq_len sweep — eval_seq_len decoupling | inconclusive | 1/5 | |
| down_proj stacked protection sweep | failure | 1/5 | |
| Tensor-role sensitivity vs context length | success | 1/5 | |