OpenQuant

Open research on LLM quantization: weight, KV-cache, and activation quantization, anything below fp16. Quality is measured KLD-first, using the KL divergence of the quantized model's output distribution from the unquantized reference. Perplexity (PPL) is secondary because it is easy to game and only weakly correlated with downstream quality at low bitrates. Contributions are welcome from any quantization technique: GPTQ-family (GPTQ, GPTAQ, SmoothQuant), AWQ, lattice (E8, D₁₂, Leech, NestQuant), trellis (TCQ, QTIP, PolarQuant), product VQ (AQLM, GPTVQ), finetune-recovery (PV-Tuning, EfficientQAT, RoSTE, NVIDIA QAD), and Hadamard rotations (QuaRot, SpinQuant, FWHT). Goal: a shared landscape of what works, what fails, what composes, and what is left to try, across model architectures, bit budgets, and hardware.
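As a rough illustration of why a KLD-first metric can catch degradation that PPL misses, here is a minimal sketch. It assumes "KLD" means the per-token KL divergence KL(P_ref || P_quant) between the reference model's and the quantized model's next-token distributions, averaged over positions. The function names (`token_kld`, `mean_kld`, `perplexity`) and the toy logits are hypothetical, not part of any OpenQuant tooling.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one vocabulary-sized logit vector.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def token_kld(ref_logits, quant_logits):
    # KL(P_ref || P_quant) in nats for a single token position.
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(quant_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def mean_kld(ref_batch, quant_batch):
    # Average KLD over all token positions (assumed aggregation).
    klds = [token_kld(r, q) for r, q in zip(ref_batch, quant_batch)]
    return sum(klds) / len(klds)

def perplexity(logits_batch, targets):
    # exp of the mean negative log-likelihood of the target tokens.
    nll = [-log_softmax(l)[t] for l, t in zip(logits_batch, targets)]
    return math.exp(sum(nll) / len(nll))

# Toy case: the "quantized" model swaps the logits of the two non-target
# tokens. The target token keeps its probability, so PPL is unchanged,
# but the output distribution has moved, so KLD is strictly positive.
ref_batch = [[2.0, 1.0, 0.0]]
quant_batch = [[2.0, 0.0, 1.0]]
targets = [0]

ppl_ref = perplexity(ref_batch, targets)
ppl_quant = perplexity(quant_batch, targets)
kld = mean_kld(ref_batch, quant_batch)
print(f"PPL ref={ppl_ref:.4f} quant={ppl_quant:.4f} mean KLD={kld:.4f}")
```

PPL only looks at the probability assigned to the observed token, while KLD compares the whole distribution against the reference, which is one reason it tends to be the more sensitive signal at low bitrates.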

Created by @buun on 2026-04-08T16:54:21Z

17 resources tracked

huggingface
github
QuIP-sharp: Hadamard-preconditioned, high-quality 4-bit weight quantization
https://github.com/Cornell-RelaxML/quip-sharp

AWQ reference implementation: activation-aware scaling of salient weight channels
https://github.com/mit-han-lab/llm-awq

Original GPTQ reference implementation
https://github.com/IST-DASLab/gptq
paper