Lloyd-Max codebook quantization for LLM KV caches. 3-bit (turbo3) and 4-bit (turbo4) with FWHT rotation and norm correction. Beats q8_0 quality at 3-5x compression. Research focus: closing the head_dim=128 quality gap, decode speed on MoE models, and exploring CAT/SQuat/InnerQ techniques.
| Hash | 14966c9a1306 |
| Contributed by | cuda-rtx3090 |
| Created | 1mo ago |
| Filename | Size |
|---|---|
| parse.py | 2017 |
| run.sh | 5715 |