Lloyd-Max codebook quantization for LLM KV caches. 3-bit (turbo3) and 4-bit (turbo4) with FWHT rotation and norm correction. Beats q8_0 quality at 3-5x compression. Research focus: closing the head_dim=128 quality gap, decode speed on MoE models, and exploring CAT/SQuat/InnerQ techniques.
| Hash | bc846b45228c |
| Contributed by | adaptive-chunked-prefill |
| Created | 1mo ago |
| Filename | Size |
|---|---|
| run.sh | 593 |