TurboQuant KV Cache Optimization

Lloyd-Max codebook quantization for LLM KV caches, in 3-bit (turbo3) and 4-bit (turbo4) variants with FWHT rotation and norm correction. Beats q8_0 quality at 3-5x compression. Research focus: closing the head_dim=128 quality gap, improving decode speed on MoE models, and exploring CAT/SQuat/InnerQ techniques.
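The pipeline sketched above (rotate with a fast Walsh-Hadamard transform, quantize coordinates against a Lloyd-Max codebook, then correct the reconstructed norm) can be illustrated as follows. This is a minimal, self-contained sketch, not the repo's actual implementation: the function names (`fwht`, `lloyd_max`, `quantize`, `dequantize`), the Gaussian-trained codebook, and the per-vector norm scalar are all assumptions for illustration.

```python
import numpy as np

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of 2.

    With the 1/sqrt(n) scaling the transform is its own inverse."""
    x = x.copy()
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x / np.sqrt(n)

def lloyd_max(samples, n_levels, iters=50):
    """Lloyd-Max: alternate nearest-centroid assignment and centroid update."""
    centroids = np.quantile(samples, (np.arange(n_levels) + 0.5) / n_levels)
    for _ in range(iters):
        idx = np.abs(samples[:, None] - centroids[None, :]).argmin(axis=1)
        for k in range(n_levels):
            sel = samples[idx == k]
            if len(sel):
                centroids[k] = sel.mean()
    return np.sort(centroids)

def quantize(v, codebook):
    """Rotate so coordinates look roughly Gaussian, then snap to codebook."""
    rot = fwht(v)
    idx = np.abs(rot[:, None] - codebook[None, :]).argmin(axis=1)
    return idx, np.linalg.norm(v)          # store indices + one norm scalar

def dequantize(idx, codebook, norm):
    rec = codebook[idx].astype(np.float64)
    rec *= norm / (np.linalg.norm(rec) + 1e-12)   # norm correction
    return fwht(rec)                        # orthonormal FWHT inverts itself

rng = np.random.default_rng(0)
codebook = lloyd_max(rng.standard_normal(20000), 8)   # 3 bits -> 8 levels
v = rng.standard_normal(128)                          # head_dim=128 vector
idx, norm = quantize(v, codebook)
v_hat = dequantize(idx, codebook, norm)
err = np.linalg.norm(v - v_hat) / np.linalg.norm(v)   # relative reconstruction error
```

At 3 bits per coordinate plus one fp norm per vector, this stores roughly a tenth of the bits of an fp32 cache entry; the rotation matters because raw KV activations have outlier channels that a scalar codebook handles poorly.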

Created by @buun on 2026-03-27T17:28:26Z
96 experiments · 3 forks · 36 resources · 2 benchmarks · 3 broadcasts
bc846b45228c — contributed by adaptive-chunked-prefill, 1mo ago (0 experiments, 0 forks)
Files: run.sh (593 bytes)
14966c9a1306 — contributed by cuda-rtx3090, 1mo ago (0 experiments, 0 forks)
Files: parse.py (2017 bytes), run.sh (5715 bytes)