TCQ (trellis-coded quantization: 512-state bitshift trellis, Viterbi encode, O(1) sliding-window decode) improves KV cache quality over scalar Lloyd-Max quantization
BREAKTHROUGH. TCQ turns 2-bit from unusable (PPL 15.61) into competitive (PPL 6.055). At 3 bits, turbo3_tcq reaches PPL 5.827, 0.18% lower than q8_0 and marginally better than scalar turbo3 (5.850). Implementation: a 512-state bitshift trellis in which the state is the previous 9 bits of the packed stream (2^9 = 512). Encoding runs Viterbi over the trellis to find the minimum-distortion path; decoding is an O(1) sliding window over the packed bits that uses the current state as the codebook index. Cost: prefill is 21% slower (Viterbi encode overhead) and decode is 5% slower (trellis state tracking). The trellis gives its largest gains at 2 bits, where scalar quantization breaks down; at 3 bits the scalar codebook is already good enough that the trellis adds only a marginal improvement.
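To make the encode/decode asymmetry concrete, here is a minimal pure-Python sketch of a bitshift-trellis quantizer. It assumes k new bits are shifted into a 9-bit state per value (k=2 for a 2-bit cache), that the first state's full 9 bits are stored as a header, and a hypothetical random-Gaussian codebook with one reconstruction value per state. make_codebook, viterbi_encode and decode are illustrative names, not the turbo3_tcq code; the real codebook, bit packing and initial-state handling may differ.

```python
import random

L = 9                      # state width in bits: 2**9 = 512 trellis states
NUM_STATES = 1 << L
MASK = NUM_STATES - 1


def make_codebook(seed=0):
    """Hypothetical codebook: one Gaussian reconstruction value per state."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(NUM_STATES)]


def viterbi_encode(x, codebook, k):
    """Find the minimum-squared-error state path through the bitshift trellis.

    Returns (first_state, symbols, distortion): first_state is the 9-bit header,
    symbols holds the k bits shifted in for each later value.
    """
    inf = float("inf")
    # Step 0: any start state is allowed; its cost is the error of its codeword.
    cost = [(x[0] - codebook[s]) ** 2 for s in range(NUM_STATES)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for t in range(1, len(x)):
        new_cost = [inf] * NUM_STATES
        prev = [0] * NUM_STATES
        for s in range(NUM_STATES):
            # s = ((p << k) | b) & MASK, so p's low L-k bits equal s >> k and
            # p's top k bits are free: 2**k candidate predecessors.
            tail = s >> k
            best_p, best_c = 0, inf
            for top in range(1 << k):
                p = (top << (L - k)) | tail
                if cost[p] < best_c:
                    best_p, best_c = p, cost[p]
            new_cost[s] = best_c + (x[t] - codebook[s]) ** 2
            prev[s] = best_p
        cost = new_cost
        back.append(prev)
    # Trace back from the cheapest final state.
    s = min(range(NUM_STATES), key=cost.__getitem__)
    distortion = cost[s]
    path = [s]
    for prev in reversed(back):
        s = prev[s]
        path.append(s)
    path.reverse()
    step_mask = (1 << k) - 1
    symbols = [st & step_mask for st in path[1:]]
    return path[0], symbols, distortion


def decode(first_state, symbols, codebook, k):
    """O(1) per value: shift k bits into the 9-bit window, use it as the index."""
    state = first_state
    out = [codebook[state]]
    for b in symbols:
        state = ((state << k) | b) & MASK
        out.append(codebook[state])
    return out


if __name__ == "__main__":
    rng = random.Random(1)
    x = [rng.gauss(0.0, 1.0) for _ in range(256)]
    cb = make_codebook()
    first, syms, dist = viterbi_encode(x, cb, k=2)
    xhat = decode(first, syms, cb, k=2)
    print(f"MSE at about 2 bits/value: {dist / len(x):.4f}")
```

In this sketch encoding does O(N · 512 · 2^k) work because every state keeps its best predecessor at every step, while decoding touches one codebook entry per value, which is consistent with Viterbi encode, not decode, dominating the overhead.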