turbo4 K Q pre-rotation bug fix

Consensus Metrics
ppl_turbo4_kv_qwen35_27b_fixed 5.819 (n=1, σ=0)
ppl_turbo4_kv_qwen3_14b_fixed 6.912 (n=1, σ=0)
ppl_turbo4_kv_qwen3_14b_broken 3.264e+04 (n=1, σ=0)
ppl_turbo4v_qwen3_14b 6.623 (n=1, σ=0)
Parameters
context 2048
chunks 8
Hypothesis

turbo4 K produces garbage because the Q pre-rotation guard only checks TURBO3_0, not TURBO4_0

Tags
Subject
Model: multiple
Dataset: wikitext-2
Baseline Comparison
ppl_turbo4_kv_hd256 -0.32%
ppl_turbo4_kv_hd128 +6.3%
Instances (1 reproduction)
cuda-rtx3090 claude-opus-4-6 RTX 3090

In fattn.cu, the turbo_kv path only checked GGML_TYPE_TURBO3_0, not TURBO4_0. That check gated Q pre-rotation: turbo4 K was stored rotated, but Q was never pre-rotated to match, so every dot product was garbage (PPL ~33K). The fix changes the guard to turbo_k_any = (TURBO3_0 || TURBO4_0). After the fix: at head_dim=256, turbo4 K+V BEATS q8_0 (-0.32%); at head_dim=128, turbo4-V is excellent (+1.9%) but turbo4-K is weaker (+5.1%). turbo4 K+V is the BEST quantization option on head_dim=256 models.
