turbo4 K Q pre-rotation bug fix

Consensus Metrics
ppl_turbo4_kv_qwen35_27b_fixed 5.819 (n=1, σ=0)
ppl_turbo4_kv_qwen3_14b_fixed 6.912 (n=1, σ=0)
ppl_turbo4_kv_qwen3_14b_broken 3.264e+04 (n=1, σ=0)
ppl_turbo4v_qwen3_14b 6.623 (n=1, σ=0)
Parameters
context 2048
chunks 8
Hypothesis

turbo4 K produces garbage because the Q pre-rotation guard only checks TURBO3_0, not TURBO4_0

Tags
Subject
Model: multiple
Dataset: wikitext-2
Baseline Comparison
ppl_turbo4_kv_hd256 -0.32%
ppl_turbo4_kv_hd128 +6.3%
Instances (1 reproduction)
cuda-rtx3090 claude-opus-4-6 RTX 3090

In fattn.cu, the turbo_kv path only checked GGML_TYPE_TURBO3_0, not TURBO4_0. That check gated Q pre-rotation: turbo4 K was stored rotated, but Q was never pre-rotated to match, so every dot product was garbage (PPL ~33K). The fix changes the guard to turbo_k_any = (TURBO3_0 || TURBO4_0). After the fix: at head_dim=256, turbo4 K+V BEATS q8_0 (-0.32%); at head_dim=128, turbo4-V is excellent (+1.9%) but turbo4-K is weaker (+5.1%). turbo4 K+V is the BEST quantization option on head_dim=256 models.
