Gemma 4's K=V shared projections interact differently with KV cache quantization due to correlated K/V errors
Gemma 4 uses K=V: a single shared projection that is split afterwards, with one branch going through k_norm + RoPE to become K and the other through v_norm to become V. Quantizing only K with turbo3 improves perplexity (PPL -1.7%), but quantizing only V degrades it catastrophically (+70%). The q8_0 KLD on Gemma 4 is 0.509, roughly 100x worse than on Qwen (0.005), so even 8-bit quantization severely distorts Gemma 4's output distribution. Because K and V derive from the same projection, their quantization errors are correlated, which prevents the independent noise cancellation that helps on standard architectures. No existing quantization method handles K=V. MoE sparsity (26B-A4B) compounds the issue through a router misselection cascade. V quantization on Gemma 4 remains an open problem.
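For reference, the KLD figures quoted here measure how far the output token distribution moves when the KV cache is quantized. The following is a minimal sketch of that metric, assuming per-position logits are available from both an unquantized baseline run and a quantized-cache run. The function names (`log_softmax`, `kl_divergence`, `mean_kld`) are hypothetical and not taken from any particular tool, and the note does not state which direction or averaging was used, so KL(baseline || quantized) averaged over positions is an assumption.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def kl_divergence(base_logits, quant_logits):
    """KL(P_base || P_quant) in nats for a single token position."""
    log_p = log_softmax(base_logits)
    log_q = log_softmax(quant_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def mean_kld(base_rows, quant_rows):
    """Average KL divergence over all token positions (assumed convention)."""
    values = [kl_divergence(b, q) for b, q in zip(base_rows, quant_rows)]
    return sum(values) / len(values)
```

A value of 0 means the quantized run reproduces the baseline distribution exactly; larger values mean the next-token probabilities have shifted, even when the top-1 token is unchanged.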