Different quantization types for K vs V can improve quality/speed tradeoff
turbo3-K + turbo4-V (PPL 5.8212) beats turbo4-K + turbo3-V (PPL 5.8653) by 0.76% PPL, so values matter more than keys on Qwen3.5-27B. This contradicts the "More Keys Less Values" paper (arXiv:2502.15075). All asymmetric turbo+q8 combos are slightly worse than pure q8_0, because the norm correction mismatch dilutes the turbo advantage. q8_0-K + turbo3-V is the fastest asymmetric config, at 98.8% of q8_0 decode speed.
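
For reference, the 0.76% figure is the gap between the two perplexities measured relative to the lower (better) one. A minimal sketch of that arithmetic follows. The helper name `relative_ppl_gap` and the variable names are made up for illustration and are not part of any tool used here.

```python
def relative_ppl_gap(better: float, worse: float) -> float:
    """Return how much higher `worse` is than `better`, as a percent of `better`."""
    return (worse - better) / better * 100.0


# Perplexities quoted above (lower is better).
ppl_turbo3k_turbo4v = 5.8212  # turbo3 keys, turbo4 values
ppl_turbo4k_turbo3v = 5.8653  # turbo4 keys, turbo3 values

gap = relative_ppl_gap(ppl_turbo3k_turbo4v, ppl_turbo4k_turbo3v)
print(f"turbo4-K + turbo3-V is {gap:.2f}% worse in PPL")
```

Dividing by the worse perplexity instead gives roughly 0.75%, so the choice of baseline matters at this precision.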