Skip V dequant for negligible attention weights (exp(score-max) < 1e-6)
Eliminates the context-length scaling regression on MoE models. Bit-identical PPL (zero quality loss).

Implementation: in the vec kernel's V accumulation loop, compute the softmax weight exp(score - KQ_max_new) for each position. If it is below 1e-6, skip the V dequant + accumulate for that position entirely. At long context, 90%+ of positions are skipped. Works on ALL quant types (not turbo-specific). Credit to TheTom's Metal implementation.

Key insight: sparse V made the fp16 decode dequant path unnecessary. Native dequant + sparse V matches fp16 dequant speed at all context lengths with zero extra memory.
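A minimal host-side sketch of the idea, assuming the usual online-softmax recurrence (running max `KQ_max`, rescale of the accumulator by exp(max_old - max_new)); function names and the 1e-6 threshold placement are illustrative, not the actual kernel code:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Streaming (online-softmax) attention over one query.
// When sparse == true, positions whose weight exp(score - KQ_max_new)
// falls below 1e-6 skip the V load/dequant + accumulate entirely;
// their contribution to the output is negligible.
std::vector<float> attend(const std::vector<float>& scores,
                          const std::vector<std::vector<float>>& V,
                          bool sparse, int* skipped = nullptr) {
    const float EPS = 1e-6f;
    const size_t d = V[0].size();
    std::vector<float> acc(d, 0.0f);
    float m = -INFINITY;   // running KQ_max
    float denom = 0.0f;    // running softmax denominator
    for (size_t i = 0; i < scores.size(); ++i) {
        const float m_new = std::max(m, scores[i]);
        const float scale = std::exp(m - m_new);     // rescale old accumulator
        const float w = std::exp(scores[i] - m_new); // this position's weight
        denom = denom * scale + w;
        for (float& a : acc) a *= scale;
        m = m_new;
        if (sparse && w < EPS) {  // negligible weight: skip V work
            if (skipped) ++*skipped;
            continue;
        }
        for (size_t j = 0; j < d; ++j) acc[j] += w * V[i][j];
    }
    for (float& a : acc) a /= denom;
    return acc;
}
```

With a few scores far below the running max, the sparse and dense paths agree to well within fp16 precision, which is why PPL is unchanged: the skipped weights are below 1e-6 by construction.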