turbo3 with FWHT rotation + norm correction matches q8_0 quality
turbo3 beats q8_0 at head_dim=256. The FWHT rotation is essential: without it, results are 6.8% worse.

Implementation: SET_ROWS applies the rotation sign1 → FWHT → sign2 before Lloyd-Max codebook quantization, and dequant applies the inverse (sign2 → FWHT → sign1). With the pre-rotate-queries approach, Q is forward-rotated once before the attention loop, instead of inverse-rotating every K inside it.
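
Why pre-rotating Q is equivalent: a minimal Python sketch, not the actual kernel code. It assumes the FWHT is scaled by 1/sqrt(n) so that the rotation is orthogonal, and that sign1/sign2 are fixed ±1 vectors (drawn at random here for illustration). The helper names fwht, rotate, unrotate, and dot are hypothetical, and the Lloyd-Max quantization step is left out, since the identity holds for whatever rotated K values end up stored.

```python
import math
import random


def fwht(v):
    """Fast Walsh-Hadamard transform scaled by 1/sqrt(n); len(v) must be a power of two."""
    out = list(v)
    n = len(out)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [x * scale for x in out]


def rotate(x, sign1, sign2):
    """Forward rotation: sign1 -> FWHT -> sign2."""
    y = fwht([s * v for s, v in zip(sign1, x)])
    return [s * v for s, v in zip(sign2, y)]


def unrotate(y, sign1, sign2):
    """Inverse rotation: sign2 -> FWHT -> sign1."""
    x = fwht([s * v for s, v in zip(sign2, y)])
    return [s * v for s, v in zip(sign1, x)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


rng = random.Random(0)
n = 256  # head_dim (assumed power of two)
sign1 = [rng.choice((-1.0, 1.0)) for _ in range(n)]
sign2 = [rng.choice((-1.0, 1.0)) for _ in range(n)]
q = [rng.gauss(0.0, 1.0) for _ in range(n)]
k = [rng.gauss(0.0, 1.0) for _ in range(n)]

k_rot = rotate(k, sign1, sign2)  # what would be stored (quantization omitted)

# Option A: inverse-rotate every K inside the attention loop.
score_inverse_k = dot(q, unrotate(k_rot, sign1, sign2))

# Option B: forward-rotate Q once, then dot against the stored K directly.
q_rot = rotate(q, sign1, sign2)
score_rotated_q = dot(q_rot, k_rot)

roundtrip_error = max(abs(a - b) for a, b in zip(k, unrotate(k_rot, sign1, sign2)))
```

Both scores agree up to floating-point error because the rotation is orthogonal, so its inverse is its transpose. Option B does one rotation per query rather than one per cached K row.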