turbo3 quality generalizes across architectures
KEY OPEN PROBLEM. head_dim=256 is excellent (<0.3%); head_dim=128 degrades 2-4%. Root cause analysis: the FWHT over 128 dims has fewer butterfly stages (log2(128)=7 vs log2(256)=8), giving weaker mixing. With random sign arrays the rotated coordinates concentrate with relative deviation O(1/sqrt(d)), i.e. relative variance O(1/d), so halving d doubles the relative variance. The 8 Lloyd-Max centroids quantize the rotated distribution, but at d=128 the post-FWHT distribution has heavier tails, which those centroids fit less well. Potential fixes: CAT alignment (TODO-002/003), SQuat projection (TODO-001), asymmetric K/V bits (TODO-005). Gemma-3 had an additional SWA bug (V un-rotation was missing in the iSWA path); that was fixed separately.
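The stage count and the 1/sqrt(d) scale in the note above can be checked with a minimal pure-Python sketch. It is an illustration under assumptions, not the turbo3 kernel: the names `fwht` and `rotate`, the orthonormal 1/sqrt(d) scaling, and the seeded sign array are all hypothetical choices made for this example. It rotates a one-hot vector (the worst-case outlier) and reports how many butterfly stages ran and how large each output coordinate is.

```python
import math
import random


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform.

    Returns (transformed list, number of butterfly stages).
    The stage count equals log2(len(x)).
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    y = list(x)
    stages = 0
    h = 1
    while h < n:
        # One butterfly stage: combine pairs that are h apart.
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
        stages += 1
    scale = 1.0 / math.sqrt(n)  # assumed normalization: keeps the L2 norm
    return [v * scale for v in y], stages


def rotate(x, signs):
    """Hypothetical helper: elementwise random-sign flip, then FWHT."""
    return fwht([s * v for s, v in zip(signs, x)])


if __name__ == "__main__":
    rng = random.Random(0)
    for d in (128, 256):
        signs = [rng.choice((-1.0, 1.0)) for _ in range(d)]
        one_hot = [1.0] + [0.0] * (d - 1)  # single large outlier
        y, stages = rotate(one_hot, signs)
        # Every output coordinate has magnitude exactly 1/sqrt(d).
        print(f"d={d}: stages={stages}, |y_i|={abs(y[0]):.4f}")
```

For a one-hot input the outlier is spread evenly, so each coordinate has magnitude 1/sqrt(d): about 0.0884 at d=128 versus 0.0625 at d=256. The squared magnitude (1/d) therefore doubles when d is halved, which is the scaling the note refers to. The sketch says nothing about the tail shape on real K/V activations; that would need measured data.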