turbo4 (3-bit + QJL correction) beats q8_0 on head_dim=256
turbo4 is the best quantization option on head_dim=256 (-0.32% vs q8_0). The key is the combination of the QJL 1-bit sign correction and the L2 norm correction.

Implementation: quantize with the 3-bit turbo3 base, compute the residual, normalize it, apply the QJL random projection (with its own separate sign arrays), and store the sign bits. Dequant reconstructs the base and adds the sign-bit correction scaled by the residual norm.

CRITICAL BUG FIXED: the Q pre-rotation guard must also check for the TURBO4_0 type, not just TURBO3_0.
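A minimal Python sketch of the encode/decode path described above. It rests on loud assumptions: a plain per-block absmax 3-bit uniform quantizer stands in for turbo3, and the QJL projection is a dense Gaussian matrix regenerated from a seed. The real kernel's projection and its separate sign arrays are not reproduced. The names `quantize`, `dequantize`, `projection`, and the dict layout are hypothetical. The L2 norm correction is only modeled as scaling the correction by the stored residual norm. With a Gaussian projection, the `sqrt(pi/2)/m` rescaling is the standard QJL estimator, so `<q, x_hat>` is an unbiased estimate of `<q, x>` over the random projection.

```python
import math
import random

LEVELS = 8  # 3-bit base codes 0..7 (stand-in for turbo3, an assumption)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def projection(d, m, seed):
    # Hypothetical QJL projection: m x d matrix of i.i.d. N(0, 1) entries.
    # Encoder and decoder regenerate the same matrix from the shared seed.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]


def quantize(x, seed, m=None):
    d = len(x)
    m = d if m is None else m
    # 1) 3-bit base: 8 mid-rise uniform levels spanning [-amax, amax].
    amax = max(abs(v) for v in x)
    step = amax / (LEVELS // 2)
    if step > 0:
        codes = [min(LEVELS - 1, max(0, math.floor(v / step) + LEVELS // 2)) for v in x]
    else:
        codes = [LEVELS // 2] * d  # all-zero block; base decodes to 0
    base = [(c - (LEVELS - 1) / 2) * step for c in codes]
    # 2) Residual and its L2 norm; the norm is stored for the decoder.
    resid = [v - b for v, b in zip(x, base)]
    norm = math.sqrt(dot(resid, resid))
    # 3) Normalize the residual (guarding the zero-residual case).
    unit = [r / norm for r in resid] if norm > 0 else resid
    # 4) QJL: keep only the sign of each projected coordinate (1 bit when packed).
    S = projection(d, m, seed)
    signs = [1 if dot(row, unit) >= 0 else -1 for row in S]
    return {"codes": codes, "step": step, "norm": norm, "signs": signs}


def dequantize(block, seed):
    codes, step = block["codes"], block["step"]
    norm, signs = block["norm"], block["signs"]
    d, m = len(codes), len(signs)
    base = [(c - (LEVELS - 1) / 2) * step for c in codes]
    S = projection(d, m, seed)
    # E[sign(<s, u>) * s] = sqrt(2/pi) * u for s ~ N(0, I) and unit u,
    # so sqrt(pi/2)/m * S^T signs is an unbiased estimate of u.
    scale = norm * math.sqrt(math.pi / 2) / m
    corr = [scale * sum(S[j][i] * signs[j] for j in range(m)) for i in range(d)]
    return [b + c for b, c in zip(base, corr)]
```

For a 256-element block, `blk = quantize(x, seed=42)` followed by `dequantize(blk, seed=42)` round-trips it. The seed plays the role of the fixed projection shared by encoder and decoder. Individual reconstructions remain noisy, because the correction is unbiased rather than exact.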