Promoting quality-sensitive layers to q8_0 improves PPL while maintaining compression
RECOMMENDED CONFIG for contexts up to 65K: 1.17% better PPL than q8_0, 99.6% prefill speed, 97.5% decode speed, 3.5x compression. Implementation: set TURBO_LAYER_ADAPTIVE=1; the first 4 and last 4 of n_layer use q8_0, the rest use turbo3, applied to both K and V. K and V must be promoted together; asymmetric K-only or V-only promotion hurts PPL due to norm-correction mismatch. OOMs at 128K on 24GB; for 128K contexts use LA-5 (first 2 + last 2) instead.
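The per-layer selection above can be sketched as a small helper. This is a hypothetical illustration, not the actual implementation: the function name `kv_quant_for_layer` and the `n_edge` parameter are made up here; only the env var `TURBO_LAYER_ADAPTIVE`, the quant type names, and the first-4/last-4 rule come from the note.

```python
import os

def kv_quant_for_layer(layer: int, n_layer: int, n_edge: int = 4) -> str:
    """Return the KV-cache quant type for one layer (hypothetical sketch).

    The first and last `n_edge` layers are promoted to q8_0; the rest use
    turbo3. The same type is returned for both K and V of a layer, since
    asymmetric promotion hurts PPL (norm-correction mismatch). n_edge=2
    corresponds to the LA-5 variant for 128K contexts.
    """
    if os.environ.get("TURBO_LAYER_ADAPTIVE") != "1":
        return "turbo3"  # adaptive promotion disabled: all layers turbo3
    if layer < n_edge or layer >= n_layer - n_edge:
        return "q8_0"    # quality-sensitive edge layer: promote
    return "turbo3"      # middle layer: keep compressed
```

For a 32-layer model with the flag set, layers 0-3 and 28-31 get q8_0 and layers 4-27 stay turbo3.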