Promoting the last 8 layers to q8_0 improves PPL while retaining most of turbo3's compression
LA-2 turbo3: PPL 5.8140 (-0.40% vs q8_0), 97.7% decode speed. LA-2 turbo4: PPL 5.8077 (-0.51% vs q8_0), 96.7% decode speed. Both beat uniform turbo AND uniform q8_0 on PPL, matching TheTom's findings. Superseded by EXP-0003 (LA-1) for quality, but LA-2 remains the recommended config for TheTom's Metal implementation.
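A minimal sketch of the LA-2 layer assignment, assuming LA-2 simply means "last 8 layers at q8_0, every other layer at the base turbo type" as the title describes. The function names, the 32-layer example and the 3.5 bits/value figure for turbo3 are hypothetical placeholders, not values from this experiment. Only q8_0's 8.5 bits/value (32 int8 values plus one fp16 scale per block) is a known ggml figure.

```python
from typing import Dict, List


def layer_cache_types(n_layers: int, n_promoted: int = 8,
                      base: str = "turbo3", promoted: str = "q8_0") -> List[str]:
    """Hypothetical helper: assign a KV-cache type to each layer.

    The last `n_promoted` layers get `promoted` (q8_0 in LA-2);
    all earlier layers keep the `base` turbo type.
    """
    if not 0 <= n_promoted <= n_layers:
        raise ValueError("n_promoted must be between 0 and n_layers")
    cutoff = n_layers - n_promoted  # first layer index that gets promoted
    return [promoted if i >= cutoff else base for i in range(n_layers)]


def mean_bits_per_value(types: List[str], bits: Dict[str, float]) -> float:
    """Average storage bits per cached value across layers.

    Assumes every layer caches the same number of values, so a plain
    mean over layers is the right weighting.
    """
    return sum(bits[t] for t in types) / len(types)


if __name__ == "__main__":
    types = layer_cache_types(32)  # hypothetical 32-layer model
    # q8_0 = 8.5 bits/value in ggml; the turbo3 value is a placeholder assumption
    bits = {"q8_0": 8.5, "turbo3": 3.5}
    print(types.count("q8_0"), mean_bits_per_value(types, bits))
```

With 32 layers, LA-2 promotes 8/32 = 25% of layers; under the placeholder 3.5-bit turbo3 figure the average comes to 4.75 bits/value, versus 8.5 for uniform q8_0. Promoting a fixed count of trailing layers, rather than a fraction, keeps the config identical across models of different depth.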