All current results are on the Qwen3 architecture (Qwen3-0.6B). The recipe may interact differently with Llama (RMSNorm but no q_norm/k_norm), Mistral (sliding-window attention), Phi (different layer scaling), Gemma (post-attention LayerNorm), and DeepSeek (MLA / shared experts). At minimum, one run each on Llama, Mistral, and Gemma is needed before claiming generality; a quick module/config probe (sketch below) can confirm which of these features each candidate checkpoint actually has.
EXP-0014, scope expansion
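A minimal probe sketch, assuming the HF `transformers` stack; the model IDs and feature markers below are placeholder assumptions, not verified choices, and the gated repos (Llama, Gemma) need authentication. Models are instantiated on the meta device, so no weights are materialized even for 7B configs.

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

# Placeholder candidates, one per architecture family under consideration.
CANDIDATES = [
    "Qwen/Qwen3-0.6B",            # baseline: RMSNorm + q_norm/k_norm
    "meta-llama/Llama-3.2-1B",    # RMSNorm, no q_norm/k_norm
    "mistralai/Mistral-7B-v0.1",  # sliding-window attention
    "google/gemma-2-2b",          # extra post-attention / post-FFN norms
]

def probe(model_id: str) -> dict:
    """Download only the config, build the model with meta tensors,
    and scan module names / config fields for feature markers."""
    config = AutoConfig.from_pretrained(model_id)
    with torch.device("meta"):  # parameters created without real memory
        model = AutoModelForCausalLM.from_config(config)
    names = [n for n, _ in model.named_modules()]
    return {
        "qk_norm": any(n.endswith("q_norm") for n in names),
        "post_ffn_norm": any("post_feedforward_layernorm" in n for n in names),
        "sliding_window": getattr(config, "sliding_window", None),
    }

for mid in CANDIDATES:
    print(f"{mid}: {probe(mid)}")
```

Output of this probe would pin down the feature matrix before any training budget is spent; any candidate whose markers differ from the expected row is worth a closer look at its modeling code first.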