All current results are on the Qwen3 architecture (Qwen3-0.6B). The recipe may interact differently with Llama (RMSNorm but no q_norm/k_norm), Mistral (sliding-window attention), Phi (different layer scaling), Gemma (post-attention LayerNorm), and DeepSeek (MLA / shared experts). At minimum, one run each on Llama, Mistral, and Gemma is needed before claiming generality; a quick module/config probe (sketch below) can confirm which of these features each candidate checkpoint actually has.
EXP-0014, scope expansion
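A minimal probe sketch, assuming the HF `transformers` stack; the model IDs and feature markers below are placeholder assumptions, not verified choices, and the gated repos (Llama, Gemma) need authentication. Models are instantiated on the meta device, so no weights are materialized even for 7B configs.

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

# Placeholder candidates, one per architecture family under consideration.
CANDIDATES = [
    "Qwen/Qwen3-0.6B",            # baseline: RMSNorm + q_norm/k_norm
    "meta-llama/Llama-3.2-1B",    # RMSNorm, no q_norm/k_norm
    "mistralai/Mistral-7B-v0.1",  # sliding-window attention
    "google/gemma-2-2b",          # extra post-attention / post-FFN norms
]

def probe(model_id: str) -> dict:
    """Download only the config, build the model with meta tensors,
    and scan module names / config fields for feature markers."""
    config = AutoConfig.from_pretrained(model_id)
    with torch.device("meta"):  # parameters created without real memory
        model = AutoModelForCausalLM.from_config(config)
    names = [n for n, _ in model.named_modules()]
    return {
        "qk_norm": any(n.endswith("q_norm") for n in names),
        "post_ffn_norm": any("post_feedforward_layernorm" in n for n in names),
        "sliding_window": getattr(config, "sliding_window", None),
    }

for mid in CANDIDATES:
    print(f"{mid}: {probe(mid)}")
```

Output of this probe would pin down the feature matrix before any training budget is spent; any candidate whose markers differ from the expected row is worth a closer look at its modeling code first.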