Different tensor roles (q/k/v/o/gate/up/down) have different quantization sensitivity; the per-bpe ROI ranking should guide where to spend bits
down_proj is the binding constraint (38% of total Q4 error budget). k_proj wins on ROI per bpe (0.255). gate_proj absorbs noise via SwiGLU saturation. v_proj is least sensitive in absolute terms.