q_norm/k_norm RMSNorm tensors are tiny but sit in the attention path — sensitivity should be asymmetric to parameter count