gs=128 was a local minimum inherited from KV cache work; weights have a different group_size sweet spot
First Pareto tie vs Q4_K_M (PPL 19.54 vs 19.46, within stderr ±0.166) at 0.78 fewer bits. gs=128→gs=256 lowers PPL by 1.53 at fewer bpe.
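
A rough sketch of the arithmetic behind the two claims above. It assumes one 16-bit scale stored per group and no zero point; that layout, and the helper names, are illustrative and not taken from the actual format. Under that assumption, doubling group_size halves the per-element scale overhead, and the PPL gap to Q4_K_M falls inside the quoted stderr.

```python
def scale_overhead_bits(group_size, scale_bits=16):
    """Per-element overhead from storing one scale per group.

    scale_bits=16 is an assumption (fp16 scale, no zero point).
    """
    return scale_bits / group_size


def within_stderr(ppl_a, ppl_b, stderr):
    """True when the PPL gap is no larger than the reported stderr."""
    return abs(ppl_a - ppl_b) <= stderr


# Overhead drops as the group grows (assumed 16-bit scale).
overhead_128 = scale_overhead_bits(128)
overhead_256 = scale_overhead_bits(256)
print(overhead_128, overhead_256, overhead_128 - overhead_256)

# Tie check using the figures quoted in the note.
print(within_stderr(19.54, 19.46, 0.166))
```

This only accounts for scale overhead; it says nothing about why PPL improves at gs=256, which is an empirical result.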