PPL-based winners (EXP-0012/13/14) should also win on mean KL divergence against an fp16 reference. If they don't, the PPL wins are gaming the evaluation corpus rather than improving distributional fit.
Source: project memory, feedback_kld_over_ppl_values.md
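
The check described above can be sketched as follows. This is a minimal illustration, not the project's evaluation code: it assumes per-token logits from the fp16 reference and from a candidate are available as arrays of shape (tokens, vocab), and the function names `mean_kl` and `perplexity` are hypothetical. KL is taken in the direction KL(reference || candidate), in nats, which is an assumption about the intended convention.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the last (vocabulary) axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def mean_kl(ref_logits, cand_logits):
    """Mean per-token KL(P_ref || P_cand) in nats (direction is an assumption)."""
    ref_logp = log_softmax(np.asarray(ref_logits, dtype=np.float64))
    cand_logp = log_softmax(np.asarray(cand_logits, dtype=np.float64))
    # Sum over the vocabulary for each token, then average over tokens.
    per_token = (np.exp(ref_logp) * (ref_logp - cand_logp)).sum(axis=-1)
    return float(per_token.mean())

def perplexity(logits, targets):
    """exp of the mean negative log-likelihood of the target token ids."""
    logp = log_softmax(np.asarray(logits, dtype=np.float64))
    targets = np.asarray(targets)
    nll = -logp[np.arange(len(targets)), targets]
    return float(np.exp(nll.mean()))
```

A candidate would pass the check if it ranks best on both `perplexity(cand_logits, targets)` and `mean_kl(ref_logits, cand_logits)` over the same tokens. PPL only scores the probability of the corpus token, while KL compares the full next-token distribution, so the two can disagree.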