The number of recent rounds kept verbatim before older rounds are compressed affects model performance. The current setting is window=2, i.e. the current round and the one before it (rounds >= current_round - 1) stay uncompressed. Wider windows waste tokens on stale detail; narrower ones drop context the model still needs. The sweet spot is expected around 2-3.
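A minimal sketch of how such a window could be applied when assembling the prompt. `Round`, `summarize`, and `build_context` are hypothetical names, not taken from the codebase, and the summarizer is a stub standing in for whatever compressor is actually used:

```python
from dataclasses import dataclass

@dataclass
class Round:
    user: str
    assistant: str

def summarize(rounds: list[Round]) -> str:
    # Stub: in practice this would call an LLM or extractive compressor.
    return " / ".join(f"{r.user[:40]} -> {r.assistant[:40]}" for r in rounds)

def build_context(rounds: list[Round], window: int = 2) -> list[str]:
    """Keep the last `window` rounds verbatim; compress everything older."""
    cut = max(len(rounds) - window, 0)
    older, recent = rounds[:cut], rounds[cut:]
    context: list[str] = []
    if older:
        # Older rounds collapse into a single summary line.
        context.append(f"[summary of rounds 1-{cut}] {summarize(older)}")
    for r in recent:
        # Recent rounds pass through verbatim.
        context.append(f"User: {r.user}")
        context.append(f"Assistant: {r.assistant}")
    return context
```

With window=2, a 5-round history yields one summary line covering rounds 1-3 plus rounds 4-5 verbatim; raising window trades summary compactness for verbatim token cost.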
https://arxiv.org/abs/2310.04408