Correct causal skip with absolute sequence positions

success

0.08

1/5

Hypothesis

Using absolute query positions (nkv - nq + q_start + q_len) for the causal skip correctly prunes Q batches without affecting perplexity (PPL), saving 38-47% of compute during full prefill.
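
The bound in the hypothesis can be sketched as below. This is an illustrative reading, not the experiment's actual kernel: it assumes the nq queries are aligned to the end of the nkv keys, so that query i sits at absolute position nkv - nq + i, and that work is tiled into Q batches and KV blocks. The function names and the block sizes are hypothetical.

```python
def causal_kv_limit(nkv, nq, q_start, q_len):
    """Exclusive upper bound on KV indices any query in the batch may attend to.

    Assumption: queries are aligned to the end of the KV sequence, so the last
    query of the batch has absolute position nkv - nq + q_start + q_len - 1.
    """
    return nkv - nq + q_start + q_len


def visible(nkv, nq, q_idx, k_idx):
    """Reference causal mask using the same end-aligned convention."""
    return k_idx <= nkv - nq + q_idx


def skip_is_safe(nkv, nq, q_block):
    """Brute-force check: no key at or past the limit is visible to the batch."""
    for q_start in range(0, nq, q_block):
        q_len = min(q_block, nq - q_start)
        limit = causal_kv_limit(nkv, nq, q_start, q_len)
        for q in range(q_start, q_start + q_len):
            for k in range(limit, nkv):
                if visible(nkv, nq, q, k):
                    return False
    return True


def skipped_fraction(nkv, nq, q_block, kv_block):
    """Fraction of (Q batch, KV block) tiles that lie fully past the limit."""
    total = skipped = 0
    for q_start in range(0, nq, q_block):
        q_len = min(q_block, nq - q_start)
        limit = causal_kv_limit(nkv, nq, q_start, q_len)
        for k_start in range(0, nkv, kv_block):
            total += 1
            if k_start >= limit:  # whole KV block is in the masked region
                skipped += 1
    return skipped / total
```

A tile is skipped only when its first key index is already at or beyond the limit, so partially masked tiles are still computed and the attention output is unchanged. The fraction of tiles skipped depends on the tile sizes and on how nq compares to nkv, so this toy count is not a substitute for the measured savings.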