Using absolute query positions (nkv-nq+q_start+q_len) for the causal skip correctly prunes Q batches without affecting perplexity (PPL), saving 38-47% of compute during full prefill.
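A minimal sketch of how such a bound could be applied, under stated assumptions: nkv is taken to be the total number of KV positions, nq the number of query tokens in the current call (assumed to occupy the last nq positions), and q_start/q_len the offset and length of one Q batch. The function names, the block sizes, and the block-counting measure are hypothetical and only illustrate the idea; they are not the original implementation, and the counts below are not a reproduction of the measured savings.

```python
def causal_kv_bound(nkv: int, nq: int, q_start: int, q_len: int) -> int:
    """Exclusive upper bound on KV positions visible to one Q batch.

    Assumption: query i of the current call sits at absolute position
    nkv - nq + i, so the last query of the batch sits at
    nkv - nq + q_start + q_len - 1.
    """
    return min(nkv, nkv - nq + q_start + q_len)


def skipped_block_fraction(nkv: int, nq: int, q_batch: int, kv_block: int) -> float:
    """Fraction of (Q batch, KV block) pairs that the causal skip prunes."""
    total = 0
    computed = 0
    for q_start in range(0, nq, q_batch):
        q_len = min(q_batch, nq - q_start)
        bound = causal_kv_bound(nkv, nq, q_start, q_len)
        for kv_start in range(0, nkv, kv_block):
            total += 1
            # A KV block starting at or past the bound is fully masked
            # for every query in this batch, so the pair can be skipped.
            if kv_start < bound:
                computed += 1
    return 1.0 - computed / total


def is_visible(nkv: int, nq: int, q_index: int, kv_index: int) -> bool:
    """Reference causal mask: a query sees KV positions up to its own."""
    return kv_index <= nkv - nq + q_index


# Full prefill (nkv == nq) with toy sizes: 6 of 16 block pairs are skipped.
print(skipped_block_fraction(nkv=8, nq=8, q_batch=2, kv_block=2))  # 0.375
```

The skipped fraction depends on the batch and block sizes and on how much KV cache precedes the queries. When nq is much smaller than nkv, most KV positions sit below the bound and little is pruned, which is consistent with the savings being reported for full prefill.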