Fused TBQ3 dequant-FlashAttention kernel (MVP)

success
0.08
1/5
Hypothesis

Fusing TBQ3 dequantization (inverse SRHT) directly into a FlashAttention-style online-softmax kernel eliminates all intermediate buffers (k_tmp, v_tmp, S) while producing results identical to the unfused path.
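
The idea can be sketched on the CPU in plain Python. This is a minimal illustration, not the kernel itself. The TBQ3 storage layout is not described here, so `dequant_row` is a hypothetical stand-in that rescales integer codes and inverts a sign-flipped orthonormal Hadamard rotation; `scale`, `signs`, and all function names are assumptions. The unfused path materializes k_tmp, v_tmp, and S, while the fused path dequantizes one K/V row at a time and folds it into a running online-softmax state.

```python
import math

def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform; len(x) must be a power of two."""
    x = list(x)
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

def dequant_row(codes, scale, signs):
    """Hypothetical stand-in for TBQ3 dequant (the real format is an assumption):
    rescale the codes, then invert a sign-flipped orthonormal Hadamard rotation."""
    n = len(codes)
    y = fwht([c * scale for c in codes])
    return [s * v / math.sqrt(n) for s, v in zip(signs, y)]

def reference_attention(q, k_codes, v_codes, scale, signs):
    """Unfused path: materializes k_tmp, v_tmp and the score row S."""
    d = len(q)
    k_tmp = [dequant_row(r, scale, signs) for r in k_codes]
    v_tmp = [dequant_row(r, scale, signs) for r in v_codes]
    S = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in k_tmp]
    m = max(S)
    p = [math.exp(s - m) for s in S]
    l = sum(p)
    return [sum(p[i] * v_tmp[i][j] for i in range(len(p))) / l for j in range(d)]

def fused_attention(q, k_codes, v_codes, scale, signs):
    """Fused path: no k_tmp, v_tmp or S; only a running max, sum and accumulator."""
    d = len(q)
    m, l, acc = -math.inf, 0.0, [0.0] * d
    for kc, vc in zip(k_codes, v_codes):
        k = dequant_row(kc, scale, signs)   # lives only for this iteration
        v = dequant_row(vc, scale, signs)
        s = sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
        m_new = max(m, s)
        alpha = math.exp(m - m_new)         # rescales earlier terms; 0.0 on the first row
        p = math.exp(s - m_new)
        l = l * alpha + p
        acc = [a * alpha + p * vj for a, vj in zip(acc, v)]
        m = m_new
    return [a / l for a in acc]
```

In this sketch the two paths compute the same softmax-weighted sum in a different order, so they agree up to floating-point rounding rather than bit for bit; whether a GPU kernel matches bitwise depends on its reduction order.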

Instances (1 reproduction)