Amdahl's law analysis — attention fraction vs context length

success
0.08
1/5
Overview Experiments 96 Forks 3 Resources 36 Benchmarks 2 Broadcasts 3 Related
Hypothesis

The fused kernel's end-to-end slowdown is amplified/masked by non-attention compute (FFN, norms, embeddings). By comparing f16 KV (zero dequant overhead) vs TBQ3 baseline vs TBQ3 fused, we can isolate the attention-only time and understand what fraction of total time is available for optimization.

Tags
Dependencies
Instances (1 reproduction)