Compressed-domain attention on Apple Silicon (Metal)

Status: proposed | Priority: medium | ID: TODO-007
Description

The compressed-domain trick from EXP-0008, which eliminates 14 butterfly stages per KV token, should have its greatest impact on Apple Silicon: there are no tensor cores, so the butterfly stages are expensive relative to total compute. Port the compressed-domain kernel to Metal and benchmark it on M-series hardware.
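
To make the saving concrete, here is a minimal CPU-side sketch in Python. It is not the EXP-0008 kernel and not Metal code. It assumes the butterfly in question is an orthonormal Walsh-Hadamard transform applied to keys before quantization; this page does not confirm that, and quantization itself is left out. All names (fwht, cached_keys, and so on) are hypothetical. Because an orthonormal transform preserves dot products, the query can be transformed once and scored directly against the cached keys, instead of inverse-transforming every cached key.

```python
import math
import random


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform (assumed transform, see lead-in).

    Returns the transformed vector and the number of butterfly stages used.
    len(x) must be a power of two.
    """
    y = list(x)
    n = len(y)
    h = 1
    stages = 0
    while h < n:
        # One butterfly stage: combine elements that are h apart.
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
        stages += 1
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in y], stages


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


HEAD_DIM = 128  # matches the suggested head_dim
rng = random.Random(0)
query = [rng.gauss(0.0, 1.0) for _ in range(HEAD_DIM)]
keys = [[rng.gauss(0.0, 1.0) for _ in range(HEAD_DIM)] for _ in range(4)]

# Hypothetical KV cache: keys stored in the transformed domain.
# (A real cache would also quantize them; omitted here.)
cached_keys = [fwht(k)[0] for k in keys]

# Baseline: undo the transform for every cached key, then score.
# The orthonormal WHT is its own inverse, so fwht() also serves as the inverse.
baseline_scores = [dot(query, fwht(c)[0]) for c in cached_keys]

# Compressed domain: transform the query once, score against the cache directly.
query_rot, stages_per_transform = fwht(query)
compressed_scores = [dot(query_rot, c) for c in cached_keys]

max_abs_diff = max(abs(a - b) for a, b in zip(baseline_scores, compressed_scores))
```

In this sketch the baseline runs one full transform per cached key, while the compressed-domain path runs a single transform on the query regardless of sequence length. With quantized keys the two paths would agree only on the dequantized values, so any real port would need its own accuracy check.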

Reference

EXP-0008

Suggested Parameters
approach: compressed_domain_attention
backend: metal
head_dim: 128
bits: 3
Provenance
Proposed by @dusterbloom via adaptive-chunked-prefill claude-opus-4-6