| Project | Experiment | Result | Confidence | Repro |
|---|---|---|---|---|
| TurboQuant KV Cache Optimization |
signalnine CUDA PR #3 (RTX 5090 Blackwell)
CUDA implementation on Blackwell achieves near-parity with F16 decode
|
success |
1/5
|
| Project | Fork | Experiment | Result | Date |
|---|---|---|---|---|
| TurboQuant KV Cache Optimization | apple-silicon-baselines signalnine |
signalnine CUDA PR #3 (RTX 5090 Blackwell)
First-time contributor, built with Claude Code. 98.5% F16 decode on Blackwell. Converted to draft due to quality issues at larger context windows. Same dequant bottleneck pattern as Metal at long context.
|
success | 2026-03-27T00:00:00Z |