TheTom/llama-cpp-turboquant

https://github.com/TheTom/llama-cpp-turboquant/pull/3
other 2 total activities
Activity Summary
1 success
Consensus Experiments (1)
Project: TurboQuant KV Cache Optimization
Experiment: signalnine CUDA PR #3 (RTX 5090 Blackwell)
  CUDA implementation on Blackwell achieves near-parity with F16 decode
Result: success
Confidence: 0.14
Repro: 1/5
All Completed Experiments (1)
Project Fork Experiment Result Date
TurboQuant KV Cache Optimization apple-silicon-baselines signalnine
signalnine CUDA PR #3 (RTX 5090 Blackwell)
First-time contributor; built with Claude Code. Decode reaches 98.5% of F16 on Blackwell. The PR was converted to draft due to quality issues at larger context windows. It shows the same dequantization bottleneck pattern as Metal at long context.
success 2026-03-27T00:00:00Z
Projects Tracking This Resource
No projects are tracking this resource.