
TheTom/llama-cpp-turboquant

https://github.com/TheTom/llama-cpp-turboquant/issues/3 ↗
Activity Summary
Consensus Experiments (1)

Project:     TurboQuant KV Cache Optimization
Experiment:  dan-and Madreag CUDA fork (4x RTX 3080)
Claim:       Madreag CUDA fork maintains decode performance at long context
Result:      negative
Confidence:  0.14
Repro:       1/5
All Completed Experiments (1)

Project:     TurboQuant KV Cache Optimization
Fork:        apple-silicon-baselines (dan-and)
Experiment:  dan-and Madreag CUDA fork (4x RTX 3080)
Result:      negative
Date:        2026-03-27
Notes:       Decode throughput falls off a cliff at long context (0.19x at 204K tokens). Same dequantization bottleneck as on Metal, but far worse because the Madreag CUDA kernel is unoptimized. Prefill is fine (~1.0x). KV compression is only 2.17x because 30 of Qwen3.5-35B-A3B's 40 layers use linear attention and do not compress.
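A low overall compression ratio is what you would expect when only the full-attention layers' KV caches are quantized while the linear-attention layers' state is left untouched. A minimal sketch of that arithmetic, assuming a hypothetical 4x per-layer quantization ratio and illustrative per-layer byte figures (the layer counts come from the report above; nothing here is a measured value):

```python
# Sketch: overall KV-footprint compression for a mixed-attention model.
# Assumptions (not from the source): full-attention layers carry a large
# context-dependent KV cache, linear-attention layers a fixed-size state,
# and only the full-attention caches are quantized.

def overall_compression(full_layers, linear_layers, full_kv_bytes,
                        linear_state_bytes, quant_ratio):
    """Whole-model compression when only full-attention KV is quantized."""
    before = full_layers * full_kv_bytes + linear_layers * linear_state_bytes
    after = (full_layers * full_kv_bytes / quant_ratio
             + linear_layers * linear_state_bytes)
    return before / after

# Illustrative: 10 full-attention layers, 30 linear-attention layers,
# 4x quantization on the compressible caches.
ratio = overall_compression(full_layers=10, linear_layers=30,
                            full_kv_bytes=8.0, linear_state_bytes=1.0,
                            quant_ratio=4.0)
print(round(ratio, 2))  # -> 2.2
```

Even an aggressive 4x per-layer quantization yields only about 2.2x overall under these assumptions, which is why the uncompressible linear-attention layers cap the achievable ratio near the reported 2.17x.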
Projects Tracking This Resource
No projects are tracking this resource.