| Project | Experiment | Result | Confidence | Repro |
|---|---|---|---|---|
| TurboQuant KV Cache Optimization |
dan-and Madreag CUDA fork (4x RTX 3080)
Madreag CUDA fork maintains decode performance at long context
|
negative |
1/5
|
| Project | Fork | Experiment | Result | Date |
|---|---|---|---|---|
| TurboQuant KV Cache Optimization | apple-silicon-baselines dan-and |
dan-and Madreag CUDA fork (4x RTX 3080)
Decode falls off a cliff at long context (0.19x at 204K). Same dequant bottleneck as Metal but far worse — Madreag CUDA kernel unoptimized. Prefill fine (~1.0x). KV compression only 2.17x due to Qwen3.5-35B-A3B 30/40 linear attention layers not compressing.
|
negative | 2026-03-27T00:00:00Z |