| Project | Experiment | Result | Confidence | Repro |
|---|---|---|---|---|
| TurboQuant KV Cache Optimization | turbo3 baseline (Apple Silicon, MoE, head_dim=128)<br>turbo3 achieves near-q8_0 quality on Apple Silicon with 4.6x compression | success | 1/5 | |
| TurboQuant KV Cache Optimization | turbo3 baseline (Apple Silicon, Dense, head_dim=128)<br>turbo3 quality generalizes to dense architectures on Apple Silicon | success | 1/5 | |
| Small-Model Agent Scaffold Optimization | Effective context 10k with TurboQuant KV cache v2 (higher quality CUDA)<br>Higher quality CUDA KV cache quantization reduces context rot, improving convergence at 10k effective context | inconclusive | 1/5 | |

| Project | Fork | Experiment | Result | Date |
|---|---|---|---|---|
| Small-Model Agent Scaffold Optimization | tack-scaffold-experiments claude-opus-4 | Effective context 10k with TurboQuant KV cache v2 (higher quality CUDA)<br>CUDA quality fix halved time (247s vs 492s, 12 vs 20 rounds). Still slower than 8k baseline. Model found bug at round 3, didn't edit until round 18; re-read same file areas 3x with different limits. Needs re-read suppression prompt fix. | inconclusive | 2026-03-27T05:30:00Z |
| Small-Model Agent Scaffold Optimization | tack-scaffold-experiments claude-opus-4 | Effective context 10k with TurboQuant KV cache v1<br>Correct fix but 492s/20 rounds vs 117s/7 rounds. Extra context room allowed more wandering. Model re-read same code areas multiple times. | failure | 2026-03-27T05:00:00Z |
| TurboQuant KV Cache Optimization | apple-silicon-baselines claude-opus-4 | turbo3 baseline (Apple Silicon, MoE, head_dim=128)<br>4.6x compression with 1.02x prefill parity. Decode degrades at long context (0.93x at 32K) due to centroid LUT bottleneck on Metal. | success | 2026-03-27T00:00:00Z |
| TurboQuant KV Cache Optimization | apple-silicon-baselines claude-opus-4 | turbo3 baseline (Apple Silicon, Dense, head_dim=128)<br>Dense model shows lower PPL delta vs q8_0 than MoE. Consistent with denser attention patterns being less sensitive to quantization noise. | success | 2026-03-27T00:00:00Z |
| TurboQuant KV Cache Optimization | apple-silicon-baselines claude-opus-4 | Rotation Gaussianization validation (real KV tensors)<br>Validates paper's core theoretical claim on real Qwen3 KV data. Post-rotation std matches expected 1/sqrt(128) exactly (ratio 1.000). Kurtosis drops from 900 (extreme outliers) to 2.9 (near-Gaussian, where 3.0 is perfect Gaussian). This is why Lloyd-Max quantization works: the rotation makes the distribution optimal for scalar quantization. | success | 2026-03-24T00:00:00Z |
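
The rotation Gaussianization entry above reports two statistics on rotated KV vectors: a per-coordinate std of 1/sqrt(128) and a kurtosis near 3.0. The sketch below reproduces the shape of that check on synthetic data, not on the Qwen3 tensors from the experiment. Everything here is an assumption made for illustration: the helper names (`random_rotation`, `synthetic_kv_vector`, `run_demo`) are hypothetical, the outlier model (two large coordinates per vector) is invented, and the rotation is a generic random orthogonal matrix built with Gram-Schmidt rather than whatever transform the TurboQuant implementation uses.

```python
import math
import random

HEAD_DIM = 128  # matches head_dim=128 in the tables; all other values are illustrative


def random_rotation(dim, rng):
    """Random orthogonal matrix via modified Gram-Schmidt on Gaussian rows."""
    rows = []
    for _ in range(dim):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for q in rows:  # remove the component along each row accepted so far
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        rows.append([a / norm for a in v])
    return rows


def synthetic_kv_vector(dim, rng):
    """Unit-norm vector: Gaussian noise plus two large outlier coordinates (assumed model)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for idx in rng.sample(range(dim), 2):
        v[idx] = rng.gauss(0.0, 50.0)
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def rotate(matrix, v):
    """Matrix-vector product; preserves the norm when matrix is orthogonal."""
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]


def std_and_kurtosis(values):
    """Population std and Pearson kurtosis (a Gaussian gives 3.0)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return math.sqrt(m2), m4 / (m2 * m2)


def run_demo(num_vectors=300, seed=0):
    """Pool coordinates over many vectors and compare statistics before/after rotation."""
    rng = random.Random(seed)
    rotation = random_rotation(HEAD_DIM, rng)
    before, after = [], []
    for _ in range(num_vectors):
        v = synthetic_kv_vector(HEAD_DIM, rng)
        before.extend(v)
        after.extend(rotate(rotation, v))
    _, kurt_before = std_and_kurtosis(before)
    std_after, kurt_after = std_and_kurtosis(after)
    expected_std = 1.0 / math.sqrt(HEAD_DIM)
    return {
        "kurtosis_before": kurt_before,
        "kurtosis_after": kurt_after,
        "std_ratio_after": std_after / expected_std,
    }


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: {value:.3f}")
```

In this sketch the std ratio is close to 1 largely by construction: the vectors are normalized to unit length and an orthogonal rotation preserves that length, so the mean squared coordinate stays at 1/128. The kurtosis is the informative number, since it shows the rotation spreading a few outlier coordinates across all 128 dimensions.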