Effective context 10k with TurboQuant KV cache v2 (higher-quality CUDA)

Consensus Metrics
swebench_resolve_rate 1 (n=1, σ=0)
time_to_solve_seconds 247 (n=1, σ=0)
patch_chars 704 (n=1, σ=0)
rounds 12 (n=1, σ=0)
Parameters
effective_context_tokens 10000
total_context 28000
cr_s1_threshold 800
cr_s2_threshold 400
kv_cache turboquant_cuda_v2
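The internals of turboquant_cuda_v2 are not documented here. As a rough illustration of what per-block KV-cache quantization involves, here is a minimal int8 absmax sketch in NumPy (all names and the block scheme are hypothetical, not the actual kernel):

```python
import numpy as np

def quantize_kv(x, block=32):
    """Per-block absmax int8 quantization along the flattened tensor.
    Returns int8 codes plus per-block float scales (hypothetical scheme;
    turboquant_cuda_v2's actual format may differ)."""
    orig_shape = x.shape
    x = x.reshape(-1, block)
    scale = np.abs(x).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero on all-zero blocks
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q.reshape(orig_shape), scale

def dequantize_kv(q, scale, block=32):
    """Inverse of quantize_kv: rescale int8 codes back to float32."""
    orig_shape = q.shape
    x = q.reshape(-1, block).astype(np.float32) * scale
    return x.reshape(orig_shape)

# Example: a fake (heads, seq, head_dim) key tensor
k = np.random.randn(8, 128, 64).astype(np.float32)
codes, scales = quantize_kv(k)
k_hat = dequantize_kv(codes, scales)
err = np.abs(k - k_hat).max()  # bounded by half the largest block scale
```

The quality knob in a scheme like this is the block size and rounding mode; a "v2" fix plausibly tightened one of those, but that is a guess, not a statement about the kernel.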
Hypothesis

Higher-quality CUDA KV-cache quantization reduces context rot, improving convergence at 10k effective context.

Reference

arXiv:2504.19874

Subject
Model: qwen3.5-27b-q5_k_m
Dataset: swebench-verified
Baseline Comparison
time_to_solve_seconds +111% vs EXP-0006, -50% vs EXP-0008
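For reference, the deltas above follow from raw times: 492s for EXP-0008 is recorded in the notes, and +111% implies roughly 117s for EXP-0006 (an inference from the percentage, not a recorded value). A quick check:

```python
# Percent-delta check for the baseline comparison (EXP-0006 time is
# inferred from the +111% figure, not recorded on this page).
exp_0009 = 247.0   # this run, seconds
exp_0008 = 492.0   # prior run, seconds (from the notes)

delta_vs_0008 = (exp_0009 - exp_0008) / exp_0008 * 100  # about -49.8, reported as -50%
exp_0006_implied = exp_0009 / 2.11                      # about 117s
```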
Dependencies
Instances (1 reproduction)
tack-scaffold-experiments claude-opus-4 none (CPU inference)

The CUDA quality fix halved time to solve (247s vs 492s; 12 vs 20 rounds), but the run is still slower than the 8k baseline. The model found the bug at round 3 yet did not edit until round 18, re-reading the same file regions three times with different limits. Needs a re-read suppression fix in the prompt.
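One way to implement the suggested re-read suppression (a sketch, not the scaffold's actual code; all names hypothetical): track which file windows have already been served and refuse near-duplicate reads after a repeat budget is spent.

```python
class ReadTracker:
    """Tracks file-read windows so an agent scaffold can suppress
    redundant re-reads of the same region (hypothetical helper)."""
    def __init__(self, max_repeats=2):
        self.max_repeats = max_repeats
        self.counts = {}

    def should_serve(self, path, offset, limit):
        # Coarse bucketing: overlapping windows on the same file that
        # start within the same 200-line grid cell count as the "same
        # region". `limit` is ignored in this simplified sketch.
        bucket = (path, offset // 200)
        self.counts[bucket] = self.counts.get(bucket, 0) + 1
        return self.counts[bucket] <= self.max_repeats

tracker = ReadTracker()
tracker.should_serve("src/util.py", 0, 120)            # first read: served
tracker.should_serve("src/util.py", 50, 300)           # same region: still served
blocked = tracker.should_serve("src/util.py", 10, 80)  # third read: suppressed
```

A blocked read would be replaced with a tool message such as "region already shown above", which targets exactly the 3x re-read pattern seen in this run.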
