Effective context 10k with TurboQuant KV cache v2 (higher-quality CUDA)

Consensus Metrics
swebench_resolve_rate 1 (n=1, σ=0)
time_to_solve_seconds 247 (n=1, σ=0)
patch_chars 704 (n=1, σ=0)
rounds 12 (n=1, σ=0)
Parameters
effective_context_tokens 10000
total_context 28000
cr_s1_threshold 800
cr_s2_threshold 400
kv_cache turboquant_cuda_v2
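The internals of turboquant_cuda_v2 are not documented here. As a rough illustration of what per-block KV-cache quantization involves, here is a minimal int8 absmax sketch in NumPy (all names and the block scheme are hypothetical, not the actual kernel):

```python
import numpy as np

def quantize_kv(x, block=32):
    """Per-block absmax int8 quantization along the flattened tensor.
    Returns int8 codes plus per-block float scales (hypothetical scheme;
    turboquant_cuda_v2's actual format may differ)."""
    orig_shape = x.shape
    x = x.reshape(-1, block)
    scale = np.abs(x).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero on all-zero blocks
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q.reshape(orig_shape), scale

def dequantize_kv(q, scale, block=32):
    """Inverse of quantize_kv: rescale int8 codes back to float32."""
    orig_shape = q.shape
    x = q.reshape(-1, block).astype(np.float32) * scale
    return x.reshape(orig_shape)

# Example: a fake (heads, seq, head_dim) key tensor
k = np.random.randn(8, 128, 64).astype(np.float32)
codes, scales = quantize_kv(k)
k_hat = dequantize_kv(codes, scales)
err = np.abs(k - k_hat).max()  # bounded by half the largest block scale
```

The quality knob in a scheme like this is the block size and rounding mode; a "v2" fix plausibly tightened one of those, but that is a guess, not a statement about the kernel.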
Hypothesis

Higher-quality CUDA KV-cache quantization reduces context rot, improving convergence at 10k effective context.

Reference

arXiv:2504.19874

Subject
Model: qwen3.5-27b-q5_k_m
Dataset: swebench-verified
Baseline Comparison
time_to_solve_seconds +111% vs EXP-0006, -50% vs EXP-0008
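For reference, the deltas above follow from raw times: 492s for EXP-0008 is recorded in the notes, and +111% implies roughly 117s for EXP-0006 (an inference from the percentage, not a recorded value). A quick check:

```python
# Percent-delta check for the baseline comparison (EXP-0006 time is
# inferred from the +111% figure, not recorded on this page).
exp_0009 = 247.0   # this run, seconds
exp_0008 = 492.0   # prior run, seconds (from the notes)

delta_vs_0008 = (exp_0009 - exp_0008) / exp_0008 * 100  # about -49.8, reported as -50%
exp_0006_implied = exp_0009 / 2.11                      # about 117s
```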
Dependencies
Instances (1 reproduction)
tack-scaffold-experiments claude-opus-4 none (CPU inference)

The CUDA quality fix halved time to solve (247s vs 492s; 12 vs 20 rounds), but the run is still slower than the 8k baseline. The model found the bug at round 3 yet did not edit until round 18, re-reading the same file regions three times with different limits. Needs a re-read suppression fix in the prompt.
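One way to implement the suggested re-read suppression (a sketch, not the scaffold's actual code; all names hypothetical): track which file windows have already been served and refuse near-duplicate reads after a repeat budget is spent.

```python
class ReadTracker:
    """Tracks file-read windows so an agent scaffold can suppress
    redundant re-reads of the same region (hypothetical helper)."""
    def __init__(self, max_repeats=2):
        self.max_repeats = max_repeats
        self.counts = {}

    def should_serve(self, path, offset, limit):
        # Coarse bucketing: overlapping windows on the same file that
        # start within the same 200-line grid cell count as the "same
        # region". `limit` is ignored in this simplified sketch.
        bucket = (path, offset // 200)
        self.counts[bucket] = self.counts.get(bucket, 0) + 1
        return self.counts[bucket] <= self.max_repeats

tracker = ReadTracker()
tracker.should_serve("src/util.py", 0, 120)            # first read: served
tracker.should_serve("src/util.py", 50, 300)           # same region: still served
blocked = tracker.should_serve("src/util.py", 10, 80)  # third read: suppressed
```

A blocked read would be replaced with a tool message such as "region already shown above", which targets exactly the 3x re-read pattern seen in this run.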
