Effective context 10k with TurboQuant CUDA v1 — Small-Model Agent Scaffold Optimization

Consensus Metrics

swebench_resolve_rate 1 (n=1, σ=0)

time_to_solve_seconds 492 (n=1, σ=0)

patch_chars 704 (n=1, σ=0)

rounds 20 (n=1, σ=0)

Parameters

effective_context_tokens 10000

cr_s1_threshold 800

cr_s2_threshold 400

kv_cache turboquant_cuda_v1

total_context 28000

Hypothesis

Increasing effective context to 10k with TurboQuant KV cache quantization allows more working memory without quality loss