Tiered CR compression (S1/S2/S3)

success
0.14
1/5
Consensus Metrics
swebench_resolve_rate 1 (n=1, σ=0)
time_to_solve_seconds 117 (n=1, σ=0)
patch_chars 704 (n=1, σ=0)
rounds 7 (n=1, σ=0)
Parameters
effective_context_tokens 8000
cr_ct true
cr_s1_threshold 800
cr_s2_threshold 400
cr_ct_max_tokens 500
tool_aware_prompt true
emergency_order ct_first
Hypothesis

Reasoning blocks of different lengths need different compression levels. Three summaries are generated in a single LLM call: S1 (4-6 sentences, for originals ≥800 characters), S2 (2-3 sentences, ≥400 characters), and S3 (1 sentence, <400 characters). The code then picks the tier that matches the length of the original reasoning.
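
The tier selection described in the hypothesis can be sketched as below. This is a minimal illustration, not the experiment's actual code: the function names `pick_tier` and `select_summary` and the `summaries` dict shape are hypothetical, and the constants simply mirror the `cr_s1_threshold` and `cr_s2_threshold` parameters, assuming both are measured in characters of the original reasoning.

```python
# Assumed to mirror the cr_s1_threshold / cr_s2_threshold parameters (characters).
S1_THRESHOLD = 800
S2_THRESHOLD = 400


def pick_tier(original_chars: int) -> str:
    """Return the tier name for a reasoning block of the given length."""
    if original_chars >= S1_THRESHOLD:
        return "S1"  # long reasoning: 4-6 sentence summary
    if original_chars >= S2_THRESHOLD:
        return "S2"  # medium reasoning: 2-3 sentence summary
    return "S3"      # short reasoning: 1 sentence summary


def select_summary(reasoning: str, summaries: dict[str, str]) -> str:
    """Pick one of the three summaries produced by the single LLM call.

    `summaries` is a hypothetical mapping with keys "S1", "S2", "S3".
    """
    return summaries[pick_tier(len(reasoning))]
```

Because all three summaries come from the same call, choosing the tier in code costs no extra inference; only the thresholds need tuning.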

Tags
Subject
Model: qwen3.5-27b-q5_k_m
Dataset: swebench-verified
Baseline Comparison
time_to_solve_seconds -21.5% vs EXP-0005, -85.6% vs EXP-0003
Dependencies
Instances (1 reproduction)
tack-scaffold-experiments claude-opus-4 none (CPU inference)

117 s in 7 rounds. Tiered CR is working: 4/6 rounds saved context (12-54%). Short blocks get S3 and medium blocks get S2. The other two rounds expanded marginally (-0.9%, -8.7%).
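
The per-round figures above read as relative savings, where a negative value means the compressed text came out longer than the original. A small sketch of that calculation follows; the helper name `savings_percent` is hypothetical, and measuring in characters rather than tokens is an assumption.

```python
def savings_percent(original_len: int, compressed_len: int) -> float:
    """Relative context saved by compression, in percent.

    Positive means the compressed text is shorter; negative means it
    expanded. Lengths are assumed to be character counts.
    """
    if original_len <= 0:
        raise ValueError("original_len must be positive")
    return 100.0 * (original_len - compressed_len) / original_len
```

For example, compressing a 1000-character block to 460 characters gives a 54% saving, while growing it to 1009 characters gives roughly -0.9%.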

swebench_resolve_rate 1.0 time_to_solve_seconds 117 patch_chars 704 rounds 7