Optimizing agent scaffolding (context compression, tool routing, memory management, prompt engineering) to maximize coding task performance on sub-30B parameter LLMs. Primary model: Qwen3.5-27B. Evaluation: SWE-bench Verified. The goal is to make small local models punch above their weight through better infrastructure, not bigger hardware.
The table below lists 16 experiments.
| ID | Title / Hypothesis | Result | Confidence | Reproductions | Metrics |
|---|---|---|---|---|---|
| cexp_12676d | CR/CT round compression reduces context pressure and improves solve time | inconclusive | | 1/5 | swebench_resolve_rate, time_to_solve_seconds, patch_chars |
| cexp_1801f3 | Flipping the emergency compress order (CT collapse first, then tool result compression) preserves recent verbatim results longer | inconclusive | | 1/5 | swebench_resolve_rate, time_to_solve_seconds, patch_chars |
| cexp_1b69c4 | Establish pre-CR/CT baseline resolve rate across diverse SWE-bench tasks | baseline | | 1/5 | swebench_resolve_rate, patches_generated, tasks_attempted, avg_time_seconds |
| cexp_1f7f95 | Different reasoning lengths need different compression levels: S1 (4-6 sentences, ≥800 ch), S2 (2-3 sentences, ≥400 ch), S3 (1 sentence, <400 ch); all generated in one LLM call, and code picks the appropriate tier | success | | 1/5 | swebench_resolve_rate, time_to_solve_seconds, patch_chars |
| cexp_39fcd1 | Establish baseline performance with tiered CR compression (S1/S2/S3), CT emergency collapse, and an 8k effective context window on Qwen3.5-27B | success | | 1/5 | swebench_resolve_rate, patches_generated, tasks_attempted, avg_time_seconds |
| cexp_a3eb8e | Increasing effective context to 10k with TurboQuant KV cache improves performance by giving the model more working memory | inconclusive | | 1/5 | swebench_resolve_rate, time_to_solve_seconds, patch_chars |
| cexp_a45c53 | Tool-type-aware compression prompts preserve more useful information (verbatim code for reads, line ranges for edits, key output for commands) | success | | 1/5 | swebench_resolve_rate, time_to_solve_seconds, patch_chars |
| cexp_140dcd | Broader baseline across Django and other frameworks | baseline | | 1/5 | swebench_resolve_rate, patches_generated, tasks_attempted, avg_time_seconds |
| cexp_2a9b92 | Different reasoning lengths need different compression levels: S1 (4-6 sentences, ≥800 ch), S2 (2-3 sentences, ≥400 ch), S3 (1 sentence, <400 ch); all generated in one LLM call, and code picks the appropriate tier based on original reasoning length | success | | 1/5 | swebench_resolve_rate, time_to_solve_seconds, patch_chars, rounds |
| cexp_2c4afe | Tiered CR compression generalizes beyond single-task testing | success | | 1/5 | swebench_resolve_rate, patches_generated, tasks_attempted, avg_time_seconds |
| cexp_5e332e | Flipping the emergency compress order (CT collapse first, then tool result compression) preserves recent verbatim tool results longer, improving model decisions | inconclusive | | 1/5 | swebench_resolve_rate, time_to_solve_seconds, patch_chars |
| cexp_8262c3 | Higher-quality CUDA KV cache quantization reduces context rot, improving convergence at 10k effective context | inconclusive | | 1/5 | swebench_resolve_rate, time_to_solve_seconds, patch_chars, rounds |
| cexp_87f3b0 | Establish pre-optimization baseline resolve rate across diverse SWE-bench tasks with no context compression active | baseline | | 1/5 | swebench_resolve_rate, patches_generated, tasks_attempted, avg_time_seconds |
| cexp_a40f00 | CR/CT round compression (reasoning summary + tool breadcrumb per round) reduces context pressure and improves solve time | inconclusive | | 1/5 | swebench_resolve_rate, time_to_solve_seconds, patch_chars |
| cexp_d58d3d | Increasing effective context to 10k with TurboQuant KV cache quantization allows more working memory without quality loss | failure | | 1/5 | swebench_resolve_rate, time_to_solve_seconds, patch_chars, rounds |
| cexp_d737bf | Tracking iteration improvements on a single control task shows scaffold optimization impact | success | | 1/5 | v3_cr_ct_time, v4_emergency_flip_time, v5_tool_aware_time, v6_tiered_cr_time, v8_turboquant_v1_time, v9_turboquant_v2_time, total_improvement_pct |
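The tiered CR compression hypothesis (cexp_1f7f95 / cexp_2a9b92) implies a simple selection rule: one LLM call produces all three summary tiers, and plain code picks a tier from the length of the original reasoning. A minimal sketch of that rule, with hypothetical names (`CompressionTiers`, `pick_tier`) since the scaffold's actual interfaces are not shown in this table:

```python
from dataclasses import dataclass

# Thresholds from the hypothesis: S1 for long reasoning (>= 800 chars),
# S2 for medium (>= 400 chars), S3 for everything shorter.
S1_MIN_CHARS = 800
S2_MIN_CHARS = 400

@dataclass
class CompressionTiers:
    """All three summaries produced by a single LLM call."""
    s1: str  # 4-6 sentence summary
    s2: str  # 2-3 sentence summary
    s3: str  # 1 sentence summary

def pick_tier(original_reasoning: str, tiers: CompressionTiers) -> str:
    """Select the compressed form based on the original reasoning length,
    so long reasoning retains detail and short reasoning collapses to
    one sentence."""
    n = len(original_reasoning)
    if n >= S1_MIN_CHARS:
        return tiers.s1
    if n >= S2_MIN_CHARS:
        return tiers.s2
    return tiers.s3

# Example: a short reasoning trace collapses to the one-sentence tier.
tiers = CompressionTiers(
    s1="Long summary ...",
    s2="Medium summary ...",
    s3="Fixed the import order.",
)
print(pick_tier("brief note", tiers))  # -> Fixed the import order.
```

Generating all tiers in one call keeps latency flat regardless of which tier the code ultimately selects; the thresholds themselves are the tunable part of the experiment.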