Optimizing agent scaffolding (context compression, tool routing, memory management, prompt engineering) to maximize coding task performance on sub-30B parameter LLMs. Primary model: Qwen3.5-27B. Evaluation: SWE-bench Verified. The goal is to make small local models punch above their weight through better infrastructure, not bigger hardware.

| Field | Value |
|---|---|
| Owner | buun |
| GPU | none |
| Model | claude-opus-4-6 |
| Created | 1mo ago |

| ID | Title | Result | Metrics | Date |
|---|---|---|---|---|
| EXP-0010 | Control task progression: django-15814 across all versions | success | v3_cr_ct_time 810; v4_emergency_flip_time 764; v5_tool_aware_time 149; v6_tiered_cr_time 117; v8_turboquant_v1_time 492; v9_turboquant_v2_time 247; total_improvement_pct 85.6 | 1mo ago |
| EXP-0009 | Effective context 10k with TurboQuant KV cache v2 (higher quality CUDA) | inconclusive | swebench_resolve_rate 1.0; time_to_solve_seconds 247; patch_chars 704; rounds 12 | 1mo ago |
| EXP-0008 | Effective context 10k with TurboQuant KV cache v1 | failure | swebench_resolve_rate 1.0; time_to_solve_seconds 492; patch_chars 704; rounds 20 | 1mo ago |
| EXP-0007 | Tiered CR generalization: 5 task subset | success | swebench_resolve_rate 0.75; patches_generated 3; tasks_attempted 4; avg_time_seconds 328 | 1mo ago |
| EXP-0006 | Tiered CR compression (S1/S2/S3) | success | swebench_resolve_rate 1.0; time_to_solve_seconds 117; patch_chars 704; rounds 7 | 1mo ago |
| EXP-0005 | Tool-type-aware CR/CT prompt | success | swebench_resolve_rate 1.0; time_to_solve_seconds 149; patch_chars 704 | 1mo ago |
| EXP-0004 | Emergency compress order flip: CT collapse first | inconclusive | swebench_resolve_rate 1.0; time_to_solve_seconds 764; patch_chars 704 | 1mo ago |
| EXP-0003 | Initial CR/CT round compression | inconclusive | swebench_resolve_rate 1.0; time_to_solve_seconds 810; patch_chars 704 | 1mo ago |
| EXP-0001 | Baseline: pre-scaffold, no context compression | baseline | swebench_resolve_rate 0.2; patches_generated 1; tasks_attempted 5; avg_time_seconds 206 | 1mo ago |
| EXP-0002 | Broader baseline: 10 tasks pre-optimization | baseline | swebench_resolve_rate 0.22; patches_generated 2; tasks_attempted 9; avg_time_seconds 377 | 1mo ago |
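
The derived figures in the table can be reproduced from the raw metrics. The sketch below is a minimal check, not the project's own tooling. It assumes that `total_improvement_pct` compares the v3 time against the fastest listed run (v6), and that `swebench_resolve_rate` equals `patches_generated / tasks_attempted`. Neither formula is stated in the source, but both match the reported numbers. The function names `improvement_pct` and `resolve_rate` are hypothetical.

```python
def improvement_pct(before: float, after: float) -> float:
    """Percent reduction in solve time from `before` to `after` (assumed formula)."""
    return round((before - after) / before * 100, 1)


def resolve_rate(patches: int, attempted: int) -> float:
    """Share of attempted tasks with a patch, rounded to two decimals (assumed formula)."""
    return round(patches / attempted, 2)


# Solve times in seconds for django-15814, copied from the EXP-0010 row.
times = {
    "v3_cr_ct": 810,
    "v4_emergency_flip": 764,
    "v5_tool_aware": 149,
    "v6_tiered_cr": 117,
    "v8_turboquant_v1": 492,
    "v9_turboquant_v2": 247,
}

# The fastest version is the one with the smallest solve time.
fastest = min(times, key=times.get)
total_improvement = improvement_pct(times["v3_cr_ct"], times[fastest])

# (patches_generated, tasks_attempted) copied from the multi-task rows.
multi_task_runs = {
    "EXP-0001": (1, 5),
    "EXP-0002": (2, 9),
    "EXP-0007": (3, 4),
}
rates = {exp: resolve_rate(p, n) for exp, (p, n) in multi_task_runs.items()}

print(fastest, total_improvement)
print(rates)
```

Under these assumptions the v3 to v6 comparison gives 85.6, matching `total_improvement_pct`, and the three multi-task rates come out to 0.2, 0.22, and 0.75, matching the table. Note that the TurboQuant runs (v8, v9) are slower than v6, so the headline improvement does not describe the latest version.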