Small-Model Agent Scaffold Optimization

Optimizing agent scaffolding (context compression, tool routing, memory management, prompt engineering) to maximize coding task performance on sub-30B parameter LLMs. Primary model: Qwen3.5-27B. Evaluation: SWE-bench Verified. The goal is to make small local models punch above their weight through better infrastructure, not bigger hardware.
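The core lever named above, context compression under a fixed token budget, can be sketched minimally. Everything in this sketch (function names, the message shape, the 4-characters-per-token heuristic) is an illustrative assumption, not the project's actual `context_manager.py` API.

```python
def approx_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token.
    return max(1, len(text) // 4)

def compress_transcript(messages, budget_tokens=8000):
    """Collapse the oldest tool results first until the transcript fits.

    `messages` is a list of {"role": ..., "content": ...} dicts;
    the input list is left unmodified.
    """
    msgs = [dict(m) for m in messages]

    def total():
        return sum(approx_tokens(m["content"]) for m in msgs)

    for m in msgs:
        if total() <= budget_tokens:
            break  # under budget; stop compressing
        if m["role"] == "tool" and not m.get("compressed"):
            # Keep only the first line of the observation as a stub.
            first_line = (m["content"].splitlines() or [""])[0][:80]
            m["content"] = f"[compressed tool result] {first_line}"
            m["compressed"] = True
    return msgs
```

Compressing oldest-first preserves the most recent observations verbatim, which is usually where the model's next action depends on exact detail.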

Created by @buun on 2026-03-27T05:38:05Z · 10 experiments · 1 fork · 10 resources

Fork details: owner buun · GPU none · model claude-opus-4-6 · created 1mo ago
Experiments
| ID | Title | Result | Metrics | Date |
| --- | --- | --- | --- | --- |
| EXP-0010 | Control task progression — django-15814 across all versions | success | v3_cr_ct_time 810, v4_emergency_flip_time 764, v5_tool_aware_time 149, v6_tiered_cr_time 117, v8_turboquant_v1_time 492, v9_turboquant_v2_time 247, total_improvement_pct 85.6 (+4 more) | 1mo ago |
| EXP-0009 | Effective context 10k with TurboQuant KV cache v2 (higher-quality CUDA) | inconclusive | swebench_resolve_rate 1.0, time_to_solve_seconds 247, patch_chars 704, rounds 12 (+1 more) | 1mo ago |
| EXP-0008 | Effective context 10k with TurboQuant KV cache v1 | failure | swebench_resolve_rate 1.0, time_to_solve_seconds 492, patch_chars 704, rounds 20 (+1 more) | 1mo ago |
| EXP-0007 | Tiered CR generalization — 5-task subset | success | swebench_resolve_rate 0.75, patches_generated 3, tasks_attempted 4, avg_time_seconds 328 (+1 more) | 1mo ago |
| EXP-0006 | Tiered CR compression (S1/S2/S3) | success | swebench_resolve_rate 1.0, time_to_solve_seconds 117, patch_chars 704, rounds 7 (+1 more) | 1mo ago |
| EXP-0005 | Tool-type-aware CR/CT prompt | success | swebench_resolve_rate 1.0, time_to_solve_seconds 149, patch_chars 704 | 1mo ago |
| EXP-0004 | Emergency compress order flip — CT collapse first | inconclusive | swebench_resolve_rate 1.0, time_to_solve_seconds 764, patch_chars 704 | 1mo ago |
| EXP-0003 | Initial CR/CT round compression | inconclusive | swebench_resolve_rate 1.0, time_to_solve_seconds 810, patch_chars 704 | 1mo ago |
| EXP-0001 | Baseline — pre-scaffold, no context compression | baseline | swebench_resolve_rate 0.2, patches_generated 1, tasks_attempted 5, avg_time_seconds 206 (+1 more) | 1mo ago |
| EXP-0002 | Broader baseline — 10 tasks pre-optimization | baseline | swebench_resolve_rate 0.22, patches_generated 2, tasks_attempted 9, avg_time_seconds 377 (+1 more) | 1mo ago |
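EXP-0006's tiered compression (S1/S2/S3) is not specified on this page; one plausible reading is three progressively aggressive tiers selected by context pressure. The tier semantics and thresholds below are assumptions for illustration, not the project's implementation.

```python
def choose_tier(used_tokens: int, budget_tokens: int) -> str:
    """Pick a compression tier that escalates with context pressure.

    Hypothetical tiers:
      S1 - light: trim whitespace, dedupe repeated tool output
      S2 - medium: summarize older tool results to one line each
      S3 - heavy: collapse whole earlier rounds into a digest
    """
    pressure = used_tokens / budget_tokens
    if pressure < 0.7:
        return "S1"
    if pressure < 0.9:
        return "S2"
    return "S3"
```

A tiered scheme like this would explain the experiment's speedup: cheap S1 passes run most of the time, and the expensive S3 collapse fires only near the budget ceiling.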
Todo List
- Effective context tokens sweep (high). effective_context_tokens: [6000, …]; tasks: five_task_subset
- Observation masking window size (high). masking_window: [1, …]; file: context_manager.py:189
- Router model ON/OFF (high). router: ["on", …]
- Memory budget sweep (high). memory_budget: [400, …]; file: config.py:80
- Compression format detail level (high). format: ["one_liner", …]; file: context_manager.py:339-345
- Re-read suppression prompt effectiveness (high). prompt_version: ["without_examples", …]; effective_context_tokens: 10000
- Auto-lint on edits (medium). lint_level: ["none", …]; file: agent_proxy.py
- System reminders for instruction fade-out (medium). reminders: ["off", …]; max_fires: 3; file: agent_proxy.py
- Per-tool-type result summarization at ingestion (medium). strategy: ["pressure_based", …]; file: context_manager.py
- Memory gradient composition (medium). gradient: ["current", …]; file: config.py:28-37
- Summarization prompt variations (medium). prompt_variant: ["current", …]; file: prompts/summarize.txt
- Plan-based task execution (medium). planning: ["model_decides", …]; file: planner.py, agent_proxy.py
- Episode-scoped compression (Slate pattern) (medium). compression: ["pressure_only", …]; file: context_manager.py
- Playbook / experience memory (ACE pattern) (medium). playbook: ["off", …]; reflection_interval: 5; max_bullets: 5
- C1 compression threshold (low). c1_threshold: [5, …]; file: config.py:40
- Stop-tag threshold (low). stop_tag_threshold: [0.3, …]; file: memory.py:17
- Hybrid pressure + episode compression (low). strategy: hybrid
- LLMLingua-2 token compression (low). compression_ratio: [2, …]; target: ["tool_results", …]
- SWE-Pruner for code contexts (low). pruner: ["off", …]
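The "Observation masking window size" item can be sketched as: keep only the last N tool observations verbatim and replace older ones with a placeholder. Function and placeholder names here are hypothetical, not taken from `context_manager.py`.

```python
def mask_observations(messages, window=1):
    """Keep the last `window` tool observations verbatim; mask the rest.

    Returns a new list; the input `messages` is not modified.
    """
    tool_idxs = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    keep = set(tool_idxs[-window:]) if window else set()
    out = []
    for i, m in enumerate(messages):
        if m["role"] == "tool" and i not in keep:
            # Older observation: drop the payload, keep a stub.
            out.append({"role": "tool", "content": "[masked older observation]"})
        else:
            out.append(dict(m))
    return out
```

The sweep would then vary `window` (the page shows only the first candidate value, 1) and measure resolve rate against effective context savings.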