Small-Model Agent Scaffold Optimization

Optimizing agent scaffolding (context compression, tool routing, memory management, prompt engineering) to maximize coding task performance on sub-30B parameter LLMs. Primary model: Qwen3.5-27B. Evaluation: SWE-bench Verified. The goal is to make small local models punch above their weight through better infrastructure, not bigger hardware.
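
As a rough mental model, the four surfaces above can be treated as knobs on a single scaffold configuration. The sketch below is illustrative only: every field name is hypothetical, and the defaults simply echo values mentioned elsewhere on this page (tiered CR compression, an 8k effective context), not a published API.

```python
from dataclasses import dataclass

@dataclass
class ScaffoldConfig:
    # All names are illustrative stand-ins, not the project's actual API.
    # Context compression: how aggressively earlier rounds get summarized.
    compression_strategy: str = "tiered_cr"        # e.g. "none", "flat_cr", "tiered_cr"
    effective_context_tokens: int = 8_000          # budget enforced by the scaffold
    # Tool routing: which tools the agent may call, in order of preference.
    tools: tuple[str, ...] = ("read_file", "edit_file", "run_tests")
    # Memory management: how many recent tool results stay verbatim before compression.
    verbatim_tool_results: int = 3
    # Prompt engineering: which system-prompt variant is under test.
    system_prompt_id: str = "baseline"
```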

Created by @buun on 2026-03-27T05:38:05Z
10 experiments · 6 successes · 1 failure · 0 conflicts · 1 fork
Top Experiments

Initial CR/CT implementation — v7
CR/CT round compression reduces context pressure and improves solve time.
Result: inconclusive · Confidence: 0.38 · Repro: 1/5

Emergency compress order flip — v7b
Flipping the emergency compress order (CT collapse first, then tool-result compression) preserves recent verbatim results longer.
Result: inconclusive · Confidence: 0.38 · Repro: 1/5

Pre-scaffold baseline — v6 generalization (5 tasks)
Establish the pre-CR/CT baseline resolve rate across diverse SWE-bench tasks.
Result: baseline · Confidence: 0.38 · Repro: 1/5

Tiered CR compression (S1/S2/S3) — v9
Different reasoning lengths need different compression levels: S1 (4-6 sentences, ≥800 chars), S2 (2-3 sentences, ≥400 chars), S3 (1 sentence, <400 chars). All three tiers are generated in one LLM call and the code picks the appropriate one (see the sketch after this table).
Result: success · Confidence: 0.38 · Repro: 1/5

Baseline — Tiered CR compression + TurboQuant KV cache at 8k effective context
Establish baseline performance with tiered CR compression (S1/S2/S3), CT emergency collapse, and an 8k effective context window on Qwen3.5-27B.
Result: success · Confidence: 0.38 · Repro: 1/5
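
For the tiered CR compression entry (v9), here is a minimal sketch of the tier-selection step, assuming the compressor returns all three summary tiers from a single LLM call and the scaffold then picks one based on the length of the original reasoning text. The class and function names are hypothetical; the 800/400-character thresholds come from the experiment description above.

```python
from dataclasses import dataclass

@dataclass
class CompressedReasoning:
    # All three tiers are produced by one compression call (hypothetical shape).
    s1: str  # 4-6 sentence summary, for long reasoning blocks
    s2: str  # 2-3 sentence summary, for medium blocks
    s3: str  # 1 sentence summary, for short blocks

def pick_tier(original_reasoning: str, tiers: CompressedReasoning) -> str:
    """Choose the compression tier from the length of the original reasoning text."""
    n = len(original_reasoning)
    if n >= 800:     # long reasoning: keep the most detail (S1)
        return tiers.s1
    elif n >= 400:   # medium reasoning: moderate compression (S2)
        return tiers.s2
    else:            # short reasoning: collapse to one sentence (S3)
        return tiers.s3
```

Picking the tier in code rather than asking the model to choose keeps the compression call to a single round trip and makes the selection deterministic, which matches the "one LLM call, code picks the tier" framing of the v9 entry.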