TBQ2 at extreme context (200K+) with adaptive chunk sizing

proposed, medium priority, TODO-004
Description

TBQ2's aggressive 2-bit quantization makes very large KV caches affordable. At 200K+ tokens of context, adaptive chunk sizing during prefill should keep peak VRAM bounded while maintaining acceptable perplexity (PPL).
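
To make the memory argument concrete, here is a minimal sketch of how cache size and chunk size might interact. It is not the implementation behind this proposal. The model shape (32 layers, 8 KV heads, head dimension 128), the 3.5 GiB budget, the 64 KiB per-token activation cost, and the function names `kv_cache_bytes`, `adaptive_chunk_size`, and `plan_prefill` are all hypothetical. The estimate also ignores any scale or metadata overhead TBQ2 may add on top of the raw 2 bits per value.

```python
def kv_cache_bytes(n_tokens, n_layers, n_kv_heads, head_dim, bits_per_value):
    """Raw KV cache size in bytes, ignoring quantization metadata (assumption)."""
    values = 2 * n_layers * n_kv_heads * head_dim * n_tokens  # 2 = keys + values
    return values * bits_per_value // 8


def adaptive_chunk_size(tokens_cached, budget_bytes, cache_bytes_per_token,
                        act_bytes_per_token, min_chunk=256, max_chunk=8192):
    """Halve the chunk until its activations plus new cache entries fit.

    Falls back to min_chunk even if that does not fit; a real scheduler
    would have to handle that case explicitly.
    """
    free = budget_bytes - tokens_cached * cache_bytes_per_token
    chunk = max_chunk
    per_token = act_bytes_per_token + cache_bytes_per_token
    while chunk > min_chunk and chunk * per_token > free:
        chunk //= 2
    return chunk


def plan_prefill(context, budget_bytes, cache_bytes_per_token, act_bytes_per_token):
    """Return the list of chunk sizes used to prefill `context` tokens."""
    chunks, done = [], 0
    while done < context:
        size = adaptive_chunk_size(done, budget_bytes,
                                   cache_bytes_per_token, act_bytes_per_token)
        size = min(size, context - done)  # do not run past the end of the prompt
        chunks.append(size)
        done += size
    return chunks


if __name__ == "__main__":
    CONTEXT = 204800                      # from the suggested parameters
    LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128   # hypothetical model shape
    per_token = kv_cache_bytes(1, LAYERS, KV_HEADS, HEAD_DIM, bits_per_value=2)
    total = kv_cache_bytes(CONTEXT, LAYERS, KV_HEADS, HEAD_DIM, bits_per_value=2)
    plan = plan_prefill(CONTEXT, budget_bytes=3584 * 2**20,   # hypothetical 3.5 GiB
                        cache_bytes_per_token=per_token,
                        act_bytes_per_token=64 * 1024)        # hypothetical
    print(f"2-bit cache at {CONTEXT} tokens: {total / 2**30:.3f} GiB")
    print(f"{len(plan)} chunks, first {plan[0]}, last {plan[-1]}")
```

For the hypothetical shape above, the raw 2-bit cache at 204800 tokens comes to about 3.1 GiB. Halving the chunk as the cache fills trades prefill throughput for a lower activation peak, which is the bounded-VRAM behaviour the description asks for. Whether PPL stays acceptable has to be measured and cannot be inferred from this sketch.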

Reference

EXP-0001

Suggested Parameters
context 204800
cache_type tbq2
approach adaptive_chunk_sizing
Provenance
Proposed by @dusterbloom via adaptive-chunked-prefill claude-sonnet-4-6