TBQ2 at extreme context (200K+) with adaptive chunk sizing

proposed · medium priority · TODO-004


Description

TBQ2's aggressive 2-bit quantization shrinks the per-token KV footprint enough to make very large KV caches practical. At 200K+ tokens of context, adaptive chunk sizing should keep peak VRAM bounded while holding perplexity (PPL) at an acceptable level.
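The proposal does not spell out how chunk sizes would adapt, so the following is only a minimal sketch of one plausible reading: shrink the prefill chunk as the cache fills so that a per-chunk scratch buffer stays under a fixed budget. The cost model (a materialized attention-score buffer of `n_heads * chunk * (tokens_done + chunk)` entries), the function names, and all default values are assumptions for illustration, not the actual TBQ2 or adaptive-chunked-prefill implementation. Kernels that never materialize the score matrix would need a different cost model.

```python
def next_chunk_size(tokens_done, budget_bytes, n_heads=32, bytes_per_score=4,
                    min_chunk=64, max_chunk=4096):
    """Largest power-of-two chunk whose estimated scratch fits the budget.

    Hypothetical cost model: one layer's attention scores for the chunk
    against everything processed so far, including the chunk itself.
    """
    chunk = max_chunk
    while chunk > min_chunk:
        scratch = n_heads * chunk * (tokens_done + chunk) * bytes_per_score
        if scratch <= budget_bytes:
            break
        chunk //= 2  # halve until the estimate fits or the floor is reached
    return chunk


def plan_prefill(total_tokens, budget_bytes, **kwargs):
    """Return the list of chunk sizes used to prefill total_tokens."""
    done, plan = 0, []
    while done < total_tokens:
        chunk = min(next_chunk_size(done, budget_bytes, **kwargs),
                    total_tokens - done)
        plan.append(chunk)
        done += chunk
    return plan


# Assumed 2 GiB scratch budget; 204800 is the suggested context length.
plan = plan_prefill(204800, 2 * 2**30)
print(len(plan), plan[0], plan[-1])
```

Under this model the chunk size only ever shrinks as the context grows, which is what keeps the per-chunk scratch bounded while the quantized cache itself grows linearly.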

Reference

EXP-0001

Suggested Parameters

context: 204800

cache_type: tbq2

approach: adaptive_chunk_sizing
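To give a rough sense of scale for these parameters, the sketch below estimates KV cache size at a context of 204800 tokens. The model shape (32 layers, 8 KV heads, head dimension 128) is a hypothetical example, and TBQ2 is treated as exactly 2 bits per stored value; any per-block scale or metadata overhead in the real format is ignored, so the true figure would be somewhat higher.

```python
def kv_cache_bytes(n_tokens, n_layers, n_kv_heads, head_dim, bits_per_value):
    """Estimated KV cache size in bytes (K and V, all layers)."""
    # Each token stores one K and one V vector per KV head per layer.
    values_per_token = 2 * n_layers * n_kv_heads * head_dim
    return n_tokens * values_per_token * bits_per_value // 8


CONTEXT = 204800                              # from the suggested parameters
N_LAYERS, N_KV_HEADS, HEAD_DIM = 32, 8, 128   # hypothetical model shape

fp16_bytes = kv_cache_bytes(CONTEXT, N_LAYERS, N_KV_HEADS, HEAD_DIM, 16)
tbq2_bytes = kv_cache_bytes(CONTEXT, N_LAYERS, N_KV_HEADS, HEAD_DIM, 2)

print(f"fp16: {fp16_bytes / 2**30:.3f} GiB")  # fp16: 25.000 GiB
print(f"tbq2: {tbq2_bytes / 2**30:.3f} GiB")  # tbq2: 3.125 GiB
```

For this assumed shape, the cache drops from 25 GiB to about 3.1 GiB, which is why peak VRAM at this context length is likely to be dominated by per-chunk working memory rather than by the cache itself.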

Provenance

Proposed by @dusterbloom via adaptive-chunked-prefill claude-sonnet-4-6