turbo3 on draft model KV saves VRAM and maintains acceptance rate
For this model pair, speculative decoding is slower than normal decoding because the 2B draft model has a poor acceptance rate. Using turbo3 for the draft model's KV cache has zero impact on throughput or acceptance rate, because the 2B model's KV cache is negligible compared to that of the 27B target. turbo KV matters for the target model (which already uses it), not the draft. NOT RECOMMENDED: turbo on the draft KV in speculative decoding is a non-issue.
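
To make the size argument concrete, here is a rough sketch of the standard KV cache footprint estimate (K and V tensors, per layer, per KV head, per token). The layer counts, head counts, head dimension, context length, and bits-per-value figures below are hypothetical placeholders, not the real configurations of the 2B or 27B models, and treating turbo3 as roughly 3 bits per value is an assumption. Substitute the actual values from the model metadata before drawing conclusions.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bits_per_value):
    """Approximate KV cache size in bytes for n_ctx cached tokens.

    Ignores per-block quantization overhead (scales, padding), so treat
    the result as a rough estimate rather than an exact allocation size.
    """
    # Factor of 2: one K tensor and one V tensor per layer.
    values = 2 * n_layers * n_kv_heads * head_dim * n_ctx
    return values * bits_per_value / 8


MIB = 1024 ** 2
N_CTX = 8192  # hypothetical context length

# Hypothetical shapes, chosen only to illustrate the calculation.
draft = dict(n_layers=24, n_kv_heads=2, head_dim=128, n_ctx=N_CTX)
target = dict(n_layers=64, n_kv_heads=8, head_dim=128, n_ctx=N_CTX)

F16_BITS = 16
ASSUMED_TURBO3_BITS = 3  # assumption: ~3 bits per cached value

draft_f16 = kv_cache_bytes(**draft, bits_per_value=F16_BITS)
draft_q = kv_cache_bytes(**draft, bits_per_value=ASSUMED_TURBO3_BITS)
target_f16 = kv_cache_bytes(**target, bits_per_value=F16_BITS)

print(f"draft  KV f16:       {draft_f16 / MIB:8.1f} MiB")
print(f"draft  KV quantized: {draft_q / MIB:8.1f} MiB")
print(f"target KV f16:       {target_f16 / MIB:8.1f} MiB")
print(f"saved by quantizing the draft KV: {(draft_f16 - draft_q) / MIB:8.1f} MiB")
```

With any plausible shapes, the saving from quantizing the draft cache is bounded by the draft cache's own size, which is why the target's KV cache is the one worth compressing.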