Consensus Metrics
decode_speedup 1.05 (n=1, σ=0)
ppl_change 0 (n=1, σ=0)
niah_change 0 (n=1, σ=0)
Parameters
type_k: q8_0
type_v: q8_0
sparse_v: true
threshold: 1e-6
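As an illustration of what the sparse_v and threshold parameters might control, here is a minimal Python sketch. It assumes Sparse V means skipping dequantization of V-cache rows whose softmax attention weight falls below threshold; this page does not spell out the mechanism, so treat that as an assumption. The names sparse_v_attention and dequant_q8_like are hypothetical, and the toy row format (one scale per row of int8 values) is only loosely modeled on q8_0.

```python
import math

def dequant_q8_like(row):
    """Toy dequantizer (hypothetical): row is (scale, list of int8 values)."""
    scale, quants = row
    return [scale * q for q in quants]

def sparse_v_attention(scores, v_rows, dequant, threshold=1e-6):
    """Attention-weighted sum of V rows for one query.

    Rows whose softmax weight is below `threshold` are skipped before
    dequantization (assumed behaviour of Sparse V, not confirmed here).
    Returns (output_vector, number_of_rows_dequantized).
    """
    # Numerically stable softmax over the raw attention scores.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    out = None
    used = 0
    for weight, row in zip(weights, v_rows):
        if weight < threshold:
            continue  # negligible contribution: skip the dequantization work
        values = dequant(row)
        used += 1
        if out is None:
            out = [0.0] * len(values)
        for i, x in enumerate(values):
            out[i] += weight * x
    return out, used

# Two positions dominate the softmax; the other two are far below 1e-6.
scores = [10.0, 9.0, -20.0, -25.0]
v_rows = [(0.5, [2, 4]), (0.25, [4, -8]), (1.0, [100, 100]), (1.0, [-100, 50])]

out_sparse, used_sparse = sparse_v_attention(scores, v_rows, dequant_q8_like, threshold=1e-6)
out_dense, used_dense = sparse_v_attention(scores, v_rows, dequant_q8_like, threshold=0.0)
max_abs_diff = max(abs(a - b) for a, b in zip(out_sparse, out_dense))
```

In this toy case the sparse path dequantizes two of the four rows, and its output differs from the dense path only by the tiny weight mass that was skipped. Under the same assumption, the saving scales with the cost of each skipped dequantization.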
Hypothesis
Sparse V benefits are not turbo3-specific; they also apply to a q8_0 KV cache.
Subject
Model: Qwen3.5-35B-A3B-Q8_0
Dataset: wikitext-2
Baseline Comparison
decode_speedup +5%
perplexity identical
niah identical
Instances (1 reproduction)
Sparse V gives a 5% decode speedup (1.05x) on q8_0 with no change in perplexity or NIAH score. The gain is smaller than on turbo3 because q8_0 dequantization is cheaper. This single run supports the hypothesis that the technique is cache-type agnostic.
decode_speedup 1.05
ppl_change 0.0
niah_change 0.0
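The consensus values at the top of this page can be reproduced from the instance metrics with a short Python sketch. It assumes each consensus metric is the mean over instances and that σ is the population standard deviation; the page does not state which estimator is used, and the consensus helper is a hypothetical name.

```python
from statistics import mean, pstdev

# Instance metrics as listed above (one reproduction).
instances = [
    {"decode_speedup": 1.05, "ppl_change": 0.0, "niah_change": 0.0},
]

def consensus(instances, key):
    """Mean, count, and population standard deviation of one metric (assumed aggregation)."""
    values = [inst[key] for inst in instances]
    return {"mean": mean(values), "n": len(values), "sigma": pstdev(values)}

summary = {key: consensus(instances, key) for key in instances[0]}

# A speedup ratio of 1.05 corresponds to the "+5%" shown in the baseline comparison.
speedup_percent = round((summary["decode_speedup"]["mean"] - 1.0) * 100, 6)
```

With a single instance the population standard deviation is 0 by construction, so σ=0 here says nothing about run-to-run variance.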