NSNQuant per-token DC removal — TurboQuant KV Cache Optimization

Consensus Metrics

ppl_turbo3_baseline 5.85 ± 0.165 (n=1, σ=0)

ppl_turbo3_dc 5.883 ± 0.166 (n=1, σ=0)

ppl_turbo4_baseline 5.819 (n=1, σ=0)

ppl_turbo4_dc 17.41 ± 0.618 (n=1, σ=0)

Parameters

type_k turbo3

type_v turbo3

dc_removal true

context 2048

chunks 8

Hypothesis

Subtract the per-token mean (the DC component, averaged over the token's elements) before the FWHT to improve quantization quality.
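A minimal sketch of what DC removal does to the transform input, assuming a plain unnormalized FWHT in natural (Hadamard) order with no random sign flips. The `fwht` helper and the test vector are hypothetical illustrations, not the TurboQuant kernels. Under these assumptions, subtracting the mean zeroes coefficient 0 and leaves every other coefficient unchanged.

```python
import numpy as np

def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform (length must be a power of two)."""
    y = np.asarray(x, dtype=np.float64).copy()
    n = y.shape[0]
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = y[i:i + h].copy()
            b = y[i + h:i + 2 * h].copy()
            y[i:i + h] = a + b          # butterfly: sum
            y[i + h:i + 2 * h] = a - b  # butterfly: difference
        h *= 2
    return y

rng = np.random.default_rng(0)
# Hypothetical 128-element vector with a deliberately nonzero mean.
x = rng.normal(loc=0.5, scale=1.0, size=128)

coeffs_raw = fwht(x)
coeffs_dc = fwht(x - x.mean())

# Coefficient 0 is the sum of the inputs, so it carries the whole DC term.
print("coefficient 0 without DC removal:", coeffs_raw[0])
print("coefficient 0 with DC removal:   ", coeffs_dc[0])
print("other coefficients unchanged:    ", np.allclose(coeffs_raw[1:], coeffs_dc[1:]))
```

If the real pipeline applies sign flips or a permutation before the transform, the DC term is spread over many coefficients and this one-coefficient picture no longer holds.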

Tags

quality

Subject

Model: Qwen3.5-27B-Q6_K

Dataset: wikitext-2

Instances (1 reproduction)

cuda-rtx3090 claude-opus-4-6 RTX 3090

No benefit for turbo3: values are already near zero-mean after L2 normalization + FWHT. CATASTROPHIC for turbo4: the QJL residual is computed relative to the DC-removed signal but decoded without DC restoration, which breaks the correction entirely.

WARNING: any pre-processing that changes the signal before quantization must be perfectly inverted during dequantization, including through the QJL path. HadaNorm mean-centering (originally planned as a separate experiment) has the same incompatibility.
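The invertibility requirement can be illustrated with a toy round trip. This is only a sketch under stated assumptions: `quantize`, `dequantize`, `encode`, and `decode` are hypothetical helpers built around a simple uniform quantizer, not the turbo3/turbo4 codecs, and the QJL residual path is not modelled. It shows the simplest form of the failure: a decoder that skips DC restoration is off by the full mean on every element.

```python
import numpy as np

def quantize(v, bits=4):
    """Toy symmetric uniform quantizer (hypothetical stand-in for the real codec)."""
    levels = 2 ** (bits - 1) - 1
    scale = float(np.max(np.abs(v))) / levels
    if scale == 0.0:
        scale = 1.0  # avoid dividing by zero on an all-zero input
    codes = np.round(v / scale).astype(np.int32)
    return codes, scale

def dequantize(codes, scale):
    return codes.astype(np.float64) * scale

def encode(x, bits=4):
    dc = float(x.mean())               # pre-processing: remove the DC term
    codes, scale = quantize(x - dc, bits)
    return codes, scale, dc

def decode(codes, scale, dc, restore_dc=True):
    y = dequantize(codes, scale)
    return y + dc if restore_dc else y  # restore_dc=False models the broken path

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

rng = np.random.default_rng(0)
x = rng.normal(loc=0.5, scale=1.0, size=128)  # hypothetical input with nonzero mean

codes, scale, dc = encode(x)
err_ok = rmse(decode(codes, scale, dc, restore_dc=True), x)
err_bad = rmse(decode(codes, scale, dc, restore_dc=False), x)

print("RMSE with DC restored:   ", err_ok)
print("RMSE without restoration:", err_bad)
```

In a two-stage scheme, any correction term computed against the DC-removed signal would need the same restoration step applied on its decode path as well.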

ppl_turbo3_baseline 5.8501

ppl_turbo3_dc 5.8827

ppl_turbo4_baseline 5.8186

ppl_turbo4_dc 17.4134
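For a sense of scale, the relative changes can be worked out directly from the four values above. The dictionary and helper names are hypothetical; the numbers are the ones reported for this instance.

```python
# Perplexity values reported for this instance.
ppl = {
    "turbo3_baseline": 5.8501,
    "turbo3_dc": 5.8827,
    "turbo4_baseline": 5.8186,
    "turbo4_dc": 17.4134,
}

def relative_change(baseline, value):
    """Fractional change of `value` with respect to `baseline`."""
    return (value - baseline) / baseline

turbo3_delta = ppl["turbo3_dc"] - ppl["turbo3_baseline"]
turbo3_rel = relative_change(ppl["turbo3_baseline"], ppl["turbo3_dc"])
turbo4_ratio = ppl["turbo4_dc"] / ppl["turbo4_baseline"]

print(f"turbo3: +{turbo3_delta:.4f} perplexity ({turbo3_rel:.2%})")
print(f"turbo4: {turbo4_ratio:.2f}x baseline perplexity")
```

The turbo3 shift of about 0.03 is smaller than the ± 0.165 listed in the consensus metrics, while turbo4 perplexity roughly triples.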