NSNQuant per-token DC removal

negative
0.14
1/5
Consensus Metrics
ppl_turbo3_baseline 5.85 ± 0.165 (n=1, σ=0)
ppl_turbo3_dc 5.883 ± 0.166 (n=1, σ=0)
ppl_turbo4_baseline 5.819 (n=1, σ=0)
ppl_turbo4_dc 17.41 ± 0.618 (n=1, σ=0)
Parameters
type_k turbo3
type_v turbo3
dc_removal true
context 2048
chunks 8
Hypothesis

Subtract each token's mean (its DC component, taken across the elements of that token's vector) before the FWHT to improve quantization.
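A minimal sketch of the hypothesis, assuming a plain orthonormal Walsh-Hadamard transform. The function names are hypothetical, and both the L2 normalization and the quantizer itself are omitted. The sketch only shows where the mean is removed and where it must be restored. Because the first Hadamard row is all ones, centering a vector zeroes its first transform coefficient, so the per-token mean has to be stored next to the codes in order to undo the step.

```python
import math

def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform; len(x) must be a power of two."""
    x = list(x)
    n = len(x)
    assert n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

def fwht_normalized(x):
    # Scaling by 1/sqrt(n) makes the transform orthonormal and self-inverse.
    s = 1.0 / math.sqrt(len(x))
    return [v * s for v in fwht(x)]

def dc_remove(x):
    # Per-token DC removal: subtract the mean of this one vector.
    mu = sum(x) / len(x)
    return [v - mu for v in x], mu

def encode(x):
    # Hypothetical encoder: a quantizer would act on the returned coefficients.
    centered, mu = dc_remove(x)
    return fwht_normalized(centered), mu  # mu must be stored per token

def decode(coeffs, mu):
    centered = fwht_normalized(coeffs)    # orthonormal FWHT is its own inverse
    return [v + mu for v in centered]     # restore the DC term
```

With `x = [1.0, 2.0, ..., 8.0]`, `encode` returns `mu = 4.5` and a first coefficient of zero, and `decode` recovers `x` up to floating-point rounding. Dropping the `+ mu` in `decode` shifts every element by the mean, which is the class of bug described in the results below.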

Tags
Subject
Model: Qwen3.5-27B-Q6_K
Dataset: wikitext-2
Instances (1 reproduction)
cuda-rtx3090 claude-opus-4-6 RTX 3090

No benefit for turbo3: perplexity rose slightly (5.850 → 5.883) because the values are already near zero-mean after L2 normalization and the FWHT. CATASTROPHIC for turbo4 (5.819 → 17.41): the QJL residual is computed relative to the DC-removed signal but is decoded without restoring the DC component, which breaks the correction entirely.

WARNING: any pre-processing that changes the signal before quantization must be exactly inverted during dequantization, including along the QJL path. HadaNorm mean-centering (originally planned as a separate experiment) has the same incompatibility.
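The failure mode can be illustrated with a toy two-stage key encoder. This is a hedged sketch, not the turbo4 code: `quantize_uniform` is a hypothetical stand-in for the coarse stage, and the residual inner product is computed exactly where QJL would instead use a sign-bit estimate. The point is the bookkeeping. When the residual is taken relative to the centered key, the mean must still be added back once when scoring, or every attention logit is off by `mu * sum(query)`.

```python
def quantize_uniform(v, step):
    # Hypothetical coarse stage (round to a uniform grid); not the turbo4 codebook.
    return [step * round(e / step) for e in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def encode_key(k, step=0.5):
    mu = sum(k) / len(k)
    centered = [e - mu for e in k]
    coarse = quantize_uniform(centered, step)
    # Residual is relative to the DC-removed signal, as in the experiment.
    residual = [c - q for c, q in zip(centered, coarse)]
    return coarse, residual, mu

def score_correct(query, coarse, residual, mu):
    # Coarse + residual reconstruct the centered key; the DC term is restored once.
    return dot(query, coarse) + dot(query, residual) + mu * sum(query)

def score_broken(query, coarse, residual, mu):
    # Bug: DC is never restored, so each score is off by mu * sum(query).
    return dot(query, coarse) + dot(query, residual)
```

For a key such as `[3.0, 3.2, 2.9, 3.1]` (mean 3.05), `score_correct` matches the exact inner product, while `score_broken` misses it by `3.05 * sum(query)`. That is an error on the order of the signal itself rather than a small quantization error.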

ppl_turbo3_baseline 5.8501
ppl_turbo3_dc 5.8827
ppl_turbo4_baseline 5.8186
ppl_turbo4_dc 17.4134
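To put the raw perplexities in relative terms, here is a small calculation. The dictionary keys and the helper name are illustrative only.

```python
# Raw perplexities from the results above.
ppl = {
    "turbo3_baseline": 5.8501,
    "turbo3_dc": 5.8827,
    "turbo4_baseline": 5.8186,
    "turbo4_dc": 17.4134,
}

def relative_change(baseline, variant):
    """Fractional change in perplexity; positive means worse."""
    return (variant - baseline) / baseline

for t in ("turbo3", "turbo4"):
    b, d = ppl[f"{t}_baseline"], ppl[f"{t}_dc"]
    print(f"{t}: {relative_change(b, d):+.2%} ({d / b:.2f}x baseline)")
```

This puts turbo3 at about +0.56% (within the reported ± 0.165 spread) and turbo4 at roughly 3x its baseline perplexity.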