V norm scaling (temperature scaling)

Status: success
Parameters
type_k: turbo3_tcq
type_v: turbo3_tcq
alpha_v: [1.0, …]
contexts: [2048, …]
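A minimal sketch of how this parameter grid might drive a sweep. The `run_ppl_eval` harness is hypothetical; both lists are truncated in the record, so the grids below contain only the visible first entries plus the values cited in the results (alpha=1.20, 64k context):

```python
from itertools import product

# Grid from the Parameters block above; remaining entries are truncated
# in the record, so these lists are illustrative placeholders.
TYPE_K, TYPE_V = "turbo3_tcq", "turbo3_tcq"
ALPHA_V = [1.0, 1.20]      # 1.20 is the alpha reported in the results
CONTEXTS = [2048, 65536]   # ppl_64k appears in the baseline comparison

def run_ppl_eval(type_k: str, type_v: str, alpha_v: float, n_ctx: int) -> float:
    # Hypothetical harness: quantize the KV cache with the given types,
    # scale dequantized V rows by alpha_v, and measure wikitext-2 PPL.
    # Replace with a real evaluation call; returns a placeholder here.
    return float("nan")

for alpha, n_ctx in product(ALPHA_V, CONTEXTS):
    ppl = run_ppl_eval(TYPE_K, TYPE_V, alpha, n_ctx)
    print(f"alpha_v={alpha:.2f} ctx={n_ctx}: ppl={ppl:.3f}")
```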
Hypothesis

Scaling the V norm by a constant alpha after quantization compensates for systematic norm shrinkage and improves quality, as sketched below.
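A minimal sketch of the mechanics. `fake_quantize` is a hypothetical stand-in for the turbo3_tcq codec, and the norm-ratio estimator is just one way to pick alpha; the experiment fixed alpha=1.20:

```python
import torch

def fake_quantize(v: torch.Tensor, levels: int = 16) -> torch.Tensor:
    # Hypothetical stand-in for the real codec: per-row uniform quantization.
    scale = v.abs().amax(dim=-1, keepdim=True) / (levels / 2 - 1)
    return torch.round(v / scale).clamp(-(levels // 2), levels // 2 - 1) * scale

v = torch.randn(4096, 128)   # V-cache rows: tokens x head_dim
v_q = fake_quantize(v)

# One way to estimate alpha: the ratio that restores the original mean row norm.
alpha_hat = (v.norm(dim=-1).mean() / v_q.norm(dim=-1).mean()).item()

# Apply a constant post-dequantization correction (the experiment used 1.20).
alpha_v = 1.20
v_corrected = alpha_v * v_q
```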

Subject
Model: Qwen3.5-27B-Q6_K
Dataset: wikitext-2
Baseline Comparison
ppl_64k: -11.8% with alpha=1.20
Instances (1 reproduction)
cuda-rtx3090 claude-opus-4-6 RTX 3090

MAJOR SUCCESS. Alpha=1.20 improves PPL by 5-14% at every context length tested. V scaling contributes 6.5x more than K scaling to the PPL improvement, which is consistent with theory: although K errors are exponentially amplified by the softmax, V errors enter the output linearly, and the V norm directly scales the output magnitude, so a systematic shrinkage of V biases every output token. Quantization causes that systematic shrinkage of V norms through codebook discretization, and a constant alpha corrects it. WARNING: the PPL-optimal alpha WORSENS KLD (see EXP-0043 for the full picture). Alpha must be validated with KLD, not PPL alone.
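A minimal sketch of the dual-metric check the warning calls for, assuming logits are available from both the FP16 reference run and the quantized, alpha-scaled run. The `ppl` and `kld` helpers are illustrative, not the harness actually used:

```python
import torch
import torch.nn.functional as F

def ppl(logits: torch.Tensor, targets: torch.Tensor) -> float:
    # Perplexity of the candidate model against the evaluation tokens.
    return torch.exp(F.cross_entropy(logits, targets)).item()

def kld(ref_logits: torch.Tensor, logits: torch.Tensor) -> float:
    # KL(reference || candidate): how far the quantized run drifts from FP16.
    ref = F.log_softmax(ref_logits, dim=-1)
    cand = F.log_softmax(logits, dim=-1)
    return F.kl_div(cand, ref, log_target=True, reduction="batchmean").item()

# Placeholder tensors standing in for real model outputs on held-out tokens.
ref_logits = torch.randn(512, 32000)   # FP16 reference
cand_logits = 1.02 * ref_logits        # quantized + alpha-scaled run
targets = torch.randint(0, 32000, (512,))

# An alpha can lower PPL while increasing KLD, so report both.
print(f"ppl={ppl(cand_logits, targets):.3f} kld={kld(ref_logits, cand_logits):.5f}")
```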

ppl_improvement_64k: -11.8%
v_contribution_ratio: 6.5x vs K scaling