Alpha applied at decode time (scaling the dequantized V) vs encode time (baked into the fp16 norm before quantization) may give different results; decode-time application would enable context-adaptive deployment
NUANCED RESULT. Encode-time alpha wins at 2K context: baking alpha into the fp16 norm before quantization means the codebook sees the correctly scaled values and can quantize them optimally. Decode-time alpha wins at 8K+ context (-3.9% KLD) because it enables context-adaptive correction (alpha varies as context grows, per EXP-0046).

CRITICAL CAVEAT: for V specifically, alpha MUST be applied at encode time (baked into the fp16 norm). Applying it only at decode time causes a 25% KLD regression, because the norm stored in the quantized block header is then wrong.

Winning strategy: apply alpha at encode time to set the correct norm, then apply the context-adaptive correction factor at decode time on top. This two-stage approach gets the best of both worlds.
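The two-stage strategy can be sketched as below. This is a minimal illustration, not the actual kernel: the block format, function names (`quantize_block`, `dequantize_block`), and the scalar `context_correction` parameter are all hypothetical stand-ins for the real codebook/header layout.

```python
import numpy as np

def quantize_block(v: np.ndarray, alpha: float):
    """Encode-time stage: bake alpha into the values BEFORE computing
    the block norm, so the fp16 norm stored in the header reflects the
    correctly scaled range (avoiding the 25% KLD regression caveat)."""
    scaled = v * alpha
    # Per-block scale derived from the max magnitude, mapped to int8 range.
    norm = max(np.abs(scaled).max() / 127.0, 1e-8)
    q = np.round(scaled / norm).astype(np.int8)
    return q, np.float16(norm)  # norm stored in the block header as fp16

def dequantize_block(q: np.ndarray, norm: np.float16,
                     context_correction: float = 1.0) -> np.ndarray:
    """Decode-time stage: dequantize with the stored norm, then apply a
    context-adaptive correction factor on top (e.g. varying with context
    length, per EXP-0046)."""
    return q.astype(np.float32) * np.float32(norm) * context_correction

# Usage: encode with alpha baked in, decode with an adaptive correction.
v = np.array([1.0, -2.0, 0.5], dtype=np.float32)
q, norm = quantize_block(v, alpha=0.9)
v_hat = dequantize_block(q, norm, context_correction=1.02)
```

The key design point is the ordering: alpha multiplies `v` before the norm is computed, so the header norm is consistent with the quantized codes; the decode-time factor is a pure post-hoc multiplier that never touches the stored header.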