Product-aware codebook training (Q²-weighted GLA)

Parameters
type_k turbo3_tcq
type_v turbo3_tcq
codebook_training product_aware_gla
variants [product_mono_iter080, product_mono_iter100]
contexts [2048, 8192]
Hypothesis

Weighting codebook training by the query-norm distribution (Q²) improves downstream KLD by optimizing for attention-weighted distortion rather than uniform distortion.
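
As a sketch of that objective (the symbols below are illustrative, not names from this experiment): standard codebook training minimizes uniform distortion, while the Q²-weighted variant minimizes

$$\min_{\{c_j\}} \; \sum_i w_i \,\lVert x_i - c_{a(i)} \rVert^2, \qquad a(i) = \arg\min_j \lVert x_i - c_j \rVert^2,$$

where $w_i$ is estimated from the squared norms of calibration queries landing on $x_i$'s codebook region; setting $w_i = 1$ recovers the unweighted objective.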

Subject
Model: Qwen3.5-27B-Q6_K
Dataset: wikitext-2
Baseline Comparison
kld_3bit_8k: -9.8% vs compiled-in
kld_2bit_2k: -12.8% vs compiled-in
Instances (1 reproduction)
cuda-rtx3090 claude-opus-4-6 RTX 3090

Product-aware training weights each codebook centroid's importance by the probability that a query will activate it, via Q² weighting: the squared-query-norm distribution over codebook regions. product_mono/iter080 beats the compiled-in codebook by 7.2% KLD at 2K context; product_mono/iter100 beats it by 9.8% at 8K. 2-bit benefits more from training than 3-bit (12.8% vs 7.2% at 2K) because a 2-bit codebook has fewer centroids, so each centroid's placement matters more. The key insight is that not all quantization errors are equal: errors on high-attention-weight elements cost more than errors on elements the model ignores.
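A minimal sketch of this idea as Q²-weighted k-means, assuming keys is an (N, d) array of calibration cache vectors and q2_weights holds per-vector squared-query-norm estimates; the function and parameter names here are illustrative, not this repo's API:

import numpy as np

def q2_weighted_codebook(keys, q2_weights, n_centroids=8, n_iters=80, seed=0):
    """Weighted k-means: centroid placement is biased toward vectors
    that queries are likely to attend to (Q^2 weights)."""
    rng = np.random.default_rng(seed)
    # Initialize by sampling calibration vectors proportionally to weight.
    probs = q2_weights / q2_weights.sum()
    init = rng.choice(len(keys), size=n_centroids, replace=False, p=probs)
    centroids = keys[init].astype(np.float64)
    for _ in range(n_iters):
        # Assign each vector to its nearest centroid (plain Euclidean).
        d2 = ((keys[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = d2.argmin(axis=1)
        # Update each centroid as the Q^2-weighted mean of its members,
        # so errors on high-attention vectors dominate the placement.
        for c in range(n_centroids):
            mask = assign == c
            if mask.any():
                w = q2_weights[mask][:, None]
                centroids[c] = (w * keys[mask]).sum(axis=0) / w.sum()
    return centroids

With uniform q2_weights this reduces to plain k-means. The weighting moves centroids the most when n_centroids is small (the 2-bit case), which is consistent with the larger 2-bit gain reported above.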

kld_improvement_3bit_2k: 7.2%
kld_improvement_3bit_8k: 9.8%
kld_improvement_2bit_2k: 12.8%