Tensor-role sensitivity vs context length

success
0.68
1/5
Consensus Metrics
k_proj_roi_2k 0.259 (n=1, σ=0)
k_proj_roi_16k 0.468 (n=1, σ=0)
kv_ratio_2k 1.85 (n=1, σ=0)
kv_ratio_16k 2.49 (n=1, σ=0)
Parameters
quant gptq_turbo_q4
group_size 256
protect_role each-of-7
protect_method fp16
eval_seq_len_grid [2048
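The parameters above describe 4-bit weight quantization with one scale per group of 256 weights, with one tensor role at a time protected in fp16. The minimal NumPy sketch below illustrates that setup under stated assumptions. It uses plain symmetric round-to-nearest (RTN) fake quantization as a stand-in for gptq_turbo_q4, whose internals are not given here; real GPTQ adds Hessian-based error compensation, which is not shown. The role list assumes "each-of-7" refers to the seven linear projections of a Qwen3 decoder block. `quantize_groupwise` and `quantize_layer` are hypothetical helpers.

```python
import numpy as np

# Assumption: the seven linear-layer roles in a Qwen3 decoder block.
ROLES = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]


def quantize_groupwise(w: np.ndarray, bits: int = 4, group_size: int = 256) -> np.ndarray:
    """Hypothetical helper: symmetric RTN fake quantization with one scale per
    group of `group_size` consecutive weights along the input dimension.
    This is plain round-to-nearest, NOT GPTQ (no error compensation)."""
    out_dim, in_dim = w.shape
    assert in_dim % group_size == 0, "in_dim must be a multiple of group_size"
    qmax = 2 ** (bits - 1) - 1  # 7 for signed 4-bit
    g = w.reshape(out_dim, in_dim // group_size, group_size)
    scale = np.abs(g).max(axis=-1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # all-zero group: avoid divide-by-zero
    q = np.clip(np.round(g / scale), -qmax - 1, qmax)  # integer codes
    return (q * scale).reshape(out_dim, in_dim).astype(w.dtype)  # dequantized


def quantize_layer(weights: dict, protect_role: str) -> dict:
    """Hypothetical helper: quantize every role except `protect_role`,
    which is left in its original floating-point format (fp16 in the run)."""
    return {
        role: w if role == protect_role else quantize_groupwise(w)
        for role, w in weights.items()
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layer = {r: rng.standard_normal((64, 512)).astype(np.float32) for r in ROLES}
    # Sweep the protected role, as "protect_role each-of-7" suggests.
    for role in ROLES:
        q_layer = quantize_layer(layer, protect_role=role)
        err = np.mean([np.abs(q_layer[r] - layer[r]).mean() for r in ROLES])
        print(f"protect {role:9s} mean |dW| = {err:.4f}")
```

Sweeping `protect_role` over all seven roles, one run per role, gives the per-role comparison that the k_proj metrics summarize; the toy random weights here say nothing about the real model's sensitivity.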
Hypothesis

Softmax amplifies K-side errors more than V-side errors, so the K/V sensitivity gap should widen as context length grows.
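The mechanism in the hypothesis can be probed with a toy single-query attention, sketched below under loud assumptions. Errors in K pass through the softmax and change every attention weight, while errors in V are only averaged by the existing weights. The sketch adds Gaussian noise of the same absolute scale to either K or V and reports the mean relative change in the attention output at a given context length. Random Gaussian tensors, the head dimension of 64, and the noise level are illustrative assumptions, and relative output error is not the page's ROI metric, whose definition is not given. `perturbation_error` is a hypothetical helper, and no particular outcome is implied.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    x = x - x.max()  # shift for numerical stability
    e = np.exp(x)
    return e / e.sum()


def attn(q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Single-query scaled dot-product attention: softmax(K q / sqrt(d)) V."""
    d = q.shape[-1]
    p = softmax(K @ q / np.sqrt(d))
    return p @ V


def perturbation_error(seq_len: int, d: int = 64, noise: float = 0.05,
                       trials: int = 8, seed: int = 0) -> tuple:
    """Hypothetical helper: mean relative output error when i.i.d. Gaussian
    noise (std `noise`) is added to every element of K, or of V, separately."""
    rng = np.random.default_rng(seed)
    k_err, v_err = [], []
    for _ in range(trials):
        q = rng.standard_normal(d)
        K = rng.standard_normal((seq_len, d))
        V = rng.standard_normal((seq_len, d))
        ref = attn(q, K, V)
        dK = noise * rng.standard_normal(K.shape)
        dV = noise * rng.standard_normal(V.shape)
        ref_norm = np.linalg.norm(ref)
        k_err.append(np.linalg.norm(attn(q, K + dK, V) - ref) / ref_norm)
        v_err.append(np.linalg.norm(attn(q, K, V + dV) - ref) / ref_norm)
    return float(np.mean(k_err)), float(np.mean(v_err))


if __name__ == "__main__":
    for seq_len in (2048, 16384):  # the two lengths reported in the metrics
        k, v = perturbation_error(seq_len)
        print(f"L={seq_len:6d}  K-err={k:.4f}  V-err={v:.4f}  K/V={k / v:.2f}")
```

Real quantization error is structured rather than i.i.d. Gaussian, and real keys and queries are far from random. A toy like this only shows where the asymmetry enters the computation; it does not test the hypothesis on qwen3-0.6b.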

Subject
Model: qwen3-0.6b
Dataset: wikitext-2
Instances (0 reproductions)
No instances recorded.