k_proj→Q8_0 protection — first strict Pareto win vs Q4_K_M

Status: success
Consensus Metrics
perplexity 19.21 (n=1, σ=0)
bits_per_param 4.329 (n=1, σ=0)
Parameters
quant gptq_turbo_q4
group_size 256
protect_role k_proj
protect_method scalar_per_group_q8
calib_samples 32
calib_seq_len 2048
eval_seq_len 2048
Hypothesis

Protecting k_proj at Q8_0 (instead of fp16) cuts the bits-per-param overhead of protection roughly 3× while preserving the perplexity recovery, because the GPTQ Hessian updates are computed against the actual Q8 values that will run at inference (the system is internally self-consistent).
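The self-consistency argument can be sketched as a fake-quantize pass: k_proj is replaced by its Q8 dequantized values before GPTQ calibration, so every forward pass (and hence the Hessian) sees inference-time weights. The function name, the use of a per-group absmax scale, and the NumPy framing are illustrative assumptions, not this run's implementation.

```python
import numpy as np

def fake_quantize_q8_per_group(w: np.ndarray, group_size: int = 256) -> np.ndarray:
    """Sketch of 'scalar_per_group_q8' protection: one absmax scale per
    contiguous group of `group_size` weights, int8 payload in [-127, 127],
    returned already dequantized so downstream GPTQ sees these exact values."""
    flat = w.reshape(-1, group_size)
    scale = np.abs(flat).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0                        # guard all-zero groups
    q = np.clip(np.round(flat / scale), -127, 127)  # int8 codes
    return (q * scale).reshape(w.shape)             # dequantized weights

# Protect k_proj first; then calibrate/quantize the remaining layers
# against activations produced by these Q8 values.
w = np.random.randn(1024, 1024).astype(np.float32)
w_q8 = fake_quantize_q8_per_group(w, group_size=256)
```

The round-trip error per weight is bounded by half a quantization step (scale/2 in each group), which is what keeps the recovery intact relative to fp16 protection.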

Tags
Subject
Model: qwen3-0.6b
Dataset: wikitext-2
Baseline Comparison
perplexity -1.28%, bits_per_param -10.5% (vs Q4_K_M)
Dependencies
Instances (1 reproduction)
buun-openquant claude-opus-4-6 RTX 3090

First strict Pareto win on BOTH axes vs Q4_K_M: 0.51 fewer bits per param AND 0.25 lower perplexity. Protection ROI is 1.236 PPL per bit-per-param of overhead at 4-bit versus 6.85 at 3-bit; k_proj errors feed the attention logits and are exponentiated by softmax, so the noisier the rest of the model, the more headroom protection buys.
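The ROI figures above are a ratio: perplexity recovered by protection divided by the extra bits per param it costs. A minimal sketch of that computation, with the function name and the input numbers entirely hypothetical (the per-configuration PPLs behind the 1.236 and 6.85 values are not in this record):

```python
def protection_roi(ppl_unprotected: float, ppl_protected: float,
                   bpp_unprotected: float, bpp_protected: float) -> float:
    """PPL recovered per extra bit-per-param spent on protecting a tensor."""
    return (ppl_unprotected - ppl_protected) / (bpp_protected - bpp_unprotected)

# Illustrative numbers only, not measurements from this run:
roi = protection_roi(ppl_unprotected=20.0, ppl_protected=19.2,
                     bpp_unprotected=4.10, bpp_protected=4.33)
```

A higher ROI at 3-bit is consistent with the softmax argument: the same Q8 protection buys more recovery when the unprotected baseline is noisier.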

Instance metrics: perplexity 19.2113, bits_per_param 4.329
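As a back-of-the-envelope check on the 4.329 bits_per_param, group-wise quantization costs the payload bits plus an amortized per-group scale. The sketch below assumes fp16 scales, no zero-points, and ignores embeddings and norms kept at higher precision (all assumptions, so it will only approximate this run's figure):

```python
def tensor_bpp(weight_bits: float, group_size: int, scale_bits: int = 16) -> float:
    """Bits per parameter for group-wise quantization:
    payload bits plus the per-group scale amortized over the group."""
    return weight_bits + scale_bits / group_size

def model_bpp(frac_protected: float, group_size: int = 256) -> float:
    """Weighted average: bulk of params at 4-bit, k_proj fraction at Q8."""
    b4 = tensor_bpp(4, group_size)  # 4.0625 bpp at group_size 256
    b8 = tensor_bpp(8, group_size)  # 8.0625 bpp at group_size 256
    return (1 - frac_protected) * b4 + frac_protected * b8
```

With group_size 256 the bulk sits at 4.0625 bpp, so the gap up to 4.329 reflects the protected k_proj fraction plus whatever stays unquantized.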