Different calibration corpus (C4, code) — leakage sanity check

TODO-010 (proposed, low priority)
Description

Calibrating on wikitext.train and evaluating on wikitext.test may carry residual domain leakage, since the calibration and evaluation sets share the same domain. Re-running calibration on C4 (general web text) and the-stack-python (code) should yield similar perplexity on wikitext.test; if it does not, the wikitext-train calibration is overfitting the evaluation domain.
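
As a sketch of the pass/fail rule this implies (the 3% tolerance and the function name are assumptions, not part of the proposal; the tolerance should match the run-to-run PPL noise of the harness):

```python
# Hypothetical decision rule for the leakage check: if calibrating on an
# off-domain corpus makes eval PPL drift well above the wikitext-train
# baseline, the baseline calibration likely exploits the domain match.
def leakage_verdict(ppl_by_corpus: dict[str, float], tol: float = 0.03) -> str:
    baseline = ppl_by_corpus["wikitext_train"]
    worst = max(ppl_by_corpus.values())
    rel_gap = (worst - baseline) / baseline
    return "ok" if rel_gap <= tol else f"suspect leakage: +{rel_gap:.1%} vs baseline"

# Example with made-up numbers:
print(leakage_verdict({"wikitext_train": 6.10, "c4": 6.18, "the_stack_python": 6.55}))
# -> suspect leakage: +7.4% vs baseline
```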

Reference

EXP-0017

Suggested Parameters
quant: gptq_turbo_q4
group_size: 256
smooth_alpha: 0.15
calib_corpus_grid: ['wikitext_train', 'c4', 'the_stack_python']
eval_dataset: wikitext_test
eval_seq_len: 2048
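
A minimal sketch of the sweep these parameters describe. The helpers (load_calib_corpus, gptq_quantize, eval_perplexity) and base_model are hypothetical stand-ins for whatever the EXP-0017 harness exposes; only the parameter values come from this TODO:

```python
# Sweep the calibration corpus while holding all quantization settings
# fixed, then evaluate every resulting model on the same held-out set.
CFG = {
    "quant": "gptq_turbo_q4",
    "group_size": 256,
    "smooth_alpha": 0.15,
    "eval_dataset": "wikitext_test",
    "eval_seq_len": 2048,
}

ppl_by_corpus = {}
for corpus in ["wikitext_train", "c4", "the_stack_python"]:
    calib = load_calib_corpus(corpus, seq_len=CFG["eval_seq_len"])  # hypothetical loader
    model = gptq_quantize(                                          # hypothetical quantizer
        base_model,
        calib,
        group_size=CFG["group_size"],
        smooth_alpha=CFG["smooth_alpha"],
    )
    ppl_by_corpus[corpus] = eval_perplexity(                        # hypothetical evaluator
        model, CFG["eval_dataset"], seq_len=CFG["eval_seq_len"]
    )

print(ppl_by_corpus)  # similar values across corpora => no leakage concern
```

The resulting ppl_by_corpus dict is exactly the input the decision-rule sketch above expects.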
Provenance
Proposed by @buun via buun-openquant (claude-opus-4-6)