Different calibration corpus (C4, code) — leakage sanity check

proposed low priority TODO-010


Description

Calibrating on wikitext.train and evaluating on wikitext.test may leave residual domain leakage. Re-running calibration on C4 (general web) and the-stack-python (code) should give similar perplexity (PPL) on wikitext.test; if PPL is noticeably worse with those corpora, the wikitext.train calibration is likely overfitting the eval domain.
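One way the comparison could be scored is sketched below. The function name `leakage_report`, the 2% relative tolerance, and the PPL values in the usage lines are all assumptions made for illustration; none of them come from EXP-0017 or from any measured run.

```python
def leakage_report(ppl_by_corpus, baseline="wikitext_train", rel_tol=0.02):
    """Compare eval PPL across calibration corpora against a baseline corpus.

    ppl_by_corpus maps calibration corpus name -> PPL measured on the same
    eval set. rel_tol is a hypothetical threshold, not a project setting.
    Returns (gaps, suspicious): the relative PPL gap of every non-baseline
    corpus, and whether any gap exceeds rel_tol.
    """
    base = ppl_by_corpus[baseline]
    gaps = {
        name: (ppl - base) / base
        for name, ppl in ppl_by_corpus.items()
        if name != baseline
    }
    # A clearly positive gap means the out-of-domain calibration is worse on
    # the eval set, which is the pattern expected under domain leakage.
    suspicious = any(gap > rel_tol for gap in gaps.values())
    return gaps, suspicious


# Placeholder numbers, chosen only to exercise the function.
example_ppl = {"wikitext_train": 10.0, "c4": 10.1, "the_stack_python": 10.5}
example_gaps, example_flag = leakage_report(example_ppl)
```

A relative gap is used rather than an absolute one so the same tolerance stays meaningful if the baseline PPL shifts between quantization settings.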

Reference

EXP-0017

Suggested Parameters

quant gptq_turbo_q4

group_size 256

smooth_alpha 0.15

calib_corpus_grid ['wikitext_train', 'c4', 'the_stack_python']

eval_dataset wikitext_test

eval_seq_len 2048
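The parameters above describe three runs that differ only in calibration corpus. A minimal sketch of expanding them into one config per run follows. It assumes that a key ending in `_grid` is meant to be swept and that its singular name is the key with the suffix removed (here `calib_corpus`); that convention and the helper `expand_grid` are hypothetical, not a documented feature of the tracker.

```python
from itertools import product

# Values copied from the suggested parameters of this TODO.
SUGGESTED = {
    "quant": "gptq_turbo_q4",
    "group_size": 256,
    "smooth_alpha": 0.15,
    "calib_corpus_grid": ["wikitext_train", "c4", "the_stack_python"],
    "eval_dataset": "wikitext_test",
    "eval_seq_len": 2048,
}

GRID_SUFFIX = "_grid"  # assumed naming convention for swept parameters


def expand_grid(params):
    """Return one flat config dict per combination of the *_grid values."""
    fixed = {k: v for k, v in params.items() if not k.endswith(GRID_SUFFIX)}
    grids = {
        k[: -len(GRID_SUFFIX)]: v
        for k, v in params.items()
        if k.endswith(GRID_SUFFIX)
    }
    names = list(grids)
    return [
        {**fixed, **dict(zip(names, combo))}
        for combo in product(*(grids[n] for n in names))
    ]


runs = expand_grid(SUGGESTED)
for run in runs:
    print(run["calib_corpus"], run["eval_dataset"], run["eval_seq_len"])
```

Every run keeps the same `eval_dataset` and `eval_seq_len`, so any PPL difference between runs can be attributed to the calibration corpus alone.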

Provenance

Proposed by @buun via buun-openquant claude-opus-4-6