Training on the Test Task

Code to reproduce the experiments, figures, and tables of the paper "Training on the Test Task Confounds Evaluation and Emergence".

The folder experiments/ contains the code to fine-tune models on the task-relevant datasets considered in the paper, and to evaluate models using the LM Evaluation Harness library.
The folder notebooks/evaluations contains the model evaluation files.
The Jupyter notebook notebooks/figures.ipynb reproduces the figures and tables in the paper.
The fine-tuned models are currently being uploaded here.
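
As a rough illustration of the evaluation step, the sketch below assembles a command line for the LM Evaluation Harness `lm_eval` CLI. It is not taken from the scripts in experiments/, which may invoke the harness differently. The helper name `build_lm_eval_command`, the model path, the task list, and the output directory are all hypothetical placeholders.

```python
import shlex


def build_lm_eval_command(model_path, tasks, num_fewshot=0, output_path="results"):
    """Assemble an lm_eval CLI invocation as a list of arguments.

    Hypothetical helper for illustration; not part of this repository.
    """
    return [
        "lm_eval",
        "--model", "hf",                              # Hugging Face model backend
        "--model_args", f"pretrained={model_path}",   # local path or hub identifier
        "--tasks", ",".join(tasks),                   # comma-separated task names
        "--num_fewshot", str(num_fewshot),
        "--output_path", output_path,
    ]


# Placeholder model path and task names, chosen only for the example.
cmd = build_lm_eval_command("path/to/fine-tuned-model", ["mmlu", "gsm8k"], num_fewshot=5)
print(shlex.join(cmd))
```

The printed string can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)` once the harness is installed.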
