Reproducing TableShift Benchmark Results

This document describes how to reproduce the results from our main paper, Benchmarking Distribution Shift in Tabular Data with TableShift. If you instead want to evaluate your own algorithm(s) on the TableShift benchmark, see the benchmarking_guide.md in this repo.

Note: Running the experiments for this paper consumed a significant amount of computational resources (thousands of GPU-hours on a large cluster, plus thousands of CPU-hours for CPU-only models). In order to reproduce our results in full, you will need access to similar computational resources.

Environment Setup

First, ensure your environment is set up properly.

Follow the installation instructions on the main README to set up your environment.

As a check of your setup, make sure that you're able to run the examples/run_expt.py script.

Dataset Access

There are 15 datasets in the TableShift benchmark. All of the datasets are publicly accessible (although a few require public credentialized access). Instructions for accessing these datasets are in the docs directory of this repo or on the TableShift website datasets page.

To fully reproduce our results, you will need access to all 15 of the bennchmark datasets. However, to begin with, we recommend starting with a public dataset (those marked as "Public" on the datasets page). If you decide to use a public credentialized dataset instead, follow the instructions on the datasets page to access the data, download any data file(s), and place them in the TableShift cache directory (by default, this will be tableshift/tmp).

Running The Experiments

There are 15 benchmarks and 19 learning algorithms. However, the procedure for these is identical.

Ensure you are in an environment with the TableShift dependencies installed. Then, run the following:

python scripts/ray_train.py \
    --models mlp \
    --experiment diabetes_readmission

You can substitute other models and dataset names for mlp and diabetes_readmission respectively.

The full list of benchmark dataset names (e.g. diabetes_readmission in the example above) is given in the main README. For more details, see the datasets page or our paper.

The full list of model names is given below. For more details on each algorithm, see our paper.

Model	Name in Tableshift
CatBoost	`catboost`
XGBoost	`xgb`
LightGBM	`lightgbm`
SAINT	`saint`
NODE	`node`
Group DRO	`group_dro`
MLP	`mlp`
Tabular ResNet	`resnet`
Adversarial Label DRO	`aldro`
CORAL	`deepcoral`
MMD	`mmd`
DRO	`dro`
DANN	`dann`
TabTransformer	`tabtransformer`
MixUp	`mixup`
Label Group DRO	`label_group_dro`
IRM	`irm`
VREX	`vrex`
FT-Transformer	`ft_transformer`

| test | test |

test

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

reproduction_guide.md

reproduction_guide.md

Reproducing TableShift Benchmark Results

Environment Setup

Dataset Access

Running The Experiments

Files

reproduction_guide.md

Latest commit

History

reproduction_guide.md

File metadata and controls

Reproducing TableShift Benchmark Results

Environment Setup

Dataset Access

Running The Experiments