bagherilab/emulation

Hyperparameter Selection and Permutation Testing

Description | Installation | Usage

Description

Emulation is an automated machine learning tool for hyperparameter selection and permutation testing. The project was developed as part of the research described in the manuscript "Incorporating temporal information during feature engineering bolsters emulation of spatio-temporal emergence".

Installation

Package and dependency management for this project is done with Poetry. To install dependencies, navigate to the project folder in the command line and run:

$ poetry install

If you do not have Poetry installed, refer to the official Poetry documentation.

Usage

Once dependencies are installed, add your data file to the data folder (currently only CSV files are supported). Next, edit the config files that control how the program runs. All config files are located in the src/conf directory.

Main config

The config.yaml file outlines high-level experimental details, including:

  • The Sobol power with which to sample hyperparameters (sobol_power)
  • The data column to use for stratified splitting and K-fold cross-validation (stratify; can be left blank)
  • Whether the experiment is a quantity experiment to test the effects of different amounts of training data (quantity_experiment)
  • Whether or not the data should be cleaned of NaN and inf values (clean_data)
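As a rough illustration of two of these settings, the sketch below assumes that sobol_power is the base-2 exponent of the number of hyperparameter samples (Sobol sequences are conventionally drawn in blocks of 2^m points) and that clean_data drops any row containing a NaN or infinite value. Both function names are hypothetical and are not taken from the project's code.

```python
import math


def n_sobol_samples(sobol_power: int) -> int:
    # Assumption: sobol_power is the exponent m, giving 2**m sample points.
    return 2 ** sobol_power


def drop_non_finite_rows(rows: list[dict]) -> list[dict]:
    # Keep a row only if every float value in it is finite
    # (that is, neither NaN nor positive/negative infinity).
    return [
        row
        for row in rows
        if all(math.isfinite(v) for v in row.values() if isinstance(v, float))
    ]
```

Under these assumptions, a sobol_power of 5 would correspond to 32 sampled hyperparameter sets, and a row such as {"x": float("nan"), "y": 3.0} would be discarded during cleaning.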

Experiment configs

Inside the cs directory, config files can be specified for any experiment the user wants to run. Examples can be found in the directory. Each config file must include:

  • A list of models to run (defaults)

  • At least one experiment in the format:

    [experiment name]:
      files:
        data: [name of csv data file]
      paths:
        log: ${hydra:runtime.cwd}/logs/[path to log save location]
        data: ${hydra:runtime.cwd}/data/[folder that contains experiment data]
        results: ${hydra:runtime.cwd}/results/[path to result save location]
  • A list of features to train the models on, as well as a list of responses to predict (data, features, response)
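The required entries listed above can be checked before a run. The following is a minimal sketch, assuming the experiment block has already been loaded into a plain Python dict; the helper name missing_keys and the REQUIRED_KEYS table are hypothetical, and only the files and paths entries shown in the format above are checked (the layout of the features and responses entries is not assumed here).

```python
# Keys taken from the experiment format shown above.
REQUIRED_KEYS = {
    "files": ["data"],
    "paths": ["log", "data", "results"],
}


def missing_keys(experiment: dict) -> list[str]:
    # Return dotted names such as "paths.results" for each absent entry.
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        block = experiment.get(section) or {}
        for key in keys:
            if key not in block:
                missing.append(f"{section}.{key}")
    return missing
```

An empty list means every entry from the format above is present. Anything else names the entries that still need to be filled in.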

Model configs

Inside the cs/models directory, model configs can be used to specify the hyperparameters that should be searched over. Each model can have continuous, discrete, and static hyperparameters.
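To make the three hyperparameter kinds concrete, here is a minimal sketch of how a single quasi-random sample u in [0, 1) could be mapped to a value of each kind. The spec layout (kind, low, high, choices, value) and the function name scale_sample are assumptions made for illustration. They are not the project's actual config schema or code.

```python
def scale_sample(u: float, spec: dict):
    # Map a unit-interval sample u in [0, 1) to a hyperparameter value.
    kind = spec["kind"]
    if kind == "continuous":
        # Linearly rescale u onto the interval [low, high).
        low, high = spec["low"], spec["high"]
        return low + u * (high - low)
    if kind == "discrete":
        # Split [0, 1) into equal-width bins, one per allowed choice.
        choices = spec["choices"]
        index = min(int(u * len(choices)), len(choices) - 1)
        return choices[index]
    if kind == "static":
        # Static hyperparameters ignore the sample and stay fixed.
        return spec["value"]
    raise ValueError(f"unknown hyperparameter kind: {kind}")
```

In this sketch, continuous and discrete hyperparameters vary with the sample, while a static hyperparameter keeps the same value in every sampled configuration.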

Once config files have been updated, start the Poetry virtual environment:

$ poetry shell

Finally, experiments can be run manually by specifying the experimental config:

$ python src/config.py cs=[config file]

Alternatively, experiment files can be specified in the run.sh bash script to be run in batches.
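For readers who prefer to script batches in Python, the sketch below builds one command per experiment config, following the manual invocation shown above. It illustrates the idea only and does not reproduce the contents of run.sh; the function name build_commands and the example config names are hypothetical.

```python
def build_commands(config_names: list[str]) -> list[list[str]]:
    # One argument list per experiment config, mirroring
    # "python src/config.py cs=[config file]".
    return [["python", "src/config.py", f"cs={name}"] for name in config_names]


if __name__ == "__main__":
    # Hypothetical config names; replace with files from the cs directory.
    for command in build_commands(["experiment_a", "experiment_b"]):
        print(" ".join(command))
```

Each argument list can be passed to subprocess.run(command, check=True) from inside the Poetry environment to run the experiments one after another.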