UPDATE: Erdre has been forked to d2m, where we continue the development of a more general machine learning pipeline for tabular and time series data, expanding beyond the scope of erroneous data repair.
A machine learning pipeline enabling Responsible AI:
- Explainable AI, using SHAP, LIME or both.
- Uncertainty estimation, using Bayesian dropout for neural networks.
- Carbon emissions tracking and reporting, using CodeCarbon.
Erdre lets you easily create and evaluate machine learning models for tabular and time series data, with built-in data profiling and feature engineering.
Tested on:
- Linux
- macOS
- Windows with WSL 2
- Clone/download this repository.
- Place your data files (CSV) in a folder named after your dataset (DATASET) inside assets/data/raw/, so that the path to the files is assets/data/raw/[DATASET]/.
- Update params.yaml with the name of your dataset (DATASET), the target variable, and other configuration parameters.
- Build the Docker image:
docker build -t d2m -f Dockerfile .
- Run the container:
docker run -p 5000:5000 -it -v $(pwd)/assets:/usr/d2m/assets -v $(pwd)/.dvc:/usr/d2m/.dvc d2m
- Open the website at localhost:5000 to use the graphical user interface.
- Copy params.yaml from the host to the container (find CONTAINER_NAME by running docker ps):
docker cp params.yaml [CONTAINER_NAME]:/usr/d2m/params.yaml
- From another terminal on the host, run the pipeline inside the container:
docker exec [CONTAINER_NAME] dvc repro
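The command-line steps above can be collected into two small shell helpers. This is a minimal sketch, not part of d2m: the function names place_data and run_pipeline are hypothetical. The sketch also assumes that the container was started from the image tagged d2m, as in the docker run command, and that only one such container is running.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the d2m command-line workflow (not part of d2m).
# Assumption: the container was started from the image tagged "d2m".
set -euo pipefail

# Copy every CSV file from SOURCE_DIR into assets/data/raw/DATASET/.
# DATASET must match the dataset name configured in params.yaml.
place_data() {
  local source_dir="$1"
  local dataset="$2"
  local target="assets/data/raw/${dataset}"
  mkdir -p "$target"
  cp "$source_dir"/*.csv "$target"/
}

# Look up a running container created from the d2m image, copy params.yaml
# into it, and reproduce the DVC pipeline there.
run_pipeline() {
  local container_name
  container_name=$(docker ps --filter "ancestor=d2m" --format '{{.Names}}' | head -n 1)
  if [ -z "$container_name" ]; then
    echo "No running container from the d2m image was found" >&2
    return 1
  fi
  docker cp params.yaml "${container_name}:/usr/d2m/params.yaml"
  docker exec "$container_name" dvc repro
}
```

After sourcing the script from the repository root, a call like place_data ~/my_csv_files my_dataset places the data. While the container from docker run is up, running run_pipeline in a second terminal copies the configuration and runs dvc repro. If you tagged the image differently, adjust the ancestor filter to match.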