This repository holds code for a bilevel meta-gradient reinforcement learning variant of DQN: Intrinsic Reward Deep Q-Network (IRDQN).
- Clone this repository (including submodules):

  ```bash
  git clone --recurse-submodules https://github.com/EricSchuMa/bilevel-rl.git
  ```

- Follow the instructions in `sumo_rl/README.md` for installing the SUMO traffic simulator.
- Create a conda environment with Python 3.8:

  ```bash
  conda create -n bilevel-rl python=3.8
  ```

- Activate the conda environment:

  ```bash
  conda activate bilevel-rl
  ```

- Add your local repository path to the `PYTHONPATH` variable:

  ```bash
  export PYTHONPATH="${PYTHONPATH}:/path/to/bilevel-rl"
  ```

- Install the requirements with pip:

  ```bash
  pip install -r requirements.txt
  ```
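The `PYTHONPATH` step above can be sanity-checked from Python. This is an optional sketch; the helper name and the placeholder path are illustrative, not part of the repository:

```python
import os
import sys

def check_pythonpath(repo_root: str) -> bool:
    """Return True if repo_root is on sys.path, i.e. the export above took effect."""
    target = os.path.abspath(repo_root)
    return any(os.path.abspath(p) == target for p in sys.path)

# Replace with your actual clone location.
if check_pythonpath("/path/to/bilevel-rl"):
    print("bilevel-rl is importable")
else:
    print("repo root missing from sys.path; re-check the export above")
```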
From the project root, run the following command to train a DQN or IRDQN agent:

```bash
python experiments/train.py --config-path experiments/configs/{config}
```

where `{config}` should be replaced by a config file. Available config files are `experiments/configs/DQN.ini` and `experiments/configs/IRDQN.ini`.
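The `.ini` configs are presumably standard INI files that Python's built-in `configparser` can read. The sketch below uses a made-up section and keys purely for illustration; the real options live in `experiments/configs/DQN.ini` and `experiments/configs/IRDQN.ini`:

```python
from configparser import ConfigParser

# Hypothetical section and keys, for illustration only; the real config
# files in experiments/configs/ define the project's actual options.
EXAMPLE_INI = """\
[agent]
learning_rate = 1e-4
gamma = 0.99
"""

def load_config(text: str) -> ConfigParser:
    """Parse an INI-format string the way a --config-path file might be read."""
    parser = ConfigParser()
    parser.read_string(text)
    return parser

cfg = load_config(EXAMPLE_INI)
print(cfg.getfloat("agent", "gamma"))  # prints 0.99
```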
The training logs are saved to the folder `mlruns`. You can access the logs by running an MLflow server:

```bash
mlflow ui
```