This repository contains the code for the experiments in the Master's thesis "State Space Discretization for Reinforcement Learning". The thesis aimed at creating models that transform the continuous, high-dimensional state spaces of reinforcement learning into discrete embeddings, enabling the use of classical (tabular) reinforcement learning algorithms while retaining the generalization abilities of deep neural networks. These models yield tabular policies, which are more interpretable to humans than policies produced by deep reinforcement learning with deep neural networks.
The models use overshooting LSTMs to produce continuous embeddings of the input state history and then convert them into discrete embeddings using several techniques. These discretization models are tested with classical reinforcement learning algorithms (online, offline, and model-based) on the highly complex and partially observable Industrial Benchmark environment. The best results obtained with these techniques are compared with the best results of the online deep reinforcement learning algorithm PPO:
| Approach | Algorithm | Reward ± STDEV | Train Episodes | Eval Episodes | Notes |
|---|---|---|---|---|---|
| Deep Reinforcement Learning | PPO (online) | -185.24 ± 0.88 | 10620 | 50 | |
| Discretization Models + Tabular Reinforcement Learning | Q-learning (online) | -181.54 ± 0.85 | 7000 | 50 | |
| | Q-learning (model-based) | -184.87 ± 2.15 | 7000 | 50 | The live environment is replaced with a model of the environment that predicts the next state; rewards are calculated manually. |
| | R-Min (offline) | -188.69 ± 0.64 | 10000 | 50 | Algorithm from the paper. |
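The actual model architecture is defined in the repository code; purely as an illustration of the idea described above (an LSTM embeds the state history and the continuous embedding is then mapped to a discrete code), a minimal sketch is shown below. It assumes PyTorch, and every name in it (`HistoryEncoder`, `discretize`, `n_bins`) is hypothetical rather than part of this repository; the overshooting losses and the thesis's actual discretization techniques are not shown.

```python
# Illustrative sketch only -- not the repository's actual model code.
# Assumes PyTorch; overshooting losses and the thesis's discretization
# techniques are omitted.
import torch
import torch.nn as nn


class HistoryEncoder(nn.Module):
    """Encodes a state history into a continuous embedding with an LSTM."""

    def __init__(self, state_dim: int, embed_dim: int):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, embed_dim, batch_first=True)

    def forward(self, history: torch.Tensor) -> torch.Tensor:
        # history: (batch, time, state_dim) -> last hidden state as embedding
        _, (h_n, _) = self.lstm(history)
        return h_n[-1]                      # (batch, embed_dim)


def discretize(embedding: torch.Tensor, n_bins: int = 10) -> torch.Tensor:
    """Map each embedding dimension to one of n_bins equal-width bins,
    yielding an integer code usable as a key in a tabular policy."""
    squashed = torch.tanh(embedding)        # bound values to (-1, 1)
    bins = ((squashed + 1) / 2 * n_bins).long().clamp(max=n_bins - 1)
    return bins                             # (batch, embed_dim) integer code


# Example: a batch of 4 histories, 20 steps, 6-dimensional observations
encoder = HistoryEncoder(state_dim=6, embed_dim=8)
codes = discretize(encoder(torch.randn(4, 20, 6)))
print(codes.shape)                          # torch.Size([4, 8])
```

Integer codes of this kind can serve directly as discrete states (table keys) for the tabular algorithms used later in the pipeline.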
This README contains the following sections:
- Setup
- Perform Experiment
- Evaluation
- Visualization
The dependencies for the experiments can be installed with the following command:
`pip install -r requirements.txt`
The experiment consists of the following tasks:
- Generate data for discretization model training
- Train LSTM-based discretization models
- Use the discretization models to train online and model-based RL algorithms (a minimal Q-learning sketch follows this list)
- Generate data for offline RL training
- Train offline RL algorithms
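As a minimal, hypothetical sketch of how such discrete codes enable classical online Q-learning (the third task above), consider the loop below. The gym-style `env` returning `(obs, reward, done, info)` and the `encode` function standing in for a trained discretization model are assumptions for illustration, not the pipeline's actual interfaces.

```python
# Minimal tabular Q-learning sketch, assuming a gym-style env and a trained
# discretization model exposed as `encode(history) -> hashable code`.
# Illustrative only; the pipeline's actual implementation differs.
from collections import defaultdict
import random


def q_learning(env, encode, n_actions, episodes=7000,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    q = defaultdict(lambda: [0.0] * n_actions)   # discrete code -> action values
    for _ in range(episodes):
        history = [env.reset()]
        done = False
        while not done:
            s = encode(history)
            # epsilon-greedy action selection over the tabular Q-values
            a = (random.randrange(n_actions) if random.random() < epsilon
                 else max(range(n_actions), key=lambda i: q[s][i]))
            obs, reward, done, _ = env.step(a)
            history.append(obs)
            s_next = encode(history)
            # standard one-step Q-learning update on the discrete codes
            q[s][a] += alpha * (reward + gamma * max(q[s_next]) - q[s][a])
    return q
```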
A pipeline automates the entire experiment; only the data generation for discretization model training is kept as a separate step, so that the same data can be reused across runs instead of being regenerated each time. The experiments can be run by executing the following notebooks:
- `generate_ib_lstm_dataset.ipynb` (run only once, after pulling the repo)
- `pipeline.ipynb`
Artifacts generated by each pipeline run, such as discretization models and policies, are saved in `tmp/experiments/`.
To evaluate the policies generated by the experiments, execute the following notebook:
`evaluation.ipynb`
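Conceptually, evaluation rolls each policy out for a fixed number of episodes (50 in the table above) and reports the mean and standard deviation of the episode return. A minimal sketch, again assuming a hypothetical gym-style environment, an `encode` function from a discretization model, and a greedy tabular policy:

```python
# Illustrative evaluation loop only -- not the notebook's actual code.
# Assumes a gym-style `env`, an `encode(history) -> hashable code` function
# from a trained discretization model, and a Q-table mapping codes to
# per-action value lists.
import statistics


def evaluate(env, encode, q_table, n_actions, episodes=50):
    returns = []
    for _ in range(episodes):
        history = [env.reset()]
        done, total = False, 0.0
        while not done:
            s = encode(history)
            # act greedily with respect to the learned tabular values
            a = max(range(n_actions), key=lambda i: q_table[s][i])
            obs, reward, done, _ = env.step(a)
            history.append(obs)
            total += reward
        returns.append(total)
    # mean episode return and sample standard deviation, as in the table above
    return statistics.mean(returns), statistics.stdev(returns)
```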
The results used in the thesis can be generated and visualized by running the notebook:
`generate_results.ipynb`