State-Space-Discretization

This repository contains the code for the experiments in the Master's thesis "State Space Discretization for Reinforcement Learning". The thesis aimed at creating models that transform continuous, high-dimensional state spaces in reinforcement learning into discrete embeddings, enabling the use of classical (tabular) reinforcement learning algorithms while retaining the generalization abilities of deep neural networks. These models yield tabular policies, which are more interpretable to humans than policies produced by deep reinforcement learning with deep neural networks.
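As a rough illustration of this idea (the class and parameter names below are hypothetical and do not correspond to the repository's actual modules, and the overshooting training objective is omitted): an LSTM encodes the observation history into a low-dimensional continuous embedding, which is then binned into a tuple of indices that can serve directly as a key into a tabular Q-table.

```python
# Illustrative sketch only; names and shapes are assumptions, not the repo's API.
import torch
import torch.nn as nn


class LSTMStateEncoder(nn.Module):
    """Encodes a window of observations into a low-dimensional embedding."""

    def __init__(self, obs_dim, hidden_dim=64, embedding_dim=4):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, embedding_dim)

    def forward(self, obs_history):
        # obs_history: (batch, time, obs_dim)
        _, (h_n, _) = self.lstm(obs_history)
        return torch.tanh(self.head(h_n[-1]))  # bounded continuous embedding


def discretize(embedding, bins_per_dim=10):
    """Map a continuous embedding in [-1, 1]^d to a tuple of bin indices.

    The resulting tuple can be used directly as a key into a tabular Q-table.
    """
    idx = ((embedding + 1.0) / 2.0 * bins_per_dim).long().clamp(0, bins_per_dim - 1)
    return tuple(idx.squeeze(0).tolist())


# Example: a discrete state usable as a Q-table key, e.g. (3, 7, 0, 9)
encoder = LSTMStateEncoder(obs_dim=6)
state = discretize(encoder(torch.randn(1, 30, 6)))
```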

The models use overshooting LSTMs to produce continuous embeddings of the input state history, which are then converted to discrete embeddings using various techniques. The resulting discrete states are evaluated with classical reinforcement learning algorithms (online, offline, and model-based) on the highly complex, partially observable Industrial Benchmark environment. The best results of these techniques are compared with the best results of the online deep reinforcement learning algorithm PPO; the comparison is shown below:

| Algorithm | Reward ± STDEV | Train Episodes | Eval Episodes |
|---|---|---|---|
| Deep Reinforcement Learning: PPO (online) | -185.24 ± 0.88 | 10620 | 50 |
| Discretization Models + Tabular RL: Q-learning (online) | -181.54 ± 0.85 | 7000 | 50 |
| Discretization Models + Tabular RL: Q-learning (model-based)¹ | -184.87 ± 2.15 | 7000 | 50 |
| Discretization Models + Tabular RL: R-Min (offline)² | -188.69 ± 0.64 | 10000 | 50 |

¹ The live environment is replaced with a model of the environment that predicts the next state; rewards are calculated manually.
² Algorithm from the paper.
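The model-based row replaces the live environment with a learned model that predicts the next (discrete) state, with rewards computed manually from the predicted states. A minimal, hedged sketch of such a training loop, assuming placeholder `transition_model` and `reward_fn` functions (not provided by this repository), could look as follows:

```python
# Illustrative sketch of model-based tabular Q-learning against a learned model.
# `transition_model`, `reward_fn`, `actions`, and `start_states` are placeholders.
import random
from collections import defaultdict


def model_based_q_learning(transition_model, reward_fn, actions, start_states,
                           episodes=7000, horizon=100,
                           alpha=0.1, gamma=0.99, epsilon=0.1):
    q = defaultdict(lambda: {a: 0.0 for a in actions})
    for _ in range(episodes):
        s = random.choice(start_states)              # discrete state (bin tuple)
        for _ in range(horizon):
            a = (random.choice(actions) if random.random() < epsilon
                 else max(q[s], key=q[s].get))       # epsilon-greedy action
            s_next = transition_model(s, a)          # predicted next discrete state
            r = reward_fn(s_next)                    # reward computed manually
            target = r + gamma * max(q[s_next].values())
            q[s][a] += alpha * (target - q[s][a])    # tabular Q-learning update
            s = s_next
    return q
```

The sketch only illustrates the general mechanism; the actual training code is part of the experiment pipeline described below.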

This README contains the following sections:

  • Setup
  • Perform Experiment
  • Evaluation
  • Visualization

Setup

The dependencies for the experiments can be installed with the following command:

pip install -r requirements.txt

Perform Experiment

The experiment consists of the following tasks:

  • Generate data for discretization model training
  • Train LSTM-based discretization models
  • Use the discretization models to train online and model-based RL algorithms
  • Generate data for offline RL training
  • Train offline RL algorithms

A pipeline automates the entire experiment; only the data generation for discretization model training is kept separate so that the same dataset can be reused across runs instead of being regenerated each time. The experiments can be run by executing the following notebooks:

  • generate_ib_lstm_dataset.ipynb (only needs to be run once after pulling the repo)
  • pipeline.ipynb

Artifacts generated by each pipeline run, such as discretization models and policies, are saved in tmp/experiments/.

Evaluation

To evaluate the policies generated by the experiments, execute the following notebook:

  • evaluation.ipynb

Visualization

The results used in the thesis can be generated and visualized by running the notebook:

  • generate_results.ipynb
