Code for the projects completed as part of the EECE 5698 ST: Reinforcement Learning course. The files correspond to three projects from the course and are labelled accordingly.
- RL Project 1: A multi-armed bandit with given reward distributions is implemented using epsilon-greedy and gradient-bandit policies. The effects of varying the learning rate and of optimistic initial values are studied, along with the difference between the two policies.
- RL Project 2 Policy & Value Iteration: Policy iteration and value iteration are implemented and compared for an agent in a maze that must find a path from the start state to the goal state in the presence of obstacles that carry negative reward. The effect of varying stochasticity is studied.
- RL Project 2 Q2: Policy iteration and value iteration are implemented and compared for a gene regulatory network problem with given system dynamics, action space and Bernoulli noise. Average gene activation is used to evaluate performance.
- RL Project 3 Q1: Several TD algorithms (Q-Learning, SARSA & Actor-Critic) are implemented and compared on the maze problem from RL Project 2 Policy & Value Iteration above.
- RL Project 3 Q2: Several TD algorithms (Q-Learning, SARSA, SARSA-Lambda & Actor-Critic) are implemented and compared on the gene regulatory network problem from RL Project 2 Q2 above. The effect of varying alpha is studied.
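
For readers new to the methods named above, the following is a minimal Python sketch of epsilon-greedy action selection and of the tabular Q-Learning and SARSA updates. It is not taken from the project files: the function names, the list-of-lists Q-table layout, and the parameter names `alpha` (step size) and `gamma` (discount factor) are all assumptions made for illustration, and the projects themselves may be structured quite differently.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Return a random action with probability epsilon, else a greedy one.

    q_row is a (hypothetical) list of action-value estimates for one state.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    # Greedy choice; ties resolve to the lowest action index.
    return max(range(len(q_row)), key=lambda a: q_row[a])


def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    """Off-policy TD update: bootstrap from the best action in s_next."""
    td_target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (td_target - Q[s][a])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy TD update: bootstrap from the action actually taken in s_next."""
    td_target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (td_target - Q[s][a])


if __name__ == "__main__":
    # Toy Q-table with 2 states and 2 actions (values chosen arbitrarily).
    Q = [[0.0, 0.0], [1.0, 2.0]]
    rng = random.Random(0)
    a = epsilon_greedy(Q[0], epsilon=0.1, rng=rng)
    q_learning_update(Q, s=0, a=a, r=1.0, s_next=1, alpha=0.5, gamma=0.9)
    print(Q)
```

The only difference between the two updates is the bootstrap term: Q-Learning uses the maximum estimated value in the next state, while SARSA uses the value of the next action chosen by the behaviour policy. That difference is what makes a side-by-side comparison on the same environment informative.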