This repo holds all programming assignments completed for my Reinforcement Learning course (Fall 2022).
Note: Scaffolding code was provided for some of these assignments. All of my own work is located inside block comments labeled ##### MY WORK START ##### and ##### MY WORK END #####.
Ex0 --- Exploration Policies
Introducing reinforcement learning and policies --- the rewards and effects of random, expected-better, and expected-worse policies.
- Code: ex0_Exploration_Policies/ex0_Exploration_Policies.ipynb
- Report: ex0_Exploration_Policies/ex0_report.pdf
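As a rough illustration of the idea (not the assignment code), here is a minimal sketch that estimates the average reward collected by a uniform-random policy versus fixed better and worse policies in a hypothetical 3-armed bandit with Gaussian rewards:

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 1.0])       # hypothetical arm means

def average_reward(choose_action, steps=10_000):
    """Average reward collected by a policy over `steps` pulls."""
    total = 0.0
    for _ in range(steps):
        a = choose_action()
        total += rng.normal(true_means[a], 1.0)   # noisy reward from the chosen arm
    return total / steps

random_policy = lambda: rng.integers(3)      # uniform over arms
better_policy = lambda: 2                    # always pulls the best arm
worse_policy = lambda: 0                     # always pulls the worst arm

for name, policy in [("random", random_policy), ("better", better_policy), ("worse", worse_policy)]:
    print(f"{name:>6} policy: average reward {average_reward(policy):.3f}")
```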
Ex1 --- Bandits, Exploration/Exploitation, and UCB
Exploring the effects of exploration, exploitation, and action selection in the k-armed bandit environment --- epsilon-greedy policies, Q-value initialization, and UCB action selection.
- Code: ex1_Bandits_Explore_Exploit_UCB/ex1_Bandits_Explore_Exploit_UCB.ipynb
- Report: ex1_Bandits_Explore_Exploit_UCB/ex1_report.pdf
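A minimal sketch (not the assignment code) of the two action-selection schemes covered here, assuming a hypothetical k-armed Gaussian bandit and sample-average Q updates:

```python
import numpy as np

rng = np.random.default_rng(1)
k, steps, eps, c = 10, 2000, 0.1, 2.0
true_means = rng.normal(0.0, 1.0, k)         # hypothetical k-armed Gaussian bandit

Q = np.zeros(k)                              # value estimates (optimistic init would use np.full(k, 5.0))
N = np.zeros(k)                              # pull counts

def epsilon_greedy(t):
    if rng.random() < eps:
        return int(rng.integers(k))          # explore
    return int(np.argmax(Q))                 # exploit

def ucb(t):
    if N.min() == 0:                         # pull every arm at least once
        return int(np.argmin(N))
    return int(np.argmax(Q + c * np.sqrt(np.log(t + 1) / N)))

select = ucb                                 # or epsilon_greedy
for t in range(steps):
    a = select(t)
    r = rng.normal(true_means[a], 1.0)
    N[a] += 1
    Q[a] += (r - Q[a]) / N[a]                # incremental sample-average update

print("best arm:", int(np.argmax(true_means)), "| most-pulled arm:", int(np.argmax(N)))
```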
Ex3 --- Dynamic Programming Policy Iteration
Implementing dynamic programming policy iteration in a grid world environment --- value iteration, transition probabilities, and policy evaluation and improvement.
- Code: ex3_DP_Policy_Iteration/ex3_DP_Policy_Iteration.ipynb
- Report: ex3_DP_Policy_Iteration/ex3_report.pdf
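A minimal sketch of policy iteration (alternating policy evaluation and greedy improvement), assuming a hypothetical tabular model P[s][a] = [(prob, next_state, reward), ...] rather than the grid world used in the assignment:

```python
import numpy as np

# Hypothetical tabular model: P[s][a] = [(prob, next_state, reward), ...]
n_states, n_actions, gamma, theta = 4, 2, 0.9, 1e-8
P = {s: {a: [(1.0, min(s + a, n_states - 1), 1.0 if s + a >= n_states - 1 else 0.0)]
         for a in range(n_actions)}
     for s in range(n_states)}               # toy chain: action 1 moves right, the far end is rewarded

def policy_evaluation(policy, V):
    """Iterative policy evaluation: sweep until the value function stops changing."""
    while True:
        delta = 0.0
        for s in range(n_states):
            v = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][policy[s]])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            return V

def policy_improvement(V):
    """Greedy policy with respect to a one-step lookahead on V."""
    q = np.array([[sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                   for a in range(n_actions)] for s in range(n_states)])
    return q.argmax(axis=1)

policy, V = np.zeros(n_states, dtype=int), np.zeros(n_states)
while True:
    V = policy_evaluation(policy, V)
    new_policy = policy_improvement(V)
    if np.array_equal(new_policy, policy):   # stable policy => optimal
        break
    policy = new_policy
print("optimal policy:", policy, "values:", np.round(V, 3))
```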
Ex4 --- Monte Carlo Control
Implementing Monte Carlo policy iteration in Blackjack, four-rooms, and racetrack environments --- first-visit MC, exploring starts, and MC control.
- Code: ex4_Monte_Carlo_Control/ex4_Monte_Carlo_Control.ipynb
- Report: ex4_Monte_Carlo_Control/ex4_report.pdf
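A minimal sketch of first-visit Monte Carlo control with exploring starts, run on a hypothetical corridor environment rather than the Blackjack, four-rooms, or racetrack environments used in the assignment:

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(2)
n_states, n_actions, gamma = 5, 2, 1.0       # toy corridor: action 1 moves right, 0 moves left

def step(s, a):
    s2 = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == n_states - 1 else -0.1), s2 == n_states - 1

Q = defaultdict(float)
returns = defaultdict(list)
policy = {s: 0 for s in range(n_states - 1)}             # deterministic greedy policy

for _ in range(5000):
    s, a = int(rng.integers(n_states - 1)), int(rng.integers(n_actions))   # exploring start
    trajectory = []
    for _ in range(50):                                  # cap episode length
        s2, r, done = step(s, a)
        trajectory.append((s, a, r))
        if done:
            break
        s, a = s2, policy[s2]
    G, G_t = 0.0, []                                     # backward pass for the return at every step
    for _, _, r in reversed(trajectory):
        G = gamma * G + r
        G_t.append(G)
    G_t.reverse()
    seen = set()
    for (s, a, _), G in zip(trajectory, G_t):
        if (s, a) in seen:
            continue                                     # first-visit MC: only the first occurrence counts
        seen.add((s, a))
        returns[(s, a)].append(G)
        Q[(s, a)] = float(np.mean(returns[(s, a)]))
        policy[s] = int(np.argmax([Q[(s, b)] for b in range(n_actions)]))  # greedy improvement

print("learned policy (1 = move right):", policy)
```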
Ex5 --- Q-Learning, SARSA, Expected SARSA, and Bias/Variance in Temporal-Difference Learning vs. Monte Carlo
Implementing Q-learning, SARSA, and Expected SARSA in a windy grid world environment, and exploring the bias-variance trade-off between temporal-difference (TD) and Monte Carlo methods.
- Code: ex5_SARSA+ExpSARSA+Qlearning/ex5_SARSA+ExpSARSA+Qlearning.ipynb
- Report: ex5_SARSA+ExpSARSA+Qlearning/ex5_report.pdf
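A minimal sketch of how the three TD targets differ, assuming a tabular Q and a fabricated transition (the windy grid world itself is omitted; not the assignment code):

```python
import numpy as np

rng = np.random.default_rng(3)
n_states, n_actions, alpha, gamma, eps = 10, 4, 0.5, 0.99, 0.1
Q = np.zeros((n_states, n_actions))

def eps_greedy_probs(s):
    """Action probabilities of an epsilon-greedy policy with respect to Q."""
    probs = np.full(n_actions, eps / n_actions)
    probs[np.argmax(Q[s])] += 1 - eps
    return probs

def td_update(s, a, r, s2, a2, method):
    if method == "q_learning":           # off-policy: bootstrap on the greedy next action
        target = r + gamma * Q[s2].max()
    elif method == "sarsa":              # on-policy: bootstrap on the action actually taken next
        target = r + gamma * Q[s2, a2]
    elif method == "expected_sarsa":     # expectation over the policy's next-action distribution
        target = r + gamma * eps_greedy_probs(s2) @ Q[s2]
    Q[s, a] += alpha * (target - Q[s, a])

# toy usage on a fabricated transition (s, a, r, s')
s, a, r, s2 = 0, 1, -1.0, 5
a2 = int(rng.choice(n_actions, p=eps_greedy_probs(s2)))
td_update(s, a, r, s2, a2, method="expected_sarsa")
print(Q[s])
```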
Ex6 --- Dyna-Q and Dyna-Q+
Implementing the Dyna-Q and Dyna-Q+ algorithms in an adaptive blocking maze environment.
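A minimal sketch of the Dyna-Q loop (direct RL update, model learning, and n planning updates per real step), using a hypothetical deterministic model and a fabricated transition rather than the blocking-maze environment:

```python
import random
import numpy as np

n_states, n_actions, alpha, gamma, n_planning = 20, 4, 0.1, 0.95, 10
Q = np.zeros((n_states, n_actions))
model = {}                                   # (s, a) -> (r, s'), a deterministic learned model

def dyna_q_step(s, a, r, s2):
    # (1) direct RL: one-step Q-learning update from the real transition
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    # (2) model learning: remember the observed outcome
    model[(s, a)] = (r, s2)
    # (3) planning: n simulated updates using previously seen (s, a) pairs
    for _ in range(n_planning):
        (ps, pa), (pr, ps2) = random.choice(list(model.items()))
        # Dyna-Q+ would add an exploration bonus kappa * sqrt(tau) to pr,
        # where tau is the time since (ps, pa) was last tried in the real environment
        Q[ps, pa] += alpha * (pr + gamma * Q[ps2].max() - Q[ps, pa])

# toy usage on a fabricated real transition
dyna_q_step(s=0, a=2, r=-1.0, s2=1)
print(Q[0])
```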
Ex7 --- Semi-Gradient SARSA with State Aggregation
Implementing semi-gradient SARSA with state aggregation and linear function approximation.
- Code: ex7_Gradient_StateAggregation_SARSA/ex7_Gradient_StateAggregation_SARSA.ipynb
- Report: ex7_Gradient_StateAggregation_SARSA/ex7_report.pdf
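A minimal sketch of semi-gradient SARSA with state aggregation as the linear feature map, using hypothetical sizes and a fabricated transition (not the assignment code):

```python
import numpy as np

n_states, n_groups, n_actions = 1000, 10, 2
alpha, gamma = 0.1, 1.0
w = np.zeros((n_actions, n_groups))          # one linear weight vector per action

def features(s):
    """State aggregation: one-hot indicator of the group containing state s."""
    x = np.zeros(n_groups)
    x[s * n_groups // n_states] = 1.0
    return x

def q(s, a):
    return w[a] @ features(s)                # linear function approximation

def semi_gradient_sarsa_update(s, a, r, s2, a2, done):
    target = r if done else r + gamma * q(s2, a2)
    # semi-gradient: differentiate only through q(s, a), treating the target as a constant
    w[a] += alpha * (target - q(s, a)) * features(s)

# toy usage on a fabricated transition
semi_gradient_sarsa_update(s=123, a=0, r=-1.0, s2=124, a2=1, done=False)
print(q(123, 0))
```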
Ex8 --- Deep Q-Networks (DQN)
Implementing DQNs using PyTorch for non-linear function approximation --- epsilon schedules, replay buffers, and optimization.
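A minimal PyTorch sketch of the core DQN pieces (a linear epsilon schedule, a replay buffer, and a bootstrapped target from a periodically synced target network), with hypothetical dimensions and fabricated transitions rather than the assignment environment:

```python
import random
from collections import deque

import torch
import torch.nn as nn

obs_dim, n_actions, gamma, batch_size = 4, 2, 0.99, 32

q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
buffer = deque(maxlen=10_000)                              # replay buffer of transitions

def epsilon(step, start=1.0, end=0.05, decay_steps=5_000):
    """Linear epsilon schedule."""
    return max(end, start - (start - end) * step / decay_steps)

def act(obs, step):
    if random.random() < epsilon(step):
        return random.randrange(n_actions)                 # explore
    with torch.no_grad():
        return int(q_net(torch.tensor(obs, dtype=torch.float32)).argmax())

def train_step():
    if len(buffer) < batch_size:
        return
    obs, acts, rews, next_obs, dones = zip(*random.sample(buffer, batch_size))
    obs = torch.tensor(obs, dtype=torch.float32)
    acts = torch.tensor(acts, dtype=torch.int64).unsqueeze(1)
    rews = torch.tensor(rews, dtype=torch.float32)
    next_obs = torch.tensor(next_obs, dtype=torch.float32)
    dones = torch.tensor(dones, dtype=torch.float32)
    q_sa = q_net(obs).gather(1, acts).squeeze(1)           # Q(s, a) for the actions taken
    with torch.no_grad():                                  # bootstrapped target from the frozen target net
        target = rews + gamma * (1 - dones) * target_net(next_obs).max(dim=1).values
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# toy usage with fabricated transitions
for step in range(200):
    s = [random.random() for _ in range(obs_dim)]
    a = act(s, step)
    s2 = [random.random() for _ in range(obs_dim)]
    buffer.append((s, a, -1.0, s2, False))
    train_step()
    if step % 50 == 0:
        target_net.load_state_dict(q_net.state_dict())     # periodic target-network sync
```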