This repo holds all programming assignments completed for my Reinforcement Learning course (Fall 2022).
Note: Scaffolding code was provided for some of these assignments. All of my own work is located inside block comments labeled ##### MY WORK START ##### and ##### MY WORK END #####.
Ex0 --- Exploration Policies
Introducing reinforcement learning and policies --- the rewards and effects of random, expected-better, and expected-worse policies.
- Code: ex0_Exploration_Policies/ex0_Exploration_Policies.ipynb
- Report: ex0_Exploration_Policies/ex0_report.pdf
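As a rough illustration of the idea (not the assignment code), here is a minimal sketch that estimates the average reward collected by a uniform-random policy versus fixed better and worse policies in a hypothetical 3-armed bandit with Gaussian rewards:

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 1.0])       # hypothetical arm means

def average_reward(choose_action, steps=10_000):
    """Average reward collected by a policy over `steps` pulls."""
    total = 0.0
    for _ in range(steps):
        a = choose_action()
        total += rng.normal(true_means[a], 1.0)   # noisy reward from the chosen arm
    return total / steps

random_policy = lambda: rng.integers(3)      # uniform over arms
better_policy = lambda: 2                    # always pulls the best arm
worse_policy = lambda: 0                     # always pulls the worst arm

for name, policy in [("random", random_policy), ("better", better_policy), ("worse", worse_policy)]:
    print(f"{name:>6} policy: average reward {average_reward(policy):.3f}")
```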
Ex1 --- Bandits, Exploration/Exploitation, and UCB
Exploring the effects of exploration, exploitation, and action selection in the k-armed bandit environment --- epsilon-greedy policies, Q-value initialization, and UCB action selection.
- Code: ex1_Bandits_Explore_Exploit_UCB/ex1_Bandits_Explore_Exploit_UCB.ipynb
- Report: ex1_Bandits_Explore_Exploit_UCB/ex1_report.pdf
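A minimal sketch (not the assignment code) of the two action-selection schemes covered here, assuming a hypothetical k-armed Gaussian bandit and sample-average Q updates:

```python
import numpy as np

rng = np.random.default_rng(1)
k, steps, eps, c = 10, 2000, 0.1, 2.0
true_means = rng.normal(0.0, 1.0, k)         # hypothetical k-armed Gaussian bandit

Q = np.zeros(k)                              # value estimates (optimistic init would use np.full(k, 5.0))
N = np.zeros(k)                              # pull counts

def epsilon_greedy(t):
    if rng.random() < eps:
        return int(rng.integers(k))          # explore
    return int(np.argmax(Q))                 # exploit

def ucb(t):
    if N.min() == 0:                         # pull every arm at least once
        return int(np.argmin(N))
    return int(np.argmax(Q + c * np.sqrt(np.log(t + 1) / N)))

select = ucb                                 # or epsilon_greedy
for t in range(steps):
    a = select(t)
    r = rng.normal(true_means[a], 1.0)
    N[a] += 1
    Q[a] += (r - Q[a]) / N[a]                # incremental sample-average update

print("best arm:", int(np.argmax(true_means)), "| most-pulled arm:", int(np.argmax(N)))
```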
Ex3 --- Dynamic Programming Policy Iteration
Implementing dynamic programming policy iteration in a grid world environment --- value iteration, transition probabilities, and policy evaluation and improvement.
- Code: ex3_DP_Policy_Iteration/ex3_DP_Policy_Iteration.ipynb
- Report: ex3_DP_Policy_Iteration/ex3_report.pdf
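A minimal sketch of policy iteration (alternating policy evaluation and greedy improvement), assuming a hypothetical tabular model P[s][a] = [(prob, next_state, reward), ...] rather than the grid world used in the assignment:

```python
import numpy as np

# Hypothetical tabular model: P[s][a] = [(prob, next_state, reward), ...]
n_states, n_actions, gamma, theta = 4, 2, 0.9, 1e-8
P = {s: {a: [(1.0, min(s + a, n_states - 1), 1.0 if s + a >= n_states - 1 else 0.0)]
         for a in range(n_actions)}
     for s in range(n_states)}               # toy chain: action 1 moves right, the far end is rewarded

def policy_evaluation(policy, V):
    """Iterative policy evaluation: sweep until the value function stops changing."""
    while True:
        delta = 0.0
        for s in range(n_states):
            v = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][policy[s]])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            return V

def policy_improvement(V):
    """Greedy policy with respect to a one-step lookahead on V."""
    q = np.array([[sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                   for a in range(n_actions)] for s in range(n_states)])
    return q.argmax(axis=1)

policy, V = np.zeros(n_states, dtype=int), np.zeros(n_states)
while True:
    V = policy_evaluation(policy, V)
    new_policy = policy_improvement(V)
    if np.array_equal(new_policy, policy):   # stable policy => optimal
        break
    policy = new_policy
print("optimal policy:", policy, "values:", np.round(V, 3))
```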
Ex4 --- Monte Carlo Control
Implementing Monte Carlo policy iteration in Blackjack, four-rooms, and racetrack environments --- first-visit MC, exploring starts, and MC control.
- Code: ex4_Monte_Carlo_Control/ex4_Monte_Carlo_Control.ipynb
- Report: ex4_Monte_Carlo_Control/ex4_report.pdf
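A minimal sketch of first-visit Monte Carlo control with exploring starts, run on a hypothetical corridor environment rather than the Blackjack, four-rooms, or racetrack environments used in the assignment:

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(2)
n_states, n_actions, gamma = 5, 2, 1.0       # toy corridor: action 1 moves right, 0 moves left

def step(s, a):
    s2 = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == n_states - 1 else -0.1), s2 == n_states - 1

Q = defaultdict(float)
returns = defaultdict(list)
policy = {s: 0 for s in range(n_states - 1)}             # deterministic greedy policy

for _ in range(5000):
    s, a = int(rng.integers(n_states - 1)), int(rng.integers(n_actions))   # exploring start
    trajectory = []
    for _ in range(50):                                  # cap episode length
        s2, r, done = step(s, a)
        trajectory.append((s, a, r))
        if done:
            break
        s, a = s2, policy[s2]
    G, G_t = 0.0, []                                     # backward pass for the return at every step
    for _, _, r in reversed(trajectory):
        G = gamma * G + r
        G_t.append(G)
    G_t.reverse()
    seen = set()
    for (s, a, _), G in zip(trajectory, G_t):
        if (s, a) in seen:
            continue                                     # first-visit MC: only the first occurrence counts
        seen.add((s, a))
        returns[(s, a)].append(G)
        Q[(s, a)] = float(np.mean(returns[(s, a)]))
        policy[s] = int(np.argmax([Q[(s, b)] for b in range(n_actions)]))  # greedy improvement

print("learned policy (1 = move right):", policy)
```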
Ex5 --- Q-Learning, SARSA, Expected SARSA, and Bias/Variance in Temporal-Difference Learning vs. Monte Carlo
Implementing Q-learning, SARSA, and Expected SARSA in a windy grid world environment, and exploring the bias-variance trade-off between temporal-difference (TD) and Monte Carlo methods.
- Code: ex5_SARSA+ExpSARSA+Qlearning/ex5_SARSA+ExpSARSA+Qlearning.ipynb
- Report: ex5_SARSA+ExpSARSA+Qlearning/ex5_report.pdf
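A minimal sketch of how the three TD targets differ, assuming a tabular Q and a fabricated transition (the windy grid world itself is omitted; not the assignment code):

```python
import numpy as np

rng = np.random.default_rng(3)
n_states, n_actions, alpha, gamma, eps = 10, 4, 0.5, 0.99, 0.1
Q = np.zeros((n_states, n_actions))

def eps_greedy_probs(s):
    """Action probabilities of an epsilon-greedy policy with respect to Q."""
    probs = np.full(n_actions, eps / n_actions)
    probs[np.argmax(Q[s])] += 1 - eps
    return probs

def td_update(s, a, r, s2, a2, method):
    if method == "q_learning":           # off-policy: bootstrap on the greedy next action
        target = r + gamma * Q[s2].max()
    elif method == "sarsa":              # on-policy: bootstrap on the action actually taken next
        target = r + gamma * Q[s2, a2]
    elif method == "expected_sarsa":     # expectation over the policy's next-action distribution
        target = r + gamma * eps_greedy_probs(s2) @ Q[s2]
    Q[s, a] += alpha * (target - Q[s, a])

# toy usage on a fabricated transition (s, a, r, s')
s, a, r, s2 = 0, 1, -1.0, 5
a2 = int(rng.choice(n_actions, p=eps_greedy_probs(s2)))
td_update(s, a, r, s2, a2, method="expected_sarsa")
print(Q[s])
```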
Ex6 --- Dyna-Q and Dyna-Q+
Implementing the Dyna-Q and Dyna-Q+ algorithms in an adaptive blocking maze environment.
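A minimal sketch of the Dyna-Q loop (direct RL update, model learning, and n planning updates per real step), using a hypothetical deterministic model and a fabricated transition rather than the blocking-maze environment:

```python
import random
import numpy as np

n_states, n_actions, alpha, gamma, n_planning = 20, 4, 0.1, 0.95, 10
Q = np.zeros((n_states, n_actions))
model = {}                                   # (s, a) -> (r, s'), a deterministic learned model

def dyna_q_step(s, a, r, s2):
    # (1) direct RL: one-step Q-learning update from the real transition
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    # (2) model learning: remember the observed outcome
    model[(s, a)] = (r, s2)
    # (3) planning: n simulated updates using previously seen (s, a) pairs
    for _ in range(n_planning):
        (ps, pa), (pr, ps2) = random.choice(list(model.items()))
        # Dyna-Q+ would add an exploration bonus kappa * sqrt(tau) to pr,
        # where tau is the time since (ps, pa) was last tried in the real environment
        Q[ps, pa] += alpha * (pr + gamma * Q[ps2].max() - Q[ps, pa])

# toy usage on a fabricated real transition
dyna_q_step(s=0, a=2, r=-1.0, s2=1)
print(Q[0])
```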
Ex7 --- Semi-Gradient SARSA with State Aggregation
Implementing semi-gradient SARSA with state aggregation and linear function approximation.
- Code: ex7_Gradient_StateAggregation_SARSA/ex7_Gradient_StateAggregation_SARSA.ipynb
- Report: ex7_Gradient_StateAggregation_SARSA/ex7_report.pdf
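A minimal sketch of semi-gradient SARSA with state aggregation as the linear feature map, using hypothetical sizes and a fabricated transition (not the assignment code):

```python
import numpy as np

n_states, n_groups, n_actions = 1000, 10, 2
alpha, gamma = 0.1, 1.0
w = np.zeros((n_actions, n_groups))          # one linear weight vector per action

def features(s):
    """State aggregation: one-hot indicator of the group containing state s."""
    x = np.zeros(n_groups)
    x[s * n_groups // n_states] = 1.0
    return x

def q(s, a):
    return w[a] @ features(s)                # linear function approximation

def semi_gradient_sarsa_update(s, a, r, s2, a2, done):
    target = r if done else r + gamma * q(s2, a2)
    # semi-gradient: differentiate only through q(s, a), treating the target as a constant
    w[a] += alpha * (target - q(s, a)) * features(s)

# toy usage on a fabricated transition
semi_gradient_sarsa_update(s=123, a=0, r=-1.0, s2=124, a2=1, done=False)
print(q(123, 0))
```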
Ex8 --- Deep Q-Networks (DQN)
Implementing DQNs using PyTorch for non-linear function approximation --- epsilon schedules, replay buffers, and optimization.
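A minimal PyTorch sketch of the core DQN pieces (a linear epsilon schedule, a replay buffer, and a bootstrapped target from a periodically synced target network), with hypothetical dimensions and fabricated transitions rather than the assignment environment:

```python
import random
from collections import deque

import torch
import torch.nn as nn

obs_dim, n_actions, gamma, batch_size = 4, 2, 0.99, 32

q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
buffer = deque(maxlen=10_000)                              # replay buffer of transitions

def epsilon(step, start=1.0, end=0.05, decay_steps=5_000):
    """Linear epsilon schedule."""
    return max(end, start - (start - end) * step / decay_steps)

def act(obs, step):
    if random.random() < epsilon(step):
        return random.randrange(n_actions)                 # explore
    with torch.no_grad():
        return int(q_net(torch.tensor(obs, dtype=torch.float32)).argmax())

def train_step():
    if len(buffer) < batch_size:
        return
    obs, acts, rews, next_obs, dones = zip(*random.sample(buffer, batch_size))
    obs = torch.tensor(obs, dtype=torch.float32)
    acts = torch.tensor(acts, dtype=torch.int64).unsqueeze(1)
    rews = torch.tensor(rews, dtype=torch.float32)
    next_obs = torch.tensor(next_obs, dtype=torch.float32)
    dones = torch.tensor(dones, dtype=torch.float32)
    q_sa = q_net(obs).gather(1, acts).squeeze(1)           # Q(s, a) for the actions taken
    with torch.no_grad():                                  # bootstrapped target from the frozen target net
        target = rews + gamma * (1 - dones) * target_net(next_obs).max(dim=1).values
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# toy usage with fabricated transitions
for step in range(200):
    s = [random.random() for _ in range(obs_dim)]
    a = act(s, step)
    s2 = [random.random() for _ in range(obs_dim)]
    buffer.append((s, a, -1.0, s2, False))
    train_step()
    if step % 50 == 0:
        target_net.load_state_dict(q_net.state_dict())     # periodic target-network sync
```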