  • Dynamic Programming: Policy iteration and value iteration algorithms, implemented and tested on two Gym environments.
  • Monte Carlo: Monte Carlo prediction and control for Blackjack.
  • 10 Armed Bandits: The 10-armed bandit problem, comparing different exploration approaches.
  • MDP, Bellman equations and DP: Chapters 3 & 4 of RLBook2018; MDPs and Bellman equations, and Dynamic Programming on a custom gridworld environment.
  • MCTS, FA and Policy Gradients: Chapters 8, 9 & 13 of RLBook2018; MCTS, Function Approximation and Policy Gradients. Homework for rl-course-spring2023 @ Ferdowsi University of Mashhad.
  • DP on Frozenlake: Chapter 4 of RLBook2018; Dynamic Programming (policy iteration and value iteration) on the FrozenLake environment. Mini-project for rl-course-2023 @ Ferdowsi University of Mashhad.
  • Sample based methods: Chapters 5 & 6 of RLBook2018; a comparison of Monte Carlo and Temporal Difference control methods (SARSA & Q-Learning).
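
For readers unfamiliar with the two Temporal Difference control methods named in the last item, the sketch below shows how their update targets differ. It is a minimal illustration, not code from the projects listed here: the function names, the nested-list Q-table, and the default values of `gamma` and `alpha` are all assumptions chosen for the example.

```python
def sarsa_target(Q, reward, next_state, next_action, gamma=0.99):
    """On-policy TD target: bootstrap from the action actually taken next."""
    return reward + gamma * Q[next_state][next_action]


def q_learning_target(Q, reward, next_state, gamma=0.99):
    """Off-policy TD target: bootstrap from the greedy action in the next state."""
    return reward + gamma * max(Q[next_state])


def td_update(Q, state, action, target, alpha=0.1):
    """Move Q[state][action] a step of size alpha toward the target (in place)."""
    Q[state][action] += alpha * (target - Q[state][action])
    return Q


# Hypothetical 2-state, 2-action Q-table, used only for illustration.
Q = [[0.0, 0.0],
     [1.0, 3.0]]

# Same transition, different targets: SARSA uses the sampled next action (0),
# Q-Learning uses the best next action (1).
sarsa = sarsa_target(Q, reward=0.0, next_state=1, next_action=0, gamma=0.5)
qlearn = q_learning_target(Q, reward=0.0, next_state=1, gamma=0.5)
```

The only difference is the bootstrap term: SARSA evaluates the policy it is following (including its exploratory actions), while Q-Learning evaluates the greedy policy regardless of which action is taken next.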