Solutions to OpenAI-Gym environments using various machine learning algorithms.
Evolutionary Learning Strategy: Start with some initial weights and generate weights for each member in the population (by adding noise to the current weight). Track performance of each member's performance and update current weight (by a weighted sum). Learn more about Evolutionary Learning Strategies here.
REINFORCE: Monte Carlo Policy Gradient Perform gradient ascent after every episode on the weighted sum at time t, the probability of taking the particular action multiplied by the expected total discounted reward following the current policy. A discounted reward is used to enforce finishing early. Learn more about REINFORCE: Monte Carlo Policy Gradient by reading Reinforcement Learning: An Introduction (chapter 13.3) by Sutton & Barto here. (Book is unfinished and chapters order may change)
LunarLander-v2
CartPole-v0
ELS 1
CartPole-v1
ELS 1