This is a trained model of a Reinforce agent playing CartPole-v1

rishisim/Reinforce-CartPole-v1

Overview: Policy-Based Methods

In policy-based methods, we learn to approximate the optimal policy $\pi^*$ directly, without first learning a value function. The policy is parameterized by a neural network with parameters $\theta$ that outputs a probability distribution over actions (a stochastic policy). Policy-gradient methods are a subclass of policy-based methods that optimize $\theta$ directly by gradient ascent on an objective function measuring the policy's performance, typically the expected return.
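To make the idea of a stochastic policy concrete, here is a minimal sketch in plain Python. It is not the code from this repository: it swaps the neural network for a linear softmax policy (one hypothetical weight row per action) so that it runs without dependencies. The names `softmax`, `action_probabilities`, and `sample_action` are illustrative, and the 4-dimensional state with 2 actions matches CartPole-v1's observation and action spaces.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def action_probabilities(theta, state):
    """pi_theta(. | state) for a linear softmax policy.

    theta is a list with one weight row per action (an assumption made
    for this sketch; a neural network would replace the dot products).
    """
    logits = [sum(w * s for w, s in zip(row, state)) for row in theta]
    return softmax(logits)


def sample_action(theta, state, rng=random):
    """Sample an action index according to the policy's probabilities."""
    probs = action_probabilities(theta, state)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# CartPole-v1: 4 observation values, 2 discrete actions (push left / push right).
theta = [[0.0] * 4, [0.0] * 4]
state = [0.01, -0.02, 0.03, 0.04]  # made-up observation for illustration
probs = action_probabilities(theta, state)
action = sample_action(theta, state)
```

With all-zero weights the two actions are equally likely; learning consists of adjusting `theta` so that actions leading to higher return become more probable.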

Reinforce Algorithm

The Reinforce algorithm, also called Monte-Carlo policy-gradient, is a policy-gradient algorithm that uses the return estimated from an entire episode to update the policy parameters $\theta$. This repository contains an implementation of the Reinforce algorithm for the CartPole-v1 environment.
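The update described above can be sketched as follows. After a full episode, each step's return $G_t$ is computed, and $\theta$ is moved along $\sum_t G_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)$. This is an illustrative sketch under stated assumptions, not the repository's implementation: it uses a linear softmax policy instead of a neural network, and the discount factor `gamma`, learning rate `lr`, and function names are hypothetical choices made for the example.

```python
import math


def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step t."""
    returns = []
    g = 0.0
    for r in reversed(rewards):  # accumulate from the end of the episode
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_update(theta, episode, gamma=0.99, lr=0.01):
    """One gradient-ascent step from a single complete episode.

    episode is a list of (state, action, reward) tuples; theta has one
    weight row per action (linear softmax policy, assumed for this sketch).
    """
    returns = discounted_returns([r for _, _, r in episode], gamma)
    grad = [[0.0] * len(row) for row in theta]
    for (state, action, _), g in zip(episode, returns):
        logits = [sum(w * s for w, s in zip(row, state)) for row in theta]
        probs = softmax(logits)
        for k, grad_row in enumerate(grad):
            # d/d theta_k of log pi(action | state) = (1[k == action] - pi_k) * state
            coeff = ((1.0 if k == action else 0.0) - probs[k]) * g
            for i, s in enumerate(state):
                grad_row[i] += coeff * s
    # Gradient ascent: move theta in the direction that raises expected return.
    return [[w + lr * dw for w, dw in zip(row, grad_row)]
            for row, grad_row in zip(theta, grad)]


returns = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
new_theta = reinforce_update(
    [[0.0] * 4, [0.0] * 4],
    [([1.0, 0.0, 0.0, 0.0], 1, 1.0)],
    gamma=1.0,
    lr=0.1,
)
```

Because the return is only known once the episode ends, the update happens per episode rather than per step, which is what makes the method Monte-Carlo. Implementations often also normalize the returns to reduce variance; whether this repository does so is not stated here.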

Results

Mean_Reward: 500 +/- 0.00

replay.mp4
