PPO-PyTorch

Minimal PyTorch implementation of Proximal Policy Optimization with clipped objective for OpenAI gym environments.

Usage

  • To test a pretrained network, run test.py or test_continuous.py.
  • To train a new network, run PPO.py or PPO_continuous.py.
  • All hyperparameters are set in PPO.py or PPO_continuous.py.
  • If you are training on an environment where the action dimension = 1, check the tensor dimensions in the update function of the PPO class, since torch.squeeze() is used there quite a few times. torch.squeeze() removes every dimension of length 1 from a tensor, so it can also collapse dimensions you meant to keep; see the sketch after this list.
  • The number of actors collecting experience is 1. This can be changed by creating multiple instances of the ActorCritic network in the PPO class and using them to collect experience in parallel (as in A3C and the standard PPO setup).
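As a quick illustration of the squeeze pitfall mentioned above, here is a minimal, hypothetical sketch (not taken from this repository) showing how torch.squeeze() can collapse the action dimension when it equals 1:

```python
# Minimal sketch: why torch.squeeze() needs care when action dimension = 1.
import torch

batch_of_actions = torch.zeros(5, 1)          # 5 timesteps, action dimension 1
print(batch_of_actions.shape)                 # torch.Size([5, 1])
print(torch.squeeze(batch_of_actions).shape)  # torch.Size([5]) -- the action dim is gone

single_action = torch.zeros(1, 1)             # 1 timestep, action dimension 1
print(torch.squeeze(single_action).shape)     # torch.Size([]) -- a 0-d scalar, which can
                                              # break broadcasting in the PPO update

# A safer alternative is to squeeze only the specific dimension you mean to drop:
print(single_action.squeeze(-1).shape)        # torch.Size([1])
```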

Dependencies

Trained and tested on:

  • Python 3.6
  • PyTorch 1.0
  • NumPy 1.15.3
  • gym 0.10.8
  • Pillow 5.3.0
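To confirm the versions installed locally, a small check such as the following can be used (assuming the packages above are importable):

```python
# Print the installed versions to compare against the list above.
import sys
import torch
import numpy
import gym
import PIL

print("Python :", sys.version.split()[0])
print("PyTorch:", torch.__version__)
print("NumPy  :", numpy.__version__)
print("gym    :", gym.__version__)
print("Pillow :", PIL.__version__)
```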

Results

[Result GIF] PPO Discrete on LunarLander-v2 (1200 episodes)
[Result GIF] PPO Continuous on BipedalWalker-v2 (4000 episodes)
