
RL A3C Pytorch

Demo GIFs: A3C LSTM playing Breakout-v0, SpaceInvadersDeterministic-v3, MsPacman-v0, BeamRider-v0 and Seaquest-v0

This repository contains my PyTorch implementation of Asynchronous Advantage Actor-Critic (A3C), the reinforcement learning algorithm from Google DeepMind's paper "Asynchronous Methods for Deep Reinforcement Learning."

A3C LSTM

I implemented an A3C LSTM model and trained it on the Atari 2600 environments provided in the OpenAI Gym. So far this model has shown the best performance I have seen for Atari game environments. Included in the repo are trained models for SpaceInvaders-v0, MsPacman-v0, Breakout-v0, BeamRider-v0, Pong-v0, Seaquest-v0 and Asteroids-v0, all of which performed very well and currently hold the best scores on the OpenAI Gym leaderboard for each of those games (no plans to train models for any more Atari games right now). Saved models are in the trained_models folder. The trained models may not run properly if you have recently updated gym and atari-py to the versions that introduced the v4 Atari environments, as that update also changed the behavior of all the other Atari versions. To make sure the models run properly, keep gym <= 0.8.2 and atari-py <= 0.0.21.
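For reference, here is a minimal sketch of what an A3C LSTM model for these environments can look like in PyTorch: a small convolutional feature extractor feeding an LSTMCell, with separate actor (policy) and critic (value) heads. The layer sizes and the assumed 80x80 single-channel input are illustrative, not necessarily the exact architecture used in this repo.

import torch.nn as nn
import torch.nn.functional as F

class A3C_LSTM(nn.Module):
    # Sketch only: assumes an 80x80 single-channel preprocessed frame as input.
    def __init__(self, num_inputs, num_actions):
        super(A3C_LSTM, self).__init__()
        self.conv1 = nn.Conv2d(num_inputs, 32, 5, stride=1, padding=2)
        self.conv2 = nn.Conv2d(32, 32, 5, stride=1, padding=1)
        self.conv3 = nn.Conv2d(32, 64, 4, stride=1, padding=1)
        self.conv4 = nn.Conv2d(64, 64, 3, stride=1, padding=1)
        self.lstm = nn.LSTMCell(1024, 512)               # 64 * 4 * 4 = 1024 features
        self.critic_linear = nn.Linear(512, 1)           # state value V(s)
        self.actor_linear = nn.Linear(512, num_actions)  # policy logits

    def forward(self, inputs):
        x, (hx, cx) = inputs                             # frame plus LSTM hidden state
        x = F.relu(F.max_pool2d(self.conv1(x), 2, 2))
        x = F.relu(F.max_pool2d(self.conv2(x), 2, 2))
        x = F.relu(F.max_pool2d(self.conv3(x), 2, 2))
        x = F.relu(F.max_pool2d(self.conv4(x), 2, 2))
        x = x.view(x.size(0), -1)
        hx, cx = self.lstm(x, (hx, cx))
        return self.critic_linear(hx), self.actor_linear(hx), (hx, cx)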

Optimizers with shared statistics are available for both RMSProp and Adam, along with the option to use a non-shared optimizer.
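The sharing works by allocating the optimizer's statistics up front and moving them into shared memory, so every worker process updates one common set of moments. Below is a minimal sketch of the idea for Adam (the repo's actual shared RMSProp/Adam implementations may differ in details):

import math
import torch

class SharedAdam(torch.optim.Adam):
    # Adam whose step counter and moment estimates live in shared memory.
    def __init__(self, params, lr=1e-4, betas=(0.9, 0.999), eps=1e-8):
        super(SharedAdam, self).__init__(params, lr=lr, betas=betas, eps=eps)
        for group in self.param_groups:
            for p in group['params']:
                state = self.state[p]
                state['step'] = torch.zeros(1)
                state['exp_avg'] = torch.zeros_like(p.data)
                state['exp_avg_sq'] = torch.zeros_like(p.data)

    def share_memory(self):
        # Call once in the parent process before spawning workers.
        for group in self.param_groups:
            for p in group['params']:
                state = self.state[p]
                state['step'].share_memory_()
                state['exp_avg'].share_memory_()
                state['exp_avg_sq'].share_memory_()

    def step(self, closure=None):
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                grad = p.grad.data
                state = self.state[p]
                exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq']
                beta1, beta2 = group['betas']

                state['step'] += 1
                t = state['step'].item()

                # Update biased first and second moment estimates (shared tensors)
                exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
                exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

                denom = exp_avg_sq.sqrt().add_(group['eps'])
                step_size = group['lr'] * math.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
                p.data.addcdiv_(exp_avg, denom, value=-step_size)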

The Gym Atari settings are more difficult to train on than the traditional ALE settings, as Gym uses stochastic frame skipping and a larger number of discrete actions. For example, Breakout-v0 has 6 discrete actions in Gym, while ALE is set to only 4. Gym Atari environments also randomly repeat the previous action with probability 0.25, and there is a time/step limit that caps performance.
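A quick way to confirm the Gym action set for a given game (a usage example, assuming the gym/atari-py versions noted above):

import gym

env = gym.make('Breakout-v0')
print(env.action_space)                     # Discrete(6) for Gym's Breakout-v0
print(env.unwrapped.get_action_meanings())  # the underlying ALE action names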

Gym environment evaluation results are shown in the table below.

Environment                     Best 100-Episode Avg    Best Score
SpaceInvaders-v0                5808.45 ± 337.28        13380.0
SpaceInvaders-v3                6944.85 ± 409.60        20440.0
SpaceInvadersDeterministic-v3   79060.10 ± 5826.59      167330.0
Breakout-v0                     739.30 ± 18.43          864.0
Breakout-v3                     859.57 ± 1.97           864.0
Pong-v0                         20.96 ± 0.02            21.0
PongDeterministic-v3            21.00 ± 0.00            21.0
BeamRider-v0                    8441.22 ± 221.24        13130.0
MsPacman-v0                     6323.01 ± 116.91        10181.0
Seaquest-v0                     54203.50 ± 1509.85      88840.0

The 167,330 Space Invaders score is a world-record Space Invaders score, and that game ended only because of the Gym timestep limit, not from loss of life.

Requirements

  • Python 2.7+
  • OpenAI Gym and Universe
  • PyTorch

Training

When training the model, it is important to limit the number of workers to the number of CPU cores available, as too many workers (e.g. more than one worker per CPU core) will actually be detrimental to training speed and effectiveness.
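For reference, here is a rough sketch of the launch pattern such a training script could use with torch.multiprocessing: the shared model and shared optimizer are placed in shared memory, and one worker process is started per CPU core. A3C_LSTM and SharedAdam refer to the illustrative sketches above, the worker body (rollouts and gradient updates) is omitted, and main.py's actual structure may differ.

import torch.multiprocessing as mp

def worker(rank, shared_model, optimizer):
    # Each worker builds its own environment and a local copy of the model,
    # syncs with shared_model before every rollout, and pushes its gradients
    # through the shared optimizer. The rollout/backprop loop is omitted here.
    pass

if __name__ == '__main__':
    num_workers = min(32, mp.cpu_count())  # at most one worker per CPU core
    shared_model = A3C_LSTM(1, 6)          # 1 input channel, 6 actions (illustrative)
    shared_model.share_memory()            # put the weights in shared memory
    optimizer = SharedAdam(shared_model.parameters(), lr=1e-4)
    optimizer.share_memory()

    processes = []
    for rank in range(num_workers):
        p = mp.Process(target=worker, args=(rank, shared_model, optimizer))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()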

To train an agent in the Pong-v0 environment with 32 different workers:

python main.py --env Pong-v0 --workers 32

Hit Ctrl+C to end the training session properly.

Demo GIF: A3C LSTM playing Pong-v0

Evaluation

To run a 100-episode Gym evaluation with a trained model:

python gym_eval.py --env Pong-v0 --num-episodes 100

Project Reference
