- Simple version implemented; try it on the CartPole game
- Further improvements to try:
  - Dueling network structure for the function approximator, based on this paper
  - Double Q-learning to avoid overestimation, based on this paper (see the sketch below)
  - Prioritized experience replay
- Implemented for a discrete action space
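A minimal sketch of how the double Q-learning target can be formed, assuming hypothetical `q_online` and `q_target` callables that return per-action Q-values for a batch of states (names and signatures are illustrative, not taken from this repo):

```python
import numpy as np

def double_q_targets(rewards, next_states, dones, q_online, q_target, gamma=0.99):
    """TD targets with double Q-learning: the online network picks the greedy
    next action, the frozen target network evaluates it, which reduces the
    overestimation bias of plain Q-learning."""
    q_next_online = q_online(next_states)   # (batch, n_actions), online network
    q_next_target = q_target(next_states)   # (batch, n_actions), target network
    greedy_actions = np.argmax(q_next_online, axis=1)
    evaluated = q_next_target[np.arange(len(greedy_actions)), greedy_actions]
    # `dones` is assumed to be a 0/1 array marking terminal transitions.
    return rewards + gamma * (1.0 - dones) * evaluated
```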
- Basic policy gradient with Monte Carlo return
- To try the two forms of the policy gradient theorem, modify `Actor._discount_and_norm_rewards()` in `policy_gradient.py` (see the sketch after this list):
  - Current version: rewards after the action (an equivalent form with lower variance)
  - Uncomment the alternative to use the total reward of the episode (the basic form of the formula)
- Uses a moving average of episodes' returns as the baseline
- Run `run_CartPole.py` to play the CartPole balancing game in OpenAI Gym.
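The difference between the two forms can be pictured with a small NumPy sketch (illustrative only; the argument names and the `use_rewards_after_action` flag are assumptions, not the repo's actual code):

```python
import numpy as np

def discount_and_norm_rewards(episode_rewards, gamma=0.99, use_rewards_after_action=True):
    """Per-step credit for the policy gradient, normalized to reduce variance."""
    returns = np.zeros(len(episode_rewards), dtype=np.float64)
    if use_rewards_after_action:
        # Each step is credited only with the discounted rewards that follow
        # its action (the equivalent, lower-variance form).
        running = 0.0
        for t in reversed(range(len(episode_rewards))):
            running = episode_rewards[t] + gamma * running
            returns[t] = running
    else:
        # Every step shares the total discounted return of the whole episode
        # (the basic form of the formula).
        returns[:] = sum(r * gamma ** t for t, r in enumerate(episode_rewards))
    returns -= returns.mean()
    returns /= returns.std() + 1e-8
    return returns
```

The moving-average baseline mentioned above would simply be subtracted from `returns` before the normalization step.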
- Uses the clipped surrogate objective (see the sketch after this list)
- Baseline given by the state value approximated by the critic
- based on this paper
- Use `MAX_STEPS` to control the maximum length of an episode (2000 by default)
  - Note that `MAX_STEPS` places an upper bound on the total reward per episode
- Next things to be done: extend this algorithm to
  - Multi-process/multi-thread training (multiple actors, single critic; less correlation between experiences)
  - A continuous action selection model
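A minimal PyTorch-style sketch of the clipped surrogate loss with the critic's state value as the baseline (function and argument names are illustrative, and the repo's actual framework may differ):

```python
import torch

def ppo_clipped_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Negated clipped surrogate objective, suitable for a minimizer.

    `advantages` would typically be the discounted return minus the state
    value predicted by the critic (the baseline mentioned above)."""
    ratio = torch.exp(new_log_probs - old_log_probs)                    # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.mean(torch.min(unclipped, clipped))
```

Because rollouts are cut off after `MAX_STEPS` steps, the total reward an episode can collect is bounded by that setting, as noted above.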
Tested on OpenAI Gym games:
- Trained an agent on the LunarLander task for 5000 episodes; see the hyperparameters in `experiment_results/RL_set`.
- Trained an agent on the MountainCar game with `MAX_STEPS` set to 5000, so each episode can run for at most 5000 time steps; see the detailed hyperparameter settings in `experiment_results/RL_set`.
- Trained an agent on the CartPole game with `MAX_STEPS` set to 5000, so the maximum obtainable total reward is 5000; see the detailed hyperparameter settings in `experiment_results/RL_set`.