JamesTuna/RL_collects

RL_collects

Experiments with reinforcement learning algorithms

Deep Q-network

  • A simple version is implemented; try it on the CartPole game.
  • Further improvements to try:
    • A dueling structure for the function-approximation network, based on this paper
    • Avoiding overestimation by using double Q-learning, based on this paper
    • Using prioritized experience replay
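The first two planned improvements can be sketched in a few lines of NumPy. This is a minimal illustration, not code from this repository: the function names, array shapes, and the `gamma` value are assumptions chosen for the example.

```python
import numpy as np

def double_dqn_targets(rewards, dones, q_next_online, q_next_target, gamma=0.99):
    """Double Q-learning target (hypothetical helper).

    The online network selects the next action, the target network
    evaluates it, which reduces the overestimation of plain max-Q targets.
    """
    best_actions = np.argmax(q_next_online, axis=1)
    next_values = q_next_target[np.arange(len(rewards)), best_actions]
    # Terminal transitions (dones == 1) do not bootstrap from the next state.
    return rewards + gamma * (1.0 - dones) * next_values

def dueling_q_values(state_value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a of A(s, a)."""
    return state_value + advantages - advantages.mean(axis=1, keepdims=True)

# Toy batch of two transitions with two actions each (made-up numbers).
rewards = np.array([1.0, 1.0])
dones = np.array([0.0, 1.0])
q_next_online = np.array([[0.2, 0.9], [0.5, 0.1]])
q_next_target = np.array([[1.0, 2.0], [3.0, 4.0]])
targets = double_dqn_targets(rewards, dones, q_next_online, q_next_target, gamma=0.9)
print(targets)
```

In a real agent the two Q arrays would come from the online and target networks evaluated on the next states of a replay batch.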

vallinaPG

  • Implemented for discrete action spaces
  • Basic policy gradient with Monte Carlo returns
  • To try the 2 forms of the policy gradient theorem, modify Actor._discount_and_norm_rewards() in policy_gradient.py
    • Current version: rewards after the action (an equivalent form with less variance)
    • Uncomment: total reward of the episode (the basic form of the formula)
  • Uses a moving average of the episodes' returns as the baseline
  • Run run_CartPole.py to play the CartPole balancing game in OpenAI Gym.
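The difference between the two return forms, and the moving-average baseline, can be sketched as follows. This is an illustration under stated assumptions rather than the contents of policy_gradient.py: every name below is hypothetical, and the `gamma` and `momentum` values are arbitrary.

```python
import numpy as np

def rewards_to_go(rewards, gamma=0.99):
    """Weight for step t is the discounted sum of rewards from t onward."""
    out = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out

def total_episode_return(rewards, gamma=0.99):
    """Basic form: every step is weighted by the whole episode's discounted return."""
    discounts = gamma ** np.arange(len(rewards))
    total = float(np.sum(discounts * np.asarray(rewards)))
    return np.full(len(rewards), total)

def normalize(x, eps=1e-8):
    """Zero-mean, unit-variance scaling, a common stabiliser for the gradient."""
    return (x - x.mean()) / (x.std() + eps)

class MovingAverageBaseline:
    """Exponential moving average of episode returns (assumed form of the baseline)."""
    def __init__(self, momentum=0.9):
        self.momentum = momentum
        self.value = None

    def update(self, episode_return):
        if self.value is None:
            self.value = episode_return
        else:
            self.value = self.momentum * self.value + (1 - self.momentum) * episode_return
        return self.value

print(rewards_to_go([1.0, 1.0, 1.0], gamma=0.5))
print(total_episode_return([1.0, 1.0, 1.0], gamma=0.5))
```

With rewards-to-go, later actions are not credited for rewards that were collected before they were taken, which is why the variance is lower while the expected gradient stays the same.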

PPO

  • Uses the clipped surrogate objective
  • Baseline given by the state value approximated by the critic
  • Based on this paper
  • Use MAX_STEPS to control the maximum length of an episode (2000 by default)
  • Note that MAX_STEPS imposes an upper bound on the total reward of an episode
  • Next steps: extend this algorithm to
    • Multi-process/thread training (multiple actors, a single critic; less correlation between experiences)
    • A continuous action selection model
Tested on OpenAI Gym games.

landerdemo
Training Curve2
Trained agent on the LunarLander task after 5000 episodes of training. See the hyperparameters in experiment_results/RL_set.

mtcar
Training Curve2
Trained agent on the mountain car game with MAX_STEPS set to 5000, so each episode can run for at most 5000 time steps. See the detailed hyperparameter settings in experiment_results/RL_set.

cartpole
Training Curve
Trained agent on the CartPole game with MAX_STEPS set to 5000, so the maximum total reward obtainable is 5000. See the detailed hyperparameter settings in experiment_results/RL_set.
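Why MAX_STEPS caps the total reward can be seen in a small sketch. The environment here is a hypothetical stand-in that pays +1 per step and never terminates by itself (mimicking a CartPole agent that never drops the pole); it does not use the Gym API, and `run_episode` is an illustrative name rather than a function from this repository.

```python
MAX_STEPS = 5000

class AlwaysAliveEnv:
    """Hypothetical stand-in environment: +1 reward per step, never done."""
    def reset(self):
        return 0

    def step(self, action):
        return 0, 1.0, False  # observation, reward, done

def run_episode(env, policy, max_steps=MAX_STEPS):
    """Roll out one episode, stopping at termination or after max_steps."""
    obs = env.reset()
    total = 0.0
    for _ in range(max_steps):
        obs, reward, done = env.step(policy(obs))
        total += reward
        if done:
            break
    return total

# Even a perfect policy cannot collect more than MAX_STEPS reward here.
best_case = run_episode(AlwaysAliveEnv(), policy=lambda obs: 0)
print(best_case)
```

A flat training curve at the value of MAX_STEPS therefore indicates that the agent reached the step cap, not that learning stalled.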
