RL_collects

Experiments with reinforcement learning algorithms.

vallinaPG

  • Implemented for discrete action space
  • Basic policy gradient with Monte Carlo returns
  • To try the two forms of the policy gradient theorem, modify Actor._discount_and_norm_rewards() in policy_gradient.py:
    • Current version: rewards after the action (reward-to-go; an equivalent form with lower variance)
    • Uncomment the alternative to use the total reward of the episode (the basic form of the formula)
  • Uses a moving average of episodes' returns as the baseline
  • Run run_CartPole.py to play the CartPole balancing game in OpenAI Gym.
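
The two return forms and the moving-average baseline described above can be sketched in plain Python. This is a minimal illustration, not the code in policy_gradient.py: the names `rewards_to_go`, `total_episode_return`, `normalize`, and `MovingAverageBaseline`, along with the `gamma` and `decay` defaults, are assumptions made for this example. The normalization step is included only because the method name `_discount_and_norm_rewards` suggests one.

```python
import math


def rewards_to_go(rewards, gamma=0.99):
    """Discounted return from each step onward: G_t = r_t + gamma * G_{t+1}.

    Hypothetical helper illustrating the "rewards after action" form.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def total_episode_return(rewards, gamma=0.99):
    """Basic form: every step is weighted by the whole episode's discounted return."""
    total = sum(r * gamma ** t for t, r in enumerate(rewards))
    return [total] * len(rewards)


def normalize(values, eps=1e-8):
    """Shift to zero mean and scale to (roughly) unit standard deviation."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / (math.sqrt(var) + eps) for v in values]


class MovingAverageBaseline:
    """Exponential moving average of episode returns (assumed form of the baseline)."""

    def __init__(self, decay=0.9):
        self.decay = decay
        self.value = None  # no estimate until the first episode is seen

    def update(self, episode_return):
        if self.value is None:
            self.value = episode_return
        else:
            self.value = self.decay * self.value + (1 - self.decay) * episode_return
        return self.value


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    print(rewards_to_go(rewards, gamma=0.5))
    print(total_episode_return(rewards, gamma=0.5))
    print(normalize(rewards_to_go(rewards, gamma=0.5)))
```

With reward-to-go, an action is credited only with rewards that came after it, which removes noise from earlier rewards the action could not have influenced. Subtracting the baseline from the returns before weighting the log-probabilities reduces variance further without changing the expected gradient.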

PPO

  • Uses the clipped surrogate objective
  • Baseline given by the state value approximated by the critic
  • PPO class still to be implemented
  • based on this paper
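
Since the PPO class is not implemented yet, the following is only a rough sketch of the two ideas listed above: advantages computed against a critic's state-value estimates, and the clipped surrogate objective. The function names, the `clip_eps=0.2` default, and the list-based inputs are assumptions for illustration. A real implementation would work on framework tensors and maximize this objective by gradient ascent.

```python
def advantages_from_values(returns, values):
    """Advantage estimate A_t = G_t - V(s_t), using the critic's value as the baseline.

    Hypothetical helper: `returns` are Monte Carlo returns, `values` are critic outputs.
    """
    return [g - v for g, v in zip(returns, values)]


def clipped_surrogate(ratios, advantages, clip_eps=0.2):
    """Mean of min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t).

    `ratios` holds r_t = pi_new(a_t | s_t) / pi_old(a_t | s_t) for each sample.
    """
    total = 0.0
    for ratio, adv in zip(ratios, advantages):
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Taking the minimum keeps the more pessimistic of the two estimates.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(ratios)


if __name__ == "__main__":
    adv = advantages_from_values([2.0, 0.5], [1.0, 1.5])
    print(adv)
    print(clipped_surrogate([1.5, 0.5], adv))
```

The clipping removes the incentive to move the probability ratio outside the interval [1 - eps, 1 + eps], which keeps each policy update close to the policy that collected the data.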