- Implemented for discrete action space
- Basic policy gradient with Monte Carlo return
- To try the two forms of the policy gradient theorem, modify `Actor._discount_and_norm_rewards()` in `policy_gradient.py` (see the first sketch after this list)
- Current version: rewards received after the action (an equivalent form with less variance)
- Uncomment the alternative: total reward of the episode (the basic form of the formula)
- Use a moving average of episode returns as the baseline (see the baseline sketch below)
- Run `run_CartPole.py` to play the CartPole balancing game in OpenAI Gym (an environment-loop sketch is given below)
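A minimal sketch of the two return computations; only the method name `_discount_and_norm_rewards()` comes from the repo, while the body and the `gamma`/`ep_rewards` attributes are illustrative assumptions:

```python
import numpy as np

class Actor:
    def __init__(self, gamma=0.99):
        self.gamma = gamma          # discount factor (illustrative hyperparameter)
        self.ep_rewards = []        # rewards collected during the current episode

    def _discount_and_norm_rewards(self):
        """One weight per time step for the Monte Carlo policy gradient update."""
        rewards = np.asarray(self.ep_rewards, dtype=np.float64)

        # Form 1 (current version): reward-to-go, i.e. each action is credited
        # only with the rewards that come after it -- same expectation, less variance.
        returns = np.zeros_like(rewards)
        running = 0.0
        for t in reversed(range(len(rewards))):
            running = rewards[t] + self.gamma * running
            returns[t] = running

        # Form 2 (basic version of the formula): uncomment to credit every action
        # with the total discounted return of the whole episode instead.
        # returns = np.full_like(rewards, returns[0])

        # Normalize so the gradient scale stays stable across episodes.
        returns -= returns.mean()
        std = returns.std()
        if std > 0:
            returns /= std
        return returns
```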
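A sketch of the moving-average baseline; the function name and window size are assumptions, not the repo's actual code:

```python
import numpy as np

def moving_average_baseline(episode_returns, window=20):
    """Average of the most recent episode returns, used as a baseline."""
    recent = episode_returns[-window:]
    return float(np.mean(recent)) if recent else 0.0

# Illustrative usage: subtract the baseline from each step's return before
# feeding the result into the policy gradient update.
#   episode_returns.append(sum(ep_rewards))
#   baseline = moving_average_baseline(episode_returns)
#   advantages = discounted_returns - baseline
```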
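The script name `run_CartPole.py` comes from the repo; the loop below is only a sketch of the environment interaction it drives (classic `gym` API, pre-0.26), with a random policy standing in for the trained actor so the snippet runs on its own:

```python
import gym

env = gym.make("CartPole-v0")
for episode in range(10):
    state = env.reset()                            # classic gym: reset returns the observation only
    done, episode_return = False, 0.0
    while not done:
        # run_CartPole.py would sample the action from the learned policy here
        action = env.action_space.sample()
        state, reward, done, _ = env.step(action)  # classic gym: 4-tuple step result
        episode_return += reward
    print(f"episode {episode}: return {episode_return}")
env.close()
```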
- Uses the clipped surrogate objective (sketched after this list)
- Baseline given by the state value approximated by a critic
- PPO class to be implemented
- Based on this paper
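A minimal sketch of the clipped surrogate objective with a critic-estimated state-value baseline; the repo's framework is not stated here, so PyTorch and all names below are assumptions:

```python
import torch

def clipped_surrogate_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective, written as a loss to minimize."""
    # Probability ratio pi_theta(a|s) / pi_theta_old(a|s).
    ratio = torch.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Pessimistic (element-wise minimum) bound, averaged over the batch.
    return -torch.min(unclipped, clipped).mean()

# The advantage uses the critic's state value as the baseline:
#   advantage_t = return_t - V(s_t)
# where V is trained by regression on observed returns, e.g.
#   value_loss = torch.nn.functional.mse_loss(critic(states), returns)
```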