This example reproduces the TD3 deep reinforcement learning algorithm with PARL and reaches the same level of performance as the paper on the Mujoco benchmarks.
TD3 includes the following improvements:
- Clipped Double Q-learning
- Target Networks and Delayed Policy Update
- Target Policy Smoothing Regularization
TD3 was introduced in the paper "Addressing Function Approximation Error in Actor-Critic Methods".
PARL currently supports the open-source version of Mujoco provided by DeepMind, so users do not need to download the Mujoco binaries, install mujoco-py, or obtain a license. For more details, please visit the Mujoco website.
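The three improvements listed above can be illustrated with a minimal, framework-free sketch. It is not the PARL implementation: states and actions are plain floats, the helper names (`td3_target`, `should_update_actor`) are hypothetical, and the default hyperparameters are common values from the TD3 paper, assumed here rather than read from this repository.

```python
import random


def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def td3_target(reward, done, next_state, target_actor, target_q1, target_q2,
               gamma=0.99, policy_noise=0.2, noise_clip=0.5, max_action=1.0,
               rng=random):
    """Compute the TD3 critic target for one scalar transition (illustrative)."""
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    noise = clip(rng.gauss(0.0, policy_noise), -noise_clip, noise_clip)
    next_action = clip(target_actor(next_state) + noise, -max_action, max_action)
    # Clipped double Q-learning: use the smaller of the two target critic values.
    next_q = min(target_q1(next_state, next_action),
                 target_q2(next_state, next_action))
    # Standard one-step bootstrap; no bootstrapping from terminal states.
    return reward + (1.0 - done) * gamma * next_q


def should_update_actor(step, policy_freq=2):
    """Delayed policy update: refresh the actor and target networks
    only once every `policy_freq` critic updates."""
    return step % policy_freq == 0
```

Both critics are regressed toward the same target value, while the actor and the target networks are updated only on the steps where `should_update_actor` returns true.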
+ Each experiment was run three times with different seeds.

Dependencies:
- python3.7+
- parl>=2.1.1
- paddlepaddle>=2.0.0
- gym>=0.26.0
- mujoco>=2.2.2
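
The version pins above can be checked before training. The following is a rough sketch only: `unmet_requirements` and `installed_versions` are hypothetical helpers, not part of PARL, the comparison looks only at the first three numeric components of each version, and `importlib.metadata` requires Python 3.8 or newer.

```python
import re

# Minimum versions taken from the dependency list above.
MIN_VERSIONS = {
    "parl": "2.1.1",
    "paddlepaddle": "2.0.0",
    "gym": "0.26.0",
    "mujoco": "2.2.2",
}


def version_tuple(text):
    """Turn '2.1.1' (or '2.0.0rc1') into a 3-item tuple of integers."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad with zeros so that '0.26' and '0.26.0' compare as equal.
    return tuple((parts + [0, 0, 0])[:3])


def unmet_requirements(installed):
    """Return human-readable problems for a {package: version} mapping."""
    problems = []
    for name, minimum in MIN_VERSIONS.items():
        found = installed.get(name)
        if found is None:
            problems.append(f"{name} is not installed (need >={minimum})")
        elif version_tuple(found) < version_tuple(minimum):
            problems.append(f"{name} {found} is older than {minimum}")
    return problems


def installed_versions():
    """Look up installed versions (Python 3.8+ only)."""
    from importlib.metadata import PackageNotFoundError, version
    found = {}
    for name in MIN_VERSIONS:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            pass  # Missing packages are reported by unmet_requirements.
    return found
```

Calling `unmet_requirements(installed_versions())` returns an empty list when every pinned package is present and recent enough.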
# To train an agent on the HalfCheetah-v4 environment
python train.py
# To train an agent on a different environment
python train.py --env [ENV_NAME]
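
For readers wondering how the `--env` option relates to the default run, here is a minimal sketch of how a script like `train.py` might expose it. This assumes `argparse` and is not copied from the actual script, which may define more options or parse them differently; `Hopper-v4` is used only as an example environment name.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options (illustrative, not the real train.py)."""
    parser = argparse.ArgumentParser(description="Train a TD3 agent.")
    # With no --env flag the default matches the plain `python train.py` run.
    parser.add_argument("--env", default="HalfCheetah-v4",
                        help="Mujoco environment name, e.g. Hopper-v4")
    return parser.parse_args(argv)


# Example: equivalent to running `python train.py --env Hopper-v4`.
args = parse_args(["--env", "Hopper-v4"])
print(args.env)
```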