Twin Delayed DDPG

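TD3 (Twin Delayed DDPG) can be seen as an improved version of DDPG. It uses clipped double Q-learning: it learns two action-value functions instead of one and uses the smaller of the two estimates when forming the Bellman target, which reduces overestimation of Q-values.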
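As an illustration of the clipped double Q-learning target, here is a minimal sketch in plain Python. The function name `clipped_double_q_target`, its parameters, and the default `gamma=0.99` are assumptions made for this example and are not taken from this repository's code; a real implementation would work on batched tensors and would get `q1_next` and `q2_next` from the two target critics.

```python
def clipped_double_q_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Bellman target built from the smaller of two next-state Q estimates.

    reward  -- immediate reward r
    done    -- 1.0 if the episode terminated at this step, else 0.0
    q1_next -- estimate of Q(s', a') from the first target critic
    q2_next -- estimate of Q(s', a') from the second target critic
    gamma   -- discount factor (illustrative default)
    """
    # Taking the minimum counters the overestimation a single critic tends to have.
    min_q_next = min(q1_next, q2_next)
    # No bootstrapping from the next state once the episode has ended.
    return reward + gamma * (1.0 - done) * min_q_next


if __name__ == "__main__":
    # The two critics disagree; the target uses the lower value (8.0).
    print(clipped_double_q_target(reward=1.0, done=0.0, q1_next=10.0, q2_next=8.0))
```

Both critics are then regressed toward this same target, so neither one can pull the target upward on its own.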

In addition, the actor updates are delayed: the actor is updated less frequently than the critics.
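The delayed schedule can be sketched as a simple counter loop. The function name `td3_update_schedule` and the parameter `policy_delay` (with a default of 2) are hypothetical names chosen for this example, not identifiers from this repository; the actual gradient steps are replaced by counters so that the sketch stays runnable on its own.

```python
def td3_update_schedule(total_steps, policy_delay=2):
    """Count critic and actor updates under a delayed actor schedule."""
    critic_updates = 0
    actor_updates = 0
    for step in range(1, total_steps + 1):
        # The critics are updated at every training step.
        critic_updates += 1
        # The actor is updated only once every `policy_delay` steps.
        if step % policy_delay == 0:
            actor_updates += 1
    return critic_updates, actor_updates


if __name__ == "__main__":
    critic_updates, actor_updates = td3_update_schedule(total_steps=10, policy_delay=2)
    print(critic_updates, actor_updates)  # prints "10 5"
```

Updating the actor less often lets the critics settle between policy changes, so the actor follows a less noisy value estimate.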

Result of the trained DDPG agent after 500 episodes on the HalfCheetah environment.

Result of the trained DDPG agent after 500 episodes on the Pendulum environment.