PyTorch implementation of Twin Delayed Deep Deterministic Policy Gradients (TD3) with a generative replay component.
The code has been heavily modified to suit my research needs.
The method is tested on MuJoCo continuous control tasks in OpenAI Gym. Networks are trained using PyTorch 1.7 and Python 3.8.
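
For orientation, here is a minimal sketch of the target value that standard TD3 computes for a single transition, using target policy smoothing and the minimum of two target critics. It is not taken from this repository: the function name `td3_target`, its parameters, and the default hyperparameters are assumptions for illustration. It uses plain Python scalars instead of PyTorch tensors, and the generative replay component is not shown.

```python
import random


def td3_target(reward, done, next_state, actor_target, critic1_target,
               critic2_target, discount=0.99, policy_noise=0.2,
               noise_clip=0.5, max_action=1.0, rng=random):
    """Illustrative TD3 target for one transition with a scalar action.

    All names and defaults here are hypothetical, not this repo's API.
    """
    # Target policy smoothing: perturb the target action with clipped
    # Gaussian noise, then clip the result to the valid action range.
    noise = rng.gauss(0.0, policy_noise)
    noise = max(-noise_clip, min(noise_clip, noise))
    next_action = actor_target(next_state) + noise
    next_action = max(-max_action, min(max_action, next_action))

    # Clipped double-Q learning: use the smaller of the two target critics
    # to reduce overestimation of the value.
    target_q = min(critic1_target(next_state, next_action),
                   critic2_target(next_state, next_action))

    # One-step bootstrapped target; no bootstrapping on terminal steps.
    return reward + (1.0 - done) * discount * target_q


if __name__ == "__main__":
    # Toy stand-ins for the target networks (assumed, for demonstration).
    actor = lambda s: 0.5
    q1 = lambda s, a: 2.0
    q2 = lambda s, a: 1.0
    print(td3_target(1.0, 0.0, None, actor, q1, q2, discount=0.5,
                     policy_noise=0.0))
```

In a real training loop the same computation would run on batched tensors, and both critics would be regressed toward this target.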