The BipedalWalker environment is described here.
The video first shows the behaviour of the untrained agent and then, for comparison, the behaviour of the trained agent.
My learning algorithm is Twin Delayed Deep Deterministic Policy Gradient (TD3).
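TD3 extends DDPG with three changes: clipped double Q-learning (two critics, use the smaller target estimate), target policy smoothing (clipped noise on the target action) and delayed policy updates (the actor and target networks update less often than the critics). The framework-free Python sketch below illustrates these pieces. It is not the notebook's implementation. The actor and critics are plain callables standing in for neural networks, the function names (`smoothed_target_action`, `td3_target`, `should_update_actor`, `soft_update`) are hypothetical, and the defaults (gamma=0.99, policy_noise=0.2, noise_clip=0.5, policy_delay=2, tau=0.005) are the values suggested in the original TD3 paper, assumed here rather than taken from this project.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical stand-ins for the neural networks used in a real TD3 agent
Actor = Callable[[Sequence[float]], Vector]
Critic = Callable[[Sequence[float], Sequence[float]], float]


def clip(x: float, low: float, high: float) -> float:
    """Clamp x to the closed interval [low, high]."""
    return max(low, min(high, x))


def smoothed_target_action(actor_target: Actor, next_state: Sequence[float],
                           policy_noise: float = 0.2, noise_clip: float = 0.5,
                           max_action: float = 1.0, rng=random) -> Vector:
    """Target policy smoothing: add clipped Gaussian noise to the target actor's
    action, then clip the result to the valid action range."""
    return [
        clip(a + clip(rng.gauss(0.0, policy_noise), -noise_clip, noise_clip),
             -max_action, max_action)
        for a in actor_target(next_state)
    ]


def td3_target(reward: float, next_state: Sequence[float], done: float,
               actor_target: Actor, critic1_target: Critic, critic2_target: Critic,
               gamma: float = 0.99, **noise_kwargs) -> float:
    """Bellman target shared by both critics (assumed defaults, see lead-in)."""
    next_action = smoothed_target_action(actor_target, next_state, **noise_kwargs)
    # Clipped double Q-learning: take the smaller of the two target critics' estimates
    q_next = min(critic1_target(next_state, next_action),
                 critic2_target(next_state, next_action))
    # done = 1.0 marks a terminal transition, so nothing is bootstrapped beyond it
    return reward + gamma * (1.0 - done) * q_next


def should_update_actor(step: int, policy_delay: int = 2) -> bool:
    """Delayed policy updates: the actor and target networks change only once
    every policy_delay critic updates."""
    return step % policy_delay == 0


def soft_update(target_params: Sequence[float], source_params: Sequence[float],
                tau: float = 0.005) -> Vector:
    """Polyak averaging of (flattened) target-network parameters."""
    return [tau * s + (1.0 - tau) * t for t, s in zip(target_params, source_params)]
```

In a full training loop, `td3_target` would supply the regression target for both critics on every step, while the actor and the target networks would be updated only when `should_update_actor` returns True.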
Start the Jupyter notebook BipedalWalker-v2.ipynb and follow the instructions.