The Humanoid environment is described here.
The video first shows the behaviour of the untrained agent and then, for comparison, the behaviour of the trained agent.
The learning algorithm I use is Proximal Policy Optimization (PPO).
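
For readers new to PPO, the core idea is a clipped surrogate objective that stops a single update from moving the policy too far from the one that collected the data. The snippet below is a minimal sketch of that objective in plain Python, not the code from the notebook. The function name `ppo_clip_objective`, its argument names, and the clip range of 0.2 are assumptions chosen for illustration.

```python
import math


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective of PPO (a quantity to maximize).

    Hypothetical helper for illustration; clip_eps=0.2 is an assumed value.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to the interval [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Identical policies give a ratio of 1, so the objective is the mean advantage.
    print(ppo_clip_objective([0.0, 0.0], [0.0, 0.0], [1.0, 3.0]))
```

In a real training loop this quantity would be computed on batches of tensors and maximized by gradient ascent, usually together with a value-function loss and an entropy bonus.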
To run it, start the Jupyter notebook HumanoidPyBulletEnv-v0.ipynb and follow the instructions in it.