Below is a GIF of one evaluation episode in the `LunarLander-v2` environment using my implementation of PPO.
- Experiments done using Python 3.11.9 in a miniconda env.
- I currently have `gymnasium==0.29.1`.
- In the v1.0 release, `gym.vector.make` will be replaced by `gym.make_vec`.
- The current version is still widely supported though, as per gymnasium's docs, so I'll stick with it.
List of algos:
- Bandits.
- N-step Q-learning.
- Dyna-Q.
- PPO.
- DQN.