This is an original PyTorch implementation that incorporates temporally persistent exploration into DrQ-v2, from the paper
State-Novelty Guided Action Persistence in Deep Reinforcement Learning by Jianshu Hu, Paul Weng and Yutong Ban.
We implement State-Novelty guided adaptive Action Persistence (SNAP) on top of DrQ-v2.
Install MuJoCo if it is not already installed:
- Obtain a license on the MuJoCo website.
- Download MuJoCo binaries here.
- Unzip the downloaded archive into ~/.mujoco/mujoco200 and place your license key file mjkey.txt at ~/.mujoco.
- Use the env variables MUJOCO_PY_MJKEY_PATH and MUJOCO_PY_MUJOCO_PATH to specify the MuJoCo license key path and the MuJoCo directory path.
- Append the MuJoCo subdirectory bin path into the env variable LD_LIBRARY_PATH.
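
The two environment-variable steps can be sketched as the shell exports below. This is a minimal example that assumes the default locations named in the list (~/.mujoco/mujoco200 for the binaries and ~/.mujoco/mjkey.txt for the license key); adjust the paths if your installation lives elsewhere.

```shell
# Assumed default locations, taken from the steps above; change them if needed.
export MUJOCO_PY_MJKEY_PATH="$HOME/.mujoco/mjkey.txt"
export MUJOCO_PY_MUJOCO_PATH="$HOME/.mujoco/mujoco200"

# Append the MuJoCo bin directory, keeping any existing LD_LIBRARY_PATH entries.
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$HOME/.mujoco/mujoco200/bin"
```

These exports only last for the current shell session; adding the same lines to a shell startup file such as ~/.bashrc is one way to make them persistent.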
Install the following libraries:
sudo apt update
sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3

Install dependencies:
conda env create -f conda_env.yml
conda activate drqv2

Train the agent with the original DrQ-v2:
python train.py task=quadruped_walk

Train the agent with SNAP:
python train.py task=quadruped_walk repeat_type=1 action_repeat=1 update_every_steps=4 nstep=6

Monitor results:
tensorboard --logdir exp_local

The majority of this code is licensed under the MIT license; however, portions of the project are available under separate license terms: DeepMind is licensed under the Apache 2.0 license.