The code is based on Explorer.
- Vanilla Deep Q-learning (VanillaDQN): No target network.
- Deep Q-Learning (DQN)
- Double Deep Q-learning (DDQN)
- Maxmin Deep Q-learning (MaxminDQN)
- Averaged Deep Q-learning (AveragedDQN)
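The listed algorithms differ mainly in how the bootstrap target is computed. The following is a minimal sketch of the textbook target formulas in plain Python, not this repository's implementation; the function and argument names are hypothetical, and Q-values are passed in as plain lists (one value per action) rather than computed by networks.

```python
# Illustrative targets for a single transition (reward, next state, done flag).
# All names here are assumptions made for this sketch, not the repository's API.

def dqn_target(reward, gamma, done, q_next_target):
    # DQN: max over next-state Q-values from the target network.
    # VanillaDQN uses the same formula with the online network's Q-values,
    # since it keeps no target network.
    return reward + gamma * (1.0 - done) * max(q_next_target)

def ddqn_target(reward, gamma, done, q_next_online, q_next_target):
    # DDQN: the online network selects the action, the target network evaluates it.
    best = max(range(len(q_next_online)), key=q_next_online.__getitem__)
    return reward + gamma * (1.0 - done) * q_next_target[best]

def maxmin_target(reward, gamma, done, q_next_per_net):
    # MaxminDQN: per-action minimum over several networks, then max over actions.
    q_min = [min(per_action) for per_action in zip(*q_next_per_net)]
    return reward + gamma * (1.0 - done) * max(q_min)

def averaged_target(reward, gamma, done, q_next_per_net):
    # AveragedDQN: per-action mean over several networks, then max over actions.
    q_avg = [sum(per_action) / len(per_action) for per_action in zip(*q_next_per_net)]
    return reward + gamma * (1.0 - done) * max(q_avg)

q_online = [1.0, 3.0]   # Q(s', a) from the online network
q_target = [2.0, 0.0]   # Q(s', a) from the target network
print(dqn_target(1.0, 0.5, 0.0, q_target))
print(ddqn_target(1.0, 0.5, 0.0, q_online, q_target))
print(maxmin_target(1.0, 0.5, 0.0, [q_online, q_target]))
print(averaged_target(1.0, 0.5, 0.0, [q_online, q_target]))
```

Taking the minimum before the max counteracts overestimation, while averaging reduces the variance of the target.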
Base Agent
└── Vanilla DQN
    ├── DQN
    |   └── DDQN
    ├── Maxmin DQN
    ├── Averaged DQN
    └── NMIX DQN
- Python (>=3.6)
- PyTorch
- Gym && Gym Games: You may install only part of Gym (classic_control, box2d) with the command pip install 'gym[classic_control, box2d]'.
- Optional:
  - Gym Atari: pip install gym[atari,accept-rom-license]
  - Gym Mujoco:
    - Download MuJoCo version 1.50 from MuJoCo website.
    - Unzip the downloaded mjpro150 directory into ~/.mujoco/mjpro150, and place the activation key (the mjkey.txt file downloaded from here) at ~/.mujoco/mjkey.txt.
    - Install mujoco-py: pip install 'mujoco-py<1.50.2,>=1.50.1'
    - Install gym[mujoco]: pip install gym[mujoco]
  - PyBullet: pip install pybullet
  - DeepMind Control Suite: pip install git+git://github.com/denisyarats/dmc2gym.git
- Others: Please check requirements.txt.
The environments are based on gym-games. Install them with:
cd gym-games
python setup.py install
All hyperparameters, including the parameters for grid search, are stored in a configuration file in the configs directory. To run an experiment, a configuration index is first used to generate the configuration dict corresponding to that index. The experiment defined by this configuration dict is then run.
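The mapping from a configuration index to a configuration dict can be pictured as decoding the index in a mixed-radix number system, one digit per hyperparameter. The sketch below illustrates that idea only; it assumes a flat grid of lists, and the names `count_combinations`, `config_from_index`, and the example hyperparameters are hypothetical rather than taken from utils/sweeper.py.

```python
# Minimal sketch: decode a 1-based configuration index into one grid combination.
# The grid layout and all names are assumptions made for illustration.

def count_combinations(grid):
    # Total number of combinations is the product of the option counts.
    total = 1
    for values in grid.values():
        total *= len(values)
    return total

def config_from_index(grid, idx):
    total = count_combinations(grid)
    if not 1 <= idx <= total:
        raise ValueError(f"idx must be in [1, {total}], got {idx}")
    remainder = idx - 1  # switch to 0-based before decoding
    config = {}
    for key, values in grid.items():
        # Each hyperparameter consumes one mixed-radix "digit" of the index.
        remainder, position = divmod(remainder, len(values))
        config[key] = values[position]
    return config

grid = {
    "lr": [0.1, 0.01, 0.001],        # hypothetical learning rates
    "target_networks_num": [2, 4],   # hypothetical number of target networks
    "seed": [0],
}
print(count_combinations(grid))
print(config_from_index(grid, 4))
```

With this scheme every index from 1 to the total count yields a distinct combination, so a sweep only needs to iterate over integers.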
For example, to run the experiment with configuration file Maxmin_catcher_run.json and configuration index 1:
python main.py --config_file ./configs/Maxmin_catcher_run.json --config_idx 1
The models are tested for one episode after every test_per_episodes training episodes; this value can be set in the configuration file.
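The test schedule can be sketched as a simple loop. This is an illustrative outline, not the repository's training code: `train_episode` and `test_episode` are hypothetical callables standing in for one training episode and one evaluation episode.

```python
# Minimal sketch of "test for one episode after every test_per_episodes
# training episodes". The callables are placeholders supplied by the caller.

def run_with_periodic_tests(total_episodes, test_per_episodes, train_episode, test_episode):
    test_log = []
    for episode in range(1, total_episodes + 1):
        train_episode()
        if episode % test_per_episodes == 0:
            # Record which training episode triggered the test and its return.
            test_log.append((episode, test_episode()))
    return test_log

# Dummy callables so the sketch runs without an environment.
log = run_with_periodic_tests(
    total_episodes=10,
    test_per_episodes=3,
    train_episode=lambda: None,
    test_episode=lambda: 0.0,
)
print([episode for episode, _ in log])
```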
First, we calculate the total number of combinations in a configuration file (e.g. Maxmin_catcher_run.json):
python utils/sweeper.py
The output will be:
Number of total combinations in Maxmin_catcher_run.json: 55
Then we run through all configuration indexes from 1 to 55. The simplest way is to use a bash script:
for index in {1..55}
do
  python main.py --config_file ./configs/Maxmin_catcher_run.json --config_idx $index
done
GNU Parallel is usually a better choice for scheduling a large number of jobs:
parallel --eta --ungroup python main.py --config_file ./configs/Maxmin_catcher_run.json --config_idx {1} ::: $(seq 1 55)
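Where GNU Parallel is not available, a similar sweep can be scheduled from Python with the standard library. This is a hedged sketch rather than part of the repository: `build_commands` and `run_sweep` are hypothetical helpers, and the worker count is an arbitrary default.

```python
# Minimal sketch: run main.py for every configuration index with a bounded
# number of concurrent processes. Helper names are assumptions for illustration.
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def build_commands(config_file, total_combinations):
    # One command per configuration index, from 1 to total_combinations.
    return [
        [sys.executable, "main.py", "--config_file", config_file, "--config_idx", str(idx)]
        for idx in range(1, total_combinations + 1)
    ]

def run_sweep(commands, max_workers=4, runner=subprocess.run):
    # Threads are sufficient here because each one only waits on a child process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(runner, commands))
    # subprocess.run returns a CompletedProcess; a custom runner may return a plain code.
    return [getattr(result, "returncode", result) for result in results]
```

From the repository root, `run_sweep(build_commands("./configs/Maxmin_catcher_run.json", 55))` would launch the same 55 jobs as the commands above and return their exit codes.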
For more information, please check the run.sh file and the commands/ folder.