Investigating overestimating Q-values in Q-learning

KSB21ST/cs492_project

NMIX Q Learning

The code is based on Explorer.

Implemented algorithms

The dependency tree of agent classes

Base Agent
  └── Vanilla DQN
        ├── DQN
        │    └── DDQN
        ├── Maxmin DQN
        ├── Averaged DQN
        └── NMIX DQN

Requirements

  • Python (>=3.6)
  • PyTorch
  • Gym and Gym Games: you can install only part of Gym (classic_control, box2d) with the command pip install 'gym[classic_control, box2d]'.
  • Optional:
    • Gym Atari: pip install gym[atari,accept-rom-license]
    • Gym Mujoco:
      • Download MuJoCo version 1.50 from MuJoCo website.
      • Unzip the downloaded mjpro150 directory into ~/.mujoco/mjpro150, and place the activation key (the mjkey.txt file downloaded from here) at ~/.mujoco/mjkey.txt.
      • Install mujoco-py: pip install 'mujoco-py<1.50.2,>=1.50.1'
      • Install gym[mujoco]: pip install gym[mujoco]
    • PyBullet: pip install pybullet
    • DeepMind Control Suite: pip install git+https://github.com/denisyarats/dmc2gym.git
  • Others: Please check requirements.txt.

Experiments

Environment installation

The environments are based on gym-games.

cd gym-games
python setup.py install

Train && Test

All hyperparameters, including the grid-search parameters, are stored in a configuration file in the configs directory. To run an experiment, a configuration index is first mapped to a configuration dict that holds one specific combination of hyperparameter values; the experiment defined by that dict is then run.

For example, run the experiment with configuration file Maxmin_catcher_run.json and configuration index 1:

python main.py --config_file ./configs/Maxmin_catcher_run.json --config_idx 1
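The mapping from a configuration index to a configuration dict can be pictured with a short sketch. This is not the code in utils/sweeper.py; it is a minimal illustration that assumes the grid is a flat dict of candidate lists and that indexes are 1-based, as in the command above. The function names and the keys lr and target_networks_num are hypothetical.

```python
def count_combinations(grid):
    """Number of distinct settings in a grid of {name: [candidate values]}."""
    total = 1
    for values in grid.values():
        total *= len(values)
    return total


def config_for_index(grid, config_idx):
    """Map a 1-based index to one combination by mixed-radix decoding."""
    total = count_combinations(grid)
    if not 1 <= config_idx <= total:
        raise ValueError("config_idx must be between 1 and %d" % total)
    remainder = config_idx - 1
    config = {}
    # Sort the names so the same index always yields the same combination.
    for name in sorted(grid):
        values = grid[name]
        config[name] = values[remainder % len(values)]
        remainder //= len(values)
    return config


# Hypothetical grid; the keys are illustrative, not taken from the repository.
grid = {"lr": [0.1, 0.01, 0.001], "target_networks_num": [2, 4]}
first = config_for_index(grid, 1)
last = config_for_index(grid, count_combinations(grid))
```

Each index from 1 to count_combinations(grid) selects a different combination, which is what makes it possible to launch every setting of a sweep from a simple loop over indexes.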

The model is tested for one episode after every test_per_episodes training episodes; test_per_episodes can be set in the configuration file.
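As a rough illustration of that schedule, the sketch below marks the training episodes after which a test episode would run. It assumes training episodes are counted from 1; the helper name should_test is hypothetical and the repository's own loop may count differently.

```python
def should_test(episode, test_per_episodes):
    """Return True when a test episode is due after this training episode."""
    # Assumption: episodes are numbered from 1 and the period is positive.
    return test_per_episodes > 0 and episode % test_per_episodes == 0


# With test_per_episodes = 5, tests follow training episodes 5 and 10.
test_points = [ep for ep in range(1, 11) if should_test(ep, 5)]
```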

Grid Search (Optional)

First, we calculate the number of total combinations in a configuration file (e.g. Maxmin_catcher_run.json):

python utils/sweeper.py

The output will be:

Number of total combinations in Maxmin_catcher_run.json: 55

Then we run through all configuration indexes from 1 to 55. The simplest way is using a bash script:

for index in {1..55}
do
  python main.py --config_file ./configs/Maxmin_catcher_run.json --config_idx $index
done

GNU Parallel is usually a better choice for scheduling a large number of jobs:

parallel --eta --ungroup python main.py --config_file ./configs/Maxmin_catcher_run.json --config_idx {1} ::: $(seq 1 55)
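If GNU Parallel is not available, the same sweep can be approximated from Python. The sketch below is an assumption-laden alternative, not part of the repository: build_commands and run_all are hypothetical helpers, and only the main.py flags (--config_file, --config_idx) come from the commands above.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def build_commands(config_file, total):
    """One main.py command line per configuration index, from 1 to total."""
    return [
        [sys.executable, "main.py",
         "--config_file", config_file,
         "--config_idx", str(idx)]
        for idx in range(1, total + 1)
    ]


def run_all(commands, max_workers=4):
    """Run the commands, at most max_workers at a time; return exit codes."""
    def run_one(cmd):
        return subprocess.run(cmd).returncode

    # Threads are enough here: each one only waits on a child process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))


commands = build_commands("./configs/Maxmin_catcher_run.json", 55)
# To launch the sweep, call run_all(commands) from the repository root.
```

The max_workers value plays the role of the job limit in Parallel; 4 is an arbitrary default and should be matched to the available CPU or GPU capacity.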

For more information, please check the run.sh file and commands/ folder.

Acknowledgements
