Official implementation of iterated Deep $Q$-Network (i-DQN) in JAX

iterated Q-Network [TMLR 25] learns several Bellman iterations in parallel instead of learning them sequentially via repeated target updates ✨ This directly translates to performance improvements on the Atari and MuJoCo benchmarks 🚀


i-DQN and i-IQN consider multiple consecutive $Q$-functions in the loss, where each $Q$-function is learned as the projected Bellman iteration of the previous one (see the figure on the left). This directly translates to a performance increase, since the usual $Q$-Network approaches learn only one projected Bellman iteration at a time (see the figure on the right).

[Figures: left, the i-DQN loss chaining several consecutive projected Bellman iterations; right, the usual single projected Bellman iteration per target update.]
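To make the structure of the loss concrete, here is a minimal sketch in JAX. It is not the repository's code: the network architecture, parameter layout, and function names are illustrative assumptions; only the shape of the loss (online network $k$ regressed toward the Bellman iteration of target network $k-1$) follows the description above.

# Illustrative i-DQN-style loss in JAX (a sketch, not the repository's code).
# K online Q-networks are trained in parallel; online network k is regressed
# toward the Bellman target built from the previous Q-function in the chain,
# so K consecutive projected Bellman iterations are learned at once.
import jax
import jax.numpy as jnp


def init_mlp(key, sizes):
    # Small fully connected Q-network; the architecture is an assumption.
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        w = jax.random.normal(sub, (fan_in, fan_out)) / jnp.sqrt(fan_in)
        params.append((w, jnp.zeros(fan_out)))
    return params


def q_values(params, obs):
    x = obs
    for w, b in params[:-1]:
        x = jax.nn.relu(x @ w + b)
    w, b = params[-1]
    return x @ w + b  # shape: (batch, n_actions)


def idqn_loss(online_params, target_params, batch, gamma=0.99):
    # online_params: list of K parameter pytrees (Q-functions 1..K).
    # target_params: list of K pytrees, where target_params[k] plays the role
    # of Q-function k in the chain (target_params[0] being the frozen base),
    # so online network k regresses toward the Bellman iteration of target_params[k].
    obs, actions, rewards, next_obs, dones = batch
    loss = 0.0
    for k, params_k in enumerate(online_params):
        next_q = q_values(target_params[k], next_obs).max(axis=-1)
        bellman_target = rewards + gamma * (1.0 - dones) * next_q
        q = q_values(params_k, obs)
        chosen = jnp.take_along_axis(q, actions[:, None], axis=-1).squeeze(-1)
        loss += jnp.mean((chosen - jax.lax.stop_gradient(bellman_target)) ** 2)
    return loss

In practice the $K$ networks can be vectorised with jax.vmap over stacked parameters instead of a Python loop, and the target parameters are refreshed periodically; how often is presumably what the target update/sync frequencies in the running instructions below control.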

NB: For simplicity, this branch does not support parameter sharing between the networks. Check out the branch experiments for shared network parameters.

User installation

We recommend using Python 3.11.5. From the root of the repository, create a Python virtual environment, activate it, upgrade pip, and install the package and its dependencies in editable mode:

python3 -m venv env
source env/bin/activate
pip install --upgrade pip setuptools wheel
pip install -e .[dev,gpu]

To verify the installation, run the tests with:

pytest

Running code

To launch experiments locally, run:

launch_job/[environment]/local_[algorithm].sh [ARGS]

This launches the experiment in a tmux session and pushes the logs and performance metrics to a WandB project called "i-dqn". The full list of hyperparameters is available in experiments/base/parser_argument.py.

For example:

launch_job/lunar_lander/local_idqn.sh --experiment_name K3_T200_D10 --first_seed 1 --last_seed 1 --n_networks 3 --target_update_frequency 200 --target_sync_frequency 10 

will run i-DQN on Lunar Lander for seed $1$ with $K=3$ networks, a target update frequency of $T=200$, and a target sync frequency of $D=10$.
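For reference, here is a rough argparse-style sketch of the flags used in the example above. The authoritative definitions and defaults live in experiments/base/parser_argument.py; the help strings below are assumptions inferred from the example.

# Sketch of the command-line flags used in the example above (assumed
# semantics); the authoritative parser is experiments/base/parser_argument.py.
import argparse

parser = argparse.ArgumentParser(description="i-DQN experiment (illustrative sketch)")
parser.add_argument("--experiment_name", type=str, help="Run name, e.g. K3_T200_D10.")
parser.add_argument("--first_seed", type=int, help="First random seed to run.")
parser.add_argument("--last_seed", type=int, help="Last random seed to run (inclusive).")
parser.add_argument("--n_networks", type=int, help="K: number of consecutive Bellman iterations learned in parallel.")
parser.add_argument("--target_update_frequency", type=int, help="T: steps between target updates.")
parser.add_argument("--target_sync_frequency", type=int, help="D: steps between target synchronisations.")
args = parser.parse_args()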

Citing iterated Q-Network

@article{vincent2024iterated,
  title={Iterated $Q$-Network: Beyond the One-Step Bellman Operator},
  author={Vincent, Th{\'e}o and Palenicek, Daniel and Belousov, Boris and Peters, Jan and D'Eramo, Carlo},
  journal={Transactions on Machine Learning Research},
  year={2025}
}
