PyTorch implementation of A2PR, a simple adaptive advantage-guided policy regularization method for achieving state-of-the-art results in offline reinforcement learning.

Adaptive Advantage-guided Policy Regularization for Offline Reinforcement Learning


This repository is the official implementation of the ICML'24 paper "Adaptive Advantage-guided Policy Regularization for Offline Reinforcement Learning".

If you find this repository useful for your research, please cite:

@inproceedings{
    A2PR,
    title={Adaptive Advantage-guided Policy Regularization for Offline Reinforcement Learning},
    author={Tenglong Liu and Yang Li and Yixing Lan and Hao Gao and Wei Pan and Xin Xu},
    booktitle={International Conference on Machine Learning},
    year={2024}
}


Quick start

Clone this repository and navigate to the A2PR folder.

git clone https://github.com/ltlhuuu/A2PR.git
cd A2PR

Install dependencies

Environment configuration and dependencies are available in environment.yaml and requirements.txt.

First, create the conda environment.

conda env create -f environment.yaml
conda activate A2PR

Then install the remaining requirements (MuJoCo must already be downloaded; if not, see the MuJoCo installation section below):

pip install -r requirements.txt
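After installation, a quick standard-library check can confirm that key packages resolved correctly (a minimal sketch; the package names listed here are assumptions about what requirements.txt pins, and the `installed_version` helper is not part of this repo):

```python
from importlib import metadata
from typing import Optional

def installed_version(pkg: str) -> Optional[str]:
    """Return the installed version of pkg, or None if it is not installed."""
    try:
        return metadata.version(pkg)
    except metadata.PackageNotFoundError:
        return None

# The package names below are assumptions about requirements.txt, not a verified list.
for pkg in ("torch", "gym", "numpy"):
    print(pkg, installed_version(pkg) or "MISSING")
```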

Install the D4RL benchmark

git clone https://github.com/Farama-Foundation/D4RL.git
cd D4RL
pip install -e .

MuJoCo installation

Download MuJoCo:

mkdir -p ~/.mujoco
cd ~/.mujoco
wget https://github.com/google-deepmind/mujoco/releases/download/2.1.0/mujoco210-linux-x86_64.tar.gz
tar -zxvf mujoco210-linux-x86_64.tar.gz
cd mujoco210
wget https://www.roboti.us/file/mjkey.txt

Then add the following line to your ~/.bashrc:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/.mujoco/mujoco210/bin
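To verify the export took effect in a new shell, a small Python sanity check can help (a minimal sketch; the `mujoco_on_path` helper is hypothetical and not part of this repo):

```python
import os

def mujoco_on_path(ld_library_path: str, home: str) -> bool:
    """Return True if the MuJoCo 2.1.0 bin directory is on the given search path."""
    mujoco_bin = os.path.join(home, ".mujoco", "mujoco210", "bin")
    return mujoco_bin in ld_library_path.split(":")

# Prints False if ~/.bashrc was not re-sourced after adding the export line.
print(mujoco_on_path(os.environ.get("LD_LIBRARY_PATH", ""),
                     os.path.expanduser("~")))
```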

Run experiments

The following illustrative examples show how to run the experiments:

python main.py --env_id halfcheetah-medium-v2 --seed 0 --alpha 40.0 --vae_weight 1.0 --device cuda:0 --mask 1.0 --discount 0.99

python main.py --env_id hopper-medium-v2 --seed 0 --alpha 2.5 --vae_weight 1.0 --device cuda:0 --mask 0.4 --discount 0.995

python main.py --env_id walker2d-medium-v2 --seed 0 --alpha 2.5 --vae_weight 1.5 --device cuda:0 --mask 1.0 --discount 0.99
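Offline RL results are commonly averaged over several seeds; one way to script such a sweep is a plain shell loop (a sketch that reuses the walker2d flag values above; it only echoes each command so you can review them first):

```shell
# Print the command for each seed; drop `echo` to actually launch the runs.
for seed in 0 1 2 3 4; do
  echo python main.py --env_id walker2d-medium-v2 --seed "$seed" \
    --alpha 2.5 --vae_weight 1.5 --device cuda:0 --mask 1.0 --discount 0.99
done
```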

See results

tensorboard --logdir='Your output path'

For example:

tensorboard --logdir=result
