Ranking Policy Gradient

Ranking Policy Gradient (RPG) is a sample-efficient off-policy policy gradient method that learns an optimal ranking of actions to maximize the return. RPG has the following practical advantages:

  • It is a sample-efficient model-free algorithm for learning deterministic policies.
  • Any exploration algorithm can be incorporated with little effort to further improve the sample efficiency of RPG.

This codebase contains the implementation of RPG using the dopamine framework. The preprint of the RPG paper is available on arXiv (arXiv:1906.09674).
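At a glance, the idea is to learn relative scores for actions and act according to their ranking, rather than to fit action values directly. Below is a minimal, self-contained Python sketch of that ranking view; the score array, the pairwise sigmoid form, and the toy loss are illustrative assumptions for exposition, not the code in this repository.

import numpy as np

def pairwise_ranking_probs(scores):
    # Probability that action i outranks action j, modeled here as a
    # sigmoid of the score difference (an assumed, simplified form).
    diff = scores[:, None] - scores[None, :]
    return 1.0 / (1.0 + np.exp(-diff))

def greedy_action(scores):
    # Deterministic policy: take the top-ranked action.
    return int(np.argmax(scores))

def pairwise_ranking_loss(scores, best_action):
    # Toy cross-entropy loss that pushes `best_action` to outrank all others.
    probs = pairwise_ranking_probs(scores)
    others = [j for j in range(len(scores)) if j != best_action]
    return -np.mean(np.log(probs[best_action, others] + 1e-8))

# Toy usage: four actions, action 2 should end up ranked on top.
scores = np.array([0.1, -0.3, 0.8, 0.2])
print(greedy_action(scores))             # -> 2
print(pairwise_ranking_loss(scores, 2))  # small when action 2 dominates

In the actual agent, such scores are produced by a neural network and the supervision comes from off-policy trajectories; see the paper for the exact formulation.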

Instructions

Install from source

Step 1.

Follow the installation instructions of the dopamine framework for Ubuntu or Mac OS X.
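For reference, the dopamine prerequisites at the time amounted to a few system and Python packages; the exact list below is an assumption and may be out of date, so treat the linked dopamine instructions as authoritative.

sudo apt-get update && sudo apt-get install cmake zlib1g-dev
pip install absl-py atari-py gin-config gym opencv-python tensorflow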

Step 2.

Download the RPG source code:

git clone git@github.com:illidanlab/rpg.git
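If you do not have SSH keys configured with GitHub, cloning over HTTPS works as well:

git clone https://github.com/illidanlab/rpg.git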

Running the tests

cd ./rpg/dopamine 
python -um dopamine.atari.train \
  --agent_name=rpg \
  --base_dir=/tmp/dopamine \
  --random_seed 1 \
  --game_name=Pong \
  --gin_files='dopamine/agents/rpg/configs/rpg.gin'
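Here --base_dir is where logs and checkpoints are written, --game_name selects the Atari game, --random_seed fixes the random seed, and --gin_files points at the RPG agent configuration. To train on another game, only the game name (and, ideally, the output directory) needs to change; the Breakout game and /tmp/rpg_breakout directory below are arbitrary examples, assuming the corresponding ROM is available in your Atari environment.

python -um dopamine.atari.train \
  --agent_name=rpg \
  --base_dir=/tmp/rpg_breakout \
  --random_seed 1 \
  --game_name=Breakout \
  --gin_files='dopamine/agents/rpg/configs/rpg.gin'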

Reproduce

To reproduce the results in the paper, please refer to the instructions here.

Reference

If you use this RPG implementation in your work, please consider citing the following paper:

@article{lin2019ranking,
  title={Ranking Policy Gradient},
  author={Lin, Kaixiang and Zhou, Jiayu},
  journal={arXiv preprint arXiv:1906.09674},
  year={2019}
}