Ranking Policy Gradient

Ranking Policy Gradient (RPG) is a sample-efficient off-policy policy gradient method that learns an optimal ranking of actions to maximize the return. RPG has the following practical advantages:

  • It is a sample-efficient model-free algorithm for learning deterministic policies.
  • Any exploration algorithm can be incorporated with little effort to further improve the sample efficiency of RPG.

This codebase contains the implementation of RPG using the dopamine framework. The preprint of the RPG paper is available on arXiv (arXiv:1906.09674).
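At a glance, the idea is to learn relative scores for actions and act according to their ranking, rather than to fit action values directly. Below is a minimal, self-contained Python sketch of that ranking view; the score array, the pairwise sigmoid form, and the toy loss are illustrative assumptions for exposition, not the code in this repository.

import numpy as np

def pairwise_ranking_probs(scores):
    # Probability that action i outranks action j, modeled here as a
    # sigmoid of the score difference (an assumed, simplified form).
    diff = scores[:, None] - scores[None, :]
    return 1.0 / (1.0 + np.exp(-diff))

def greedy_action(scores):
    # Deterministic policy: take the top-ranked action.
    return int(np.argmax(scores))

def pairwise_ranking_loss(scores, best_action):
    # Toy cross-entropy loss that pushes `best_action` to outrank all others.
    probs = pairwise_ranking_probs(scores)
    others = [j for j in range(len(scores)) if j != best_action]
    return -np.mean(np.log(probs[best_action, others] + 1e-8))

# Toy usage: four actions, action 2 should end up ranked on top.
scores = np.array([0.1, -0.3, 0.8, 0.2])
print(greedy_action(scores))             # -> 2
print(pairwise_ranking_loss(scores, 2))  # small when action 2 dominates

In the actual agent, such scores are produced by a neural network and the supervision comes from off-policy trajectories; see the paper for the exact formulation.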

Instructions

Install from source

Step 1.

Follow the installation instructions of the dopamine framework for Ubuntu or Mac OS X.
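For reference, the dopamine prerequisites at the time amounted to a few system and Python packages; the exact list below is an assumption and may be out of date, so treat the linked dopamine instructions as authoritative.

sudo apt-get update && sudo apt-get install cmake zlib1g-dev
pip install absl-py atari-py gin-config gym opencv-python tensorflow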

Step 2.

Download the RPG source code:

git clone git@github.com:illidanlab/rpg.git
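If you do not have SSH keys configured with GitHub, cloning over HTTPS works as well:

git clone https://github.com/illidanlab/rpg.git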

Running the tests

cd ./rpg/dopamine 
python -um dopamine.atari.train \
  --agent_name=rpg \
  --base_dir=/tmp/dopamine \
  --random_seed 1 \
  --game_name=Pong \
  --gin_files='dopamine/agents/rpg/configs/rpg.gin'
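Here --base_dir is where logs and checkpoints are written, --game_name selects the Atari game, --random_seed fixes the random seed, and --gin_files points at the RPG agent configuration. To train on another game, only the game name (and, ideally, the output directory) needs to change; the Breakout game and /tmp/rpg_breakout directory below are arbitrary examples, assuming the corresponding ROM is available in your Atari environment.

python -um dopamine.atari.train \
  --agent_name=rpg \
  --base_dir=/tmp/rpg_breakout \
  --random_seed 1 \
  --game_name=Breakout \
  --gin_files='dopamine/agents/rpg/configs/rpg.gin'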

Reproduce

To reproduce the results in the paper, please refer to the instructions here.

Reference

If you use this RPG implementation in your work, please consider citing the following paper:

@article{lin2019ranking,
  title={Ranking Policy Gradient},
  author={Lin, Kaixiang and Zhou, Jiayu},
  journal={arXiv preprint arXiv:1906.09674},
  year={2019}
}