Overview

This document explains the structure of this codebase and the hyperparameters used in the experiments.

File organization

Step 1.

Please refer to the instructions on the Dopamine codebase structure here.

Step 2.

We add variants of the RPG agents in this folder and explain each agent as follows:

| Folder | Exploration | Supervision |
| --- | --- | --- |
| rpg | epsilon-greedy | RPG (hinge loss) |
| lpg | epsilon-greedy | LPG (cross-entropy) |
| epg | EPG | LPG (cross-entropy) |
| repg | EPG | RPG (hinge loss) |
| implicit_quantilerpg | implicit_quantile | RPG (hinge loss) |
  • EPG: EPG is the stochastic listwise policy gradient with off-policy supervised learning, i.e., the vanilla policy gradient trained with off-policy supervised learning. The exploration and supervision agents are parameterized by the same neural network. The supervision agent minimizes the cross-entropy loss over the near-optimal trajectories collected in an online fashion.

  • LPG: LPG is the deterministic listwise policy gradient with off-policy supervised learning. During evaluation, it chooses actions greedily based on the logits; during training, it explores the environment stochastically, as EPG does.

  • RPG: RPG explores the environment using a separate exploration agent: epsilon-greedy, EPG (in Pong), or Implicit Quantile (in other games). RPG then performs supervised learning by minimizing the hinge loss (a sketch of both supervision losses follows this list).
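To make the two supervision objectives concrete, below is a minimal NumPy sketch of a pairwise hinge loss and a softmax cross-entropy loss over action logits, under our reading of the descriptions above. The function names, the margin value, and the sum reduction are illustrative assumptions, not the actual API of this codebase.

```python
import numpy as np

def hinge_loss(logits, best_action, margin=1.0):
    """Pairwise hinge loss (sketch): push the logit of the near-optimal
    action above every other action's logit by at least `margin`."""
    best = logits[best_action]
    losses = np.maximum(0.0, margin - (best - logits))
    losses[best_action] = 0.0  # the best action incurs no penalty against itself
    return losses.sum()

def cross_entropy_loss(logits, best_action):
    """Softmax cross-entropy (sketch): treat the near-optimal action as
    the classification label for the supervision step."""
    shifted = logits - logits.max()  # subtract max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[best_action]

# Example: 4 discrete actions; action 2 came from a near-optimal trajectory.
logits = np.array([0.5, -0.2, 1.3, 0.1])
print(hinge_loss(logits, best_action=2))
print(cross_entropy_loss(logits, best_action=2))
```

Note the difference in what each objective enforces: the hinge form only constrains the ranking of the logits (the near-optimal action above the rest by a margin), while cross-entropy additionally shapes the full softmax distribution.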

In this codebase, the folder rpg contains the code of RPG with epsilon-greedy exploration; similarly, repg contains RPG with EPG exploration, and implicit_quantilerpg contains RPG with implicit quantile network exploration.

The agents with relatively simple exploration strategies (rpg, lpg, epg, repg) perform well on Pong compared to the state of the art, since there is a higher chance of hitting good trajectories in Pong. For more complicated games, we adopt the implicit quantile network as the exploration agent.
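For completeness, here is a minimal sketch of the epsilon-greedy rule used by the simpler agents. The function name and the fixed epsilon are illustrative; the actual agents follow Dopamine's default epsilon schedule, per the hyperparameter note below.

```python
import numpy as np

def epsilon_greedy(logits, epsilon=0.01, rng=np.random.default_rng()):
    """Epsilon-greedy action selection (sketch): with probability epsilon
    pick a uniformly random action, otherwise act greedily on the logits."""
    if rng.random() < epsilon:
        return int(rng.integers(len(logits)))
    return int(np.argmax(logits))
```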

Hyperparameters

The hyperparameters of the networks, optimizers, etc., are the same as for the baselines in Dopamine. The trajectory reward threshold c (see Definition 5 in the paper) for each game is given as follows:

| Game | c |
| --- | --- |
| Boxing | 100 |
| Breakout | 400 |
| Bowling | 80 |
| BankHeist | 1100 |
| DoubleDunk | 18 |
| Pitfall | 0 |
| Pong | 1 |
| Robotank | 65 |
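To illustrate how the threshold c gates the data used for supervised learning, here is a hedged Python sketch: a trajectory collected online is kept for the supervision step only when its total reward reaches the game's threshold. The helper names (is_near_optimal, maybe_store) and the buffer structure are hypothetical; only the threshold values come from the table above.

```python
# A simplified sketch of threshold-based trajectory filtering, not the
# actual implementation in this codebase.
THRESHOLDS = {"Pong": 1, "Breakout": 400, "Boxing": 100}  # values from the table above

def is_near_optimal(rewards, c):
    """Keep a trajectory for supervised learning when its total reward
    reaches the game-specific threshold c (cf. Definition 5 in the paper)."""
    return sum(rewards) >= c

supervision_buffer = []  # (state, action) pairs used by the supervision agent

def maybe_store(game, trajectory):
    """trajectory: list of (state, action, reward) tuples collected online."""
    rewards = [r for _, _, r in trajectory]
    if is_near_optimal(rewards, THRESHOLDS[game]):
        supervision_buffer.extend((s, a) for s, a, _ in trajectory)
```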