UPDATE README
ramanuzan authored and GitHub Enterprise committed Oct 21, 2021
1 parent 4a1dae4 commit 616b9ef
Showing 165 changed files with 738 additions and 446 deletions.
32 changes: 20 additions & 12 deletions README.md
@@ -15,7 +15,6 @@
- Benchmark of the algorithms is conducted in many RL environments



## :arrow_down: Installation

```
@@ -30,7 +29,7 @@

## :rocket: QuickStart

-<img src="./img/quickstart.png" alt="quickstart" width=60%/>
+<img src="./resrc/quickstart.png" alt="quickstart" width=60%/>



@@ -42,24 +41,33 @@



+## :mag: How to
+
+- [How to use](./docs/How_to_use.md)
+- [How to customize config](./config/README.md)
+- [How to customize agent](./core/agent/README.md)
+- [How to customize environment](./core/env/README.md)
+- [How to customize network](./core/network/README.md)
+- [How to customize buffer](./core/buffer/README.md)
+
+
+
## :page_facing_up: Documentation

-- [Implementation List](https://github.kakaocorp.com/leonard-q/RL_Algorithms/blob/master/docs/Implementation_list.md)
-- [Benchmark](https://github.kakaocorp.com/leonard-q/RL_Algorithms/blob/master/docs/Benchmark.md)
-- [Distributed Architecture](https://github.kakaocorp.com/leonard-q/RL_Algorithms/blob/master/docs/Distributed_Architecture.md)
-- [Reference](https://github.kakaocorp.com/leonard-q/RL_Algorithms/blob/master/docs/Reference.md)
+- [Distributed Architecture](./docs/Distributed_Architecture.md)
+- [Role of Managers](./manager/README.md)
+- [Implementation List](./docs/Implementation_list.md)
+- [Naming Convention](./docs/Naming_convention.md)
+- [Benchmark](https://www.notion.so/rlnote/Benchmark-c7642d152cad4980bc03fe804fe9e88a)
+- [Reference](./docs/Reference.md)


-- [How to use](https://github.kakaocorp.com/leonard-q/RL_Algorithms/blob/master/docs/How_to_use.md)
-- [How to add RL algorithm](https://github.kakaocorp.com/leonard-q/RL_Algorithms/blob/master/docs/How_to_add_rl_algorithm.md)
-- [How to add environment](https://github.kakaocorp.com/leonard-q/RL_Algorithms/blob/master/docs/How_to_add_environment.md)
-- [How to add network](https://github.kakaocorp.com/leonard-q/RL_Algorithms/blob/master/docs/How_to_add_network.md)

## :busts_in_silhouette: Contributors

-:mailbox: Contact: [Leonard.Q](leonard.q@kakaoenterprise.com), [Ramanuzan.Lee]([email protected]), [Royce.Choi]([email protected])
+:mailbox: Contact: atech.rl@kakaocorp.com

-<img src="./img/contributors.png" alt="contributors" width=80%/>
+<img src="./resrc/contributors.png" alt="contributors" width=80%/>


## :copyright: License
4 changes: 2 additions & 2 deletions async_distributed_train.py
@@ -37,15 +37,15 @@
path_queue = mp.Queue(1)

record_period = config.train.record_period if config.train.record_period else config.train.run_step//10
-test_manager_config = (Env(**config.env), config.train.test_iteration, config.train.record, record_period)
+eval_manager_config = (Env(**config.env), config.train.eval_iteration, config.train.record, record_period)
log_id = config.train.id if config.train.id else config.agent.name
log_manager_config = (config.env.name, log_id, config.train.experiment)
agent_config['device'] = "cpu"
manage = mp.Process(target=manage_process,
args=(Agent, agent_config,
result_queue, manage_sync_queue, path_queue,
config.train.run_step, config.train.print_period,
-MetricManager, TestManager, test_manager_config,
+MetricManager, EvalManager, eval_manager_config,
LogManager, log_manager_config, config_manager))
distributed_manager_config = (Env, config.env, Agent, agent_config, config.train.num_workers, 'async')
interact = mp.Process(target=interact_process,
43 changes: 43 additions & 0 deletions config/README.md
@@ -0,0 +1,43 @@
# How to customize config

## Config file management rules
- The default config files are organized as config/\[agent\]/\[env\].py.
- For a group of environments that share parameters, use the form config/\[agent\]/\[env_group\] and specify the environment name with --env.name in the run command.

reference: [dqn/cartpole.py](./dqn/cartpole.py), [dqn/atari.py](./dqn/atari.py)
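
The `--env.name` override above can be pictured with a small sketch. This is an illustration only, not the repository's parser: `apply_override` is a hypothetical helper, and the agent and environment names are placeholder values.

```python
# Hypothetical sketch: apply a dotted command-line override such as
# "--env.name breakout" to the config dictionaries. Not the repository's code.

def apply_override(config, dotted_key, value):
    """Set config[section][key] = value for a key written as 'section.key'."""
    section, key = dotted_key.lstrip("-").split(".", 1)
    if section not in config:
        raise KeyError(f"unknown config section: {section}")
    config[section][key] = value
    return config


# A config shared by an environment group leaves env["name"] to the command line.
config = {
    "agent": {"name": "dqn"},   # placeholder agent key
    "env": {"name": None},      # filled in by the override
}

apply_override(config, "--env.name", "breakout")  # placeholder environment name
print(config["env"]["name"])
```

With this shape, one group config file can serve every environment in the group, and only the environment name changes per run.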

## Config setting
- A config file defines four dictionary variables: agent, env, optim, and train.

### agent
- The agent dictionary holds the input parameters used by the agent class.
- name: The key of the agent class you want to use.
- others: See the agent class for the remaining parameters.

### env
- The env dictionary holds the input parameters used by the env class.
- name: The key of the env class you want to use.
- others: See the env class for the remaining parameters.

### optim
- The optim dictionary holds the input parameters used by the optimizer class. PyTorch optimizers are used as-is, so any optimizer supported by PyTorch can be used.
- name: The key of the optimizer class you want to use.
- others: See the corresponding PyTorch optimizer class for the remaining parameters.

### train
- The train dictionary holds the parameters used in the main script.
- training: Whether to train. Set it to False for the eval.py script and True otherwise.
- load_path: The path to load the model from. Set it when you want to load a model, and always when running the eval.py script; otherwise set it to None.
- run_step: The total number of interaction steps to run.
- print_period: The period (in steps) at which progress is printed.
- save_period: The period (in steps) at which the model is saved.
- eval_iteration: The total number of episodes run to compute the evaluation score.
- record: Whether to record the simulation during evaluation. If set to True and the env is recordable, the simulation is saved as a gif file in save_path. (Note that this does not work for non-recordable environments.)
- record_period: The period (in steps) at which recordings are made.
- distributed_batch_size: In distributed scripts, this is used instead of agent.batch_size.
- update_period: The period (in steps) at which actors pass transition data to the learner.
- num_workers: The total number of distributed actors that interact with the env.

__distributed_batch_size, update_period and num_workers are only used in distributed scripts.__

reference: [ppo/atari.py](./ppo/atari.py)
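
As a rough picture of the layout described above, a config file might look like the following sketch. The train values echo numbers that appear in the cartpole configs in this commit; every key other than `name` in agent, env, and optim is an assumption made for illustration, as is the `"adam"` spelling of the optimizer key.

```python
# Illustrative layout only. Keys other than "name" in agent/env/optim are
# assumptions, not taken from the repository.
agent = {
    "name": "dqn",        # key of the agent class
    "batch_size": 32,     # hypothetical agent parameter
}

env = {
    "name": "cartpole",   # key of the env class
}

optim = {
    "name": "adam",       # key of a PyTorch optimizer (assumed spelling)
    "lr": 1e-4,           # hypothetical optimizer parameter
}

train = {
    "training": True,
    "load_path": None,
    "run_step": 100000,
    "print_period": 1000,
    "save_period": 10000,
    "eval_iteration": 10,
    "record": False,
    "record_period": None,
    # distributed setting (only used in distributed scripts)
    "distributed_batch_size": 512,
    "update_period": 32,
    "num_workers": 8,
}

# Fallback for an unset record_period, mirroring the expression used in
# async_distributed_train.py in this commit.
record_period = train["record_period"] if train["record_period"] else train["run_step"] // 10
print(record_period)
```

Keeping the distributed keys at the end of train, under a comment, matches how the config hunks in this commit group them.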
2 changes: 1 addition & 1 deletion config/ape_x/atari.py
@@ -43,7 +43,7 @@
"run_step" : 30000000,
"print_period" : 10000,
"save_period" : 100000,
-"test_iteration": 5,
+"eval_iteration": 5,
"record" : True,
"record_period" : 300000,
# distributed setting
2 changes: 1 addition & 1 deletion config/ape_x/cartpole.py
@@ -36,7 +36,7 @@
"run_step" : 100000,
"print_period" : 1000,
"save_period" : 10000,
-"test_iteration": 10,
+"eval_iteration": 10,
# distributed setting
"distributed_batch_size" : 512,
"update_period" : 16,
2 changes: 1 addition & 1 deletion config/ape_x/pong_mlagent.py
@@ -35,7 +35,7 @@
"run_step" : 200000,
"print_period" : 5000,
"save_period" : 50000,
-"test_iteration": 10,
+"eval_iteration": 10,
# distributed setting
"distributed_batch_size" : 512,
"update_period" : 16,
2 changes: 1 addition & 1 deletion config/ape_x/procgen.py
@@ -40,7 +40,7 @@
"run_step" : 30000000,
"print_period" : 10000,
"save_period" : 100000,
-"test_iteration": 5,
+"eval_iteration": 5,
"record" : True,
"record_period" : 300000,
# distributed setting
2 changes: 1 addition & 1 deletion config/c51/atari.py
@@ -41,7 +41,7 @@
"run_step" : 30000000,
"print_period" : 10000,
"save_period" : 100000,
-"test_iteration": 5,
+"eval_iteration": 5,
"record" : True,
"record_period" : 300000,
# distributed setting
2 changes: 1 addition & 1 deletion config/c51/cartpole.py
@@ -34,7 +34,7 @@
"run_step" : 100000,
"print_period" : 1000,
"save_period" : 10000,
-"test_iteration": 5,
+"eval_iteration": 5,
# distributed setting
"update_period" : 32,
"num_workers" : 8,
2 changes: 1 addition & 1 deletion config/c51/pong_mlagent.py
@@ -33,7 +33,7 @@
"run_step" : 200000,
"print_period" : 5000,
"save_period" : 50000,
-"test_iteration": 10,
+"eval_iteration": 10,
# distributed setting
"update_period" : 8,
"num_workers" : 16,
2 changes: 1 addition & 1 deletion config/c51/procgen.py
@@ -38,7 +38,7 @@
"run_step" : 30000000,
"print_period" : 10000,
"save_period" : 100000,
-"test_iteration": 5,
+"eval_iteration": 5,
"record" : True,
"record_period" : 300000,
# distributed setting
2 changes: 1 addition & 1 deletion config/ddpg/cartpole.py
@@ -34,7 +34,7 @@
"run_step" : 100000,
"print_period" : 1000,
"save_period" : 10000,
-"test_iteration": 10,
+"eval_iteration": 10,
# distributed setting
"update_period": 1,
"num_workers": 8,
2 changes: 1 addition & 1 deletion config/ddpg/hopper_mlagent.py
@@ -33,7 +33,7 @@
"run_step" : 300000,
"print_period" : 5000,
"save_period" : 10000,
-"test_iteration" : 10,
+"eval_iteration" : 10,
# distributed setting
"distributed_batch_size" : 256,
"update_period" : 1,
2 changes: 1 addition & 1 deletion config/ddpg/pendulum.py
@@ -33,7 +33,7 @@
"run_step" : 100000,
"print_period" : 1000,
"save_period" : 10000,
-"test_iteration": 10,
+"eval_iteration": 10,
# distributed setting
"distributed_batch_size" : 128,
"update_period" : 1,
2 changes: 1 addition & 1 deletion config/double/atari.py
@@ -37,7 +37,7 @@
"run_step" : 30000000,
"print_period" : 10000,
"save_period" : 100000,
-"test_iteration": 5,
+"eval_iteration": 5,
"record" : True,
"record_period" : 300000,
# distributed setting
2 changes: 1 addition & 1 deletion config/double/cartpole.py
@@ -30,7 +30,7 @@
"run_step" : 100000,
"print_period" : 1000,
"save_period" : 10000,
-"test_iteration": 5,
+"eval_iteration": 5,
# distributed setting
"update_period" : 32,
"num_workers" : 8,
2 changes: 1 addition & 1 deletion config/double/pong_mlagent.py
@@ -29,7 +29,7 @@
"run_step" : 200000,
"print_period" : 2000,
"save_period" : 50000,
-"test_iteration": 10,
+"eval_iteration": 10,
# distributed setting
"update_period" : 8,
"num_workers" : 16,
2 changes: 1 addition & 1 deletion config/double/procgen.py
@@ -34,7 +34,7 @@
"run_step" : 30000000,
"print_period" : 10000,
"save_period" : 100000,
-"test_iteration": 5,
+"eval_iteration": 5,
"record" : True,
"record_period" : 300000,
# distributed setting
2 changes: 1 addition & 1 deletion config/dqn/atari.py
@@ -37,7 +37,7 @@
"run_step" : 30000000,
"print_period" : 10000,
"save_period" : 100000,
-"test_iteration": 5,
+"eval_iteration": 5,
"record" : True,
"record_period" : 300000,
# distributed setting
2 changes: 1 addition & 1 deletion config/dqn/cartpole.py
@@ -29,7 +29,7 @@
"run_step" : 100000,
"print_period" : 1000,
"save_period" : 10000,
-"test_iteration": 10,
+"eval_iteration": 10,
# distributed setting
"update_period" : 32,
"num_workers" : 8,
2 changes: 1 addition & 1 deletion config/dqn/mario.py
@@ -34,7 +34,7 @@
"run_step" : 100000000,
"print_period" : 5000,
"save_period" : 50000,
-"test_iteration": 1,
+"eval_iteration": 1,
"record" : True,
"record_period" : 200000,
# distributed setting
2 changes: 1 addition & 1 deletion config/dqn/pong_mlagent.py
@@ -29,7 +29,7 @@
"run_step" : 200000,
"print_period" : 5000,
"save_period" : 50000,
-"test_iteration": 10,
+"eval_iteration": 10,
# distributed setting
"update_period" : 8,
"num_workers" : 16,
2 changes: 1 addition & 1 deletion config/dqn/procgen.py
@@ -34,7 +34,7 @@
"run_step" : 30000000,
"print_period" : 10000,
"save_period" : 100000,
-"test_iteration": 5,
+"eval_iteration": 5,
"record" : True,
"record_period" : 300000,
# distributed setting
2 changes: 1 addition & 1 deletion config/dueling/atari.py
@@ -37,7 +37,7 @@
"run_step" : 30000000,
"print_period" : 10000,
"save_period" : 100000,
-"test_iteration": 5,
+"eval_iteration": 5,
"record" : True,
"record_period" : 300000,
# distributed setting
2 changes: 1 addition & 1 deletion config/dueling/cartpole.py
@@ -30,7 +30,7 @@
"run_step" : 100000,
"print_period" : 1000,
"save_period" : 10000,
-"test_iteration": 5,
+"eval_iteration": 5,
# distributed setting
"update_period" : 32,
"num_workers" : 8,
2 changes: 1 addition & 1 deletion config/dueling/pong_mlagent.py
@@ -29,7 +29,7 @@
"run_step" : 200000,
"print_period" : 5000,
"save_period" : 50000,
-"test_iteration": 10,
+"eval_iteration": 10,
# distributed setting
"update_period" : 8,
"num_workers" : 16,
2 changes: 1 addition & 1 deletion config/dueling/procgen.py
@@ -34,7 +34,7 @@
"run_step" : 30000000,
"print_period" : 10000,
"save_period" : 100000,
-"test_iteration": 5,
+"eval_iteration": 5,
"record" : True,
"record_period" : 300000,
# distributed setting
2 changes: 1 addition & 1 deletion config/icm_ppo/atari.py
@@ -45,7 +45,7 @@
"run_step" : 30000000,
"print_period" : 10000,
"save_period" : 100000,
-"test_iteration": 5,
+"eval_iteration": 5,
"record" : True,
"record_period" : 300000,
# distributed setting
2 changes: 1 addition & 1 deletion config/icm_ppo/cartpole.py
@@ -38,7 +38,7 @@
"run_step" : 100000,
"print_period" : 1000,
"save_period" : 10000,
-"test_iteration": 10,
+"eval_iteration": 10,
# distributed setting
"distributed_batch_size" : 256,
"update_period" : agent["n_step"],
2 changes: 1 addition & 1 deletion config/icm_ppo/mario.py
@@ -49,7 +49,7 @@
"run_step" : 30000000,
"print_period" : 10000,
"save_period" : 500000,
-"test_iteration": 1,
+"eval_iteration": 1,
"record": True,
"record_period": 500000,
# distributed setting
2 changes: 1 addition & 1 deletion config/icm_ppo/procgen.py
@@ -41,7 +41,7 @@
"run_step" : 30000000,
"print_period" : 10000,
"save_period" : 100000,
-"test_iteration": 5,
+"eval_iteration": 5,
"record" : True,
"record_period" : 300000,
# distributed setting
2 changes: 1 addition & 1 deletion config/iqn/atari.py
@@ -43,7 +43,7 @@
"run_step" : 30000000,
"print_period" : 10000,
"save_period" : 100000,
-"test_iteration": 5,
+"eval_iteration": 5,
"record" : True,
"record_period" : 300000,
# distributed setting
2 changes: 1 addition & 1 deletion config/iqn/cartpole.py
@@ -36,7 +36,7 @@
"run_step" : 100000,
"print_period" : 1000,
"save_period" : 10000,
-"test_iteration": 5,
+"eval_iteration": 5,
# distributed setting
"update_period" : 32,
"num_workers" : 8,