
Commit 1b6d817

vwxyzjn and clin0 authored

Prototype Documentation Site (vwxyzjn#64)

* Prototype documentation site
* add github docs workflow
* update
* update documentation site
* add domain
* push last changes
* modified theme
* update mkdocs.yml
* search content
* add layout
* update docs
* add new stuff
* update documentation
* update basic usage
* add back custom css and js
* update CSS
* create footer
* h2 heading for table of contents
* change for nav
* add image alignment and image captions
* hyperlink update
* Update docs
* add DQN docs
* add ppo docs
* add docs
* update docs for PPO
* add docs
* update docs
* Cloud Integration Table of contents
* Cloud Intergration Table of contents
* link cloud intergration
* update cloud intergration link
* contribution page links update
* each example
* update links
* finish documentation on save and resume
* add cloud integration docs
* fix
* quick fix
* fix docs dependencies

Co-authored-by: siyi lin <[email protected]>
1 parent cf1bb60 commit 1b6d817

35 files changed: +1558 −12 lines changed

.github/workflows/docs.yaml

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@

```yaml
name: ci
on:
  push:
    branches:
      - master
      - main
      - documentation-site
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: 3.x
      - run: pip install mkdocs-material
      - run: mkdocs gh-deploy --force
```

docs/CNAME

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@

docs.cleanrl.dev

docs/advanced/resume-training.md

Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@

# Resume Training

A common question we get asked is how to set up model checkpoints so that training can be resumed later. In this document, we use this [PPO example](https://github.com/vwxyzjn/gym-microrts/blob/master/experiments/ppo_gridnet.py) to answer that question.

## Save model checkpoints

The first step is to save models periodically. By default, we save the model to `wandb`.

```python linenums="1" hl_lines="3 4 6 9-14"
num_updates = args.total_timesteps // args.batch_size

CHECKPOINT_FREQUENCY = 50
starting_update = 1

for update in range(starting_update, num_updates + 1):
    # ... do rollouts and train models

    if args.track:
        # make sure to tune `CHECKPOINT_FREQUENCY`
        # so models are not saved too frequently
        if update % CHECKPOINT_FREQUENCY == 0:
            torch.save(agent.state_dict(), f"{wandb.run.dir}/agent.pt")
            wandb.save(f"{wandb.run.dir}/agent.pt", policy="now")
```
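
As the inline comment says, `CHECKPOINT_FREQUENCY` controls how often `agent.pt` is overwritten. A minimal sketch of that cadence (the update count here is hypothetical, just for illustration):

```python
CHECKPOINT_FREQUENCY = 50

# Hypothetical run length; in the script this is total_timesteps // batch_size.
num_updates = 200

# Updates at which `agent.pt` would be (over)written.
saves = [u for u in range(1, num_updates + 1) if u % CHECKPOINT_FREQUENCY == 0]
print(saves)  # [50, 100, 150, 200]
```

Raising `CHECKPOINT_FREQUENCY` trades checkpoint granularity for less I/O and W&B upload traffic.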

Then we can run the following to train our agents:

```bash
python ppo_gridnet.py --prod-mode --capture-video
```

If training is terminated early, we can still see the last saved model `agent.pt` in W&B, as in this URL [https://wandb.ai/costa-huang/cleanRL/runs/21421tda/files](https://wandb.ai/costa-huang/cleanRL/runs/21421tda/files) or as follows:

<iframe src="https://wandb.ai/costa-huang/cleanRL/runs/21421tda/files" style="width:100%; height:500px" title="CleanRL CartPole-v1 Example"></iframe>

## Resume training

The second step is to automatically download the `agent.pt` from the URL above and resume training, as follows:

```python linenums="1" hl_lines="6-16"
num_updates = args.total_timesteps // args.batch_size

CHECKPOINT_FREQUENCY = 50
starting_update = 1

if args.track and wandb.run.resumed:
    starting_update = run.summary.get("charts/update") + 1
    global_step = starting_update * args.batch_size
    api = wandb.Api()
    run = api.run(f"{run.entity}/{run.project}/{run.id}")
    model = run.file("agent.pt")
    model.download(f"models/{experiment_name}/")
    agent.load_state_dict(torch.load(
        f"models/{experiment_name}/agent.pt", map_location=device))
    agent.eval()
    print(f"resumed at update {starting_update}")

for update in range(starting_update, num_updates + 1):
    # ... do rollouts and train models

    if args.track:
        # make sure to tune `CHECKPOINT_FREQUENCY`
        # so models are not saved too frequently
        if update % CHECKPOINT_FREQUENCY == 0:
            torch.save(agent.state_dict(), f"{wandb.run.dir}/agent.pt")
            wandb.save(f"{wandb.run.dir}/agent.pt", policy="now")
```

To resume training, note that the ID of the experiment is `21421tda`, as in the URL [https://wandb.ai/costa-huang/cleanRL/runs/21421tda](https://wandb.ai/costa-huang/cleanRL/runs/21421tda), so we pass that ID in via environment variables to trigger the resume mode of W&B:

```bash
WANDB_RUN_ID=21421tda WANDB_RESUME=must python ppo_gridnet.py --prod-mode --capture-video
```
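
For intuition, the resume bookkeeping in the highlighted lines can be sketched without W&B at all: a plain dict stands in for `run.summary`, and the numbers below are hypothetical.

```python
def resume_point(summary, batch_size):
    """Sketch of the bookkeeping above: restart one update after the
    last logged one and recompute the matching global step."""
    starting_update = summary.get("charts/update") + 1
    global_step = starting_update * batch_size
    return starting_update, global_step

# e.g. a run that last logged update 100, with a batch size of 512
start, step = resume_point({"charts/update": 100}, batch_size=512)
print(start, step)  # 101 51712
```

This is why the script must log `charts/update` to the run summary: it is the only state needed to pick up where training left off.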

docs/cloud/aws_batch1.png

1.7 MB

docs/cloud/aws_batch2.png

1.78 MB

docs/cloud/installation.md

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@

# Installation

The rough idea behind the cloud integration is to package our code into a Docker container and use AWS Batch to run thousands of experiments concurrently.

We use Terraform to define our infrastructure with AWS Batch, which you can spin up as follows:

```bash
# assuming you are at the root of the CleanRL project
poetry install -E cloud
cd cleanrl/cloud
python -m awscli configure
terraform init
export AWS_DEFAULT_REGION=$(aws configure get region --profile default)
terraform apply
```

<script id="asciicast-445048" src="https://asciinema.org/a/445048.js" async></script>

!!! note
    Don't worry about the cost of spinning up these AWS Batch compute environments and job queues. They are completely free, and you are only charged when you submit experiments.

Then your AWS Batch console should look like this:

![aws_batch1.png](aws_batch1.png)

### Clean Up

Uninstalling/deleting the infrastructure is straightforward:

```bash
export AWS_DEFAULT_REGION=$(aws configure get region --profile default)
terraform destroy
```

docs/cloud/submit-experiments.md

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@

# Submit Experiments

### Inspection

Do a dry run to inspect the generated docker command:

```bash
poetry run python -m cleanrl_utils.submit_exp \
    --docker-tag vwxyzjn/cleanrl:latest \
    --command "poetry run python cleanrl/ppo.py --gym-id CartPole-v1 --total-timesteps 100000 --track --capture-video" \
    --num-seed 1 \
    --num-vcpu 1 \
    --num-memory 2000 \
    --num-hours 48.0
```

The generated docker command should look like:

```bash
docker run -d --cpuset-cpus="0" -e WANDB_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx vwxyzjn/cleanrl:latest /bin/bash -c "poetry run python cleanrl/ppo.py --gym-id CartPole-v1 --total-timesteps 100000 --track --capture-video --seed 1"
```
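
Note how the generated command appends `--seed 1` to the original `--command`. For illustration only, here is a rough sketch of how `--num-seed` could fan one command out into several docker commands like the one above, one per seed; the helper and its CPU-pinning scheme are hypothetical, not the actual `cleanrl_utils.submit_exp` implementation:

```python
def fan_out(command, num_seed, docker_tag):
    """Hypothetical sketch: one detached docker command per seed,
    each pinned to its own CPU core."""
    return [
        f'docker run -d --cpuset-cpus="{seed - 1}" {docker_tag} '
        f'/bin/bash -c "{command} --seed {seed}"'
        for seed in range(1, num_seed + 1)
    ]

cmds = fan_out("poetry run python cleanrl/ppo.py --gym-id CartPole-v1", 2, "vwxyzjn/cleanrl:latest")
print(len(cmds))  # 2
```

Each seed gets its own container, which is what lets AWS Batch run the variations concurrently.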

### Run on AWS

Submit a job using AWS's compute-optimized spot instances:

```bash
poetry run python -m cleanrl_utils.submit_exp \
    --docker-tag vwxyzjn/cleanrl:latest \
    --command "poetry run python cleanrl/ppo.py --gym-id CartPole-v1 --total-timesteps 100000 --track --capture-video" \
    --job-queue c5a-large-spot \
    --num-seed 1 \
    --num-vcpu 1 \
    --num-memory 2000 \
    --num-hours 48.0 \
    --provider aws
```

Submit a job using AWS's accelerated-computing spot instances:

```bash
poetry run python -m cleanrl_utils.submit_exp \
    --docker-tag vwxyzjn/cleanrl:latest \
    --command "poetry run python cleanrl/ppo_atari.py --gym-id BreakoutNoFrameskip-v4 --track --capture-video" \
    --job-queue g4dn-xlarge-spot \
    --num-seed 1 \
    --num-vcpu 1 \
    --num-gpu 1 \
    --num-memory 4000 \
    --num-hours 48.0 \
    --provider aws
```

Submit a job using AWS's compute-optimized on-demand instances:

```bash
poetry run python -m cleanrl_utils.submit_exp \
    --docker-tag vwxyzjn/cleanrl:latest \
    --command "poetry run python cleanrl/ppo.py --gym-id CartPole-v1 --total-timesteps 100000 --track --capture-video" \
    --job-queue c5a-large \
    --num-seed 1 \
    --num-vcpu 1 \
    --num-memory 2000 \
    --num-hours 48.0 \
    --provider aws
```

Submit a job using AWS's accelerated-computing on-demand instances:

```bash
poetry run python -m cleanrl_utils.submit_exp \
    --docker-tag vwxyzjn/cleanrl:latest \
    --command "poetry run python cleanrl/ppo_atari.py --gym-id BreakoutNoFrameskip-v4 --track --capture-video" \
    --job-queue g4dn-xlarge \
    --num-seed 1 \
    --num-vcpu 1 \
    --num-gpu 1 \
    --num-memory 4000 \
    --num-hours 48.0 \
    --provider aws
```

<script id="asciicast-445050" src="https://asciinema.org/a/445050.js" async></script>

Then you should see:

![aws_batch1.png](aws_batch1.png)
![aws_batch2.png](aws_batch2.png)

![wandb.png](wandb.png)

docs/cloud/wandb.png

810 KB

docs/community.md

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@

We have a [Discord Community](https://discord.gg/D6RCjA6sVT) for support. Feel free to ask questions there. Posting in [Github Issues](https://github.com/vwxyzjn/cleanrl/issues) and opening PRs are also welcome. Our past video recordings are also available on [YouTube](https://www.youtube.com/watch?v=dm4HdGujpPs&list=PLQpKd36nzSuMynZLU2soIpNSMeXMplnKP&index=2).

## Related Resources

- [Deep Reinforcement Learning With TensorFlow 2.1](http://inoryy.com/post/tensorflow2-deep-reinforcement-learning/)
- [minimalRL - PyTorch](https://github.com/seungeunrho/minimalRL)
- [Deep-Reinforcement-Learning-Hands-On](https://github.com/Shmuma/Deep-Reinforcement-Learning-Hands-On)
- [Stable Baselines3](https://github.com/DLR-RM/stable-baselines3)
- [PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL)](https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail)
- [Reinforcement-Implementation](https://github.com/zhangchuheng123/Reinforcement-Implementation/blob/master/code/ppo.py)

docs/contribution.md

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@

Thank you for your interest in contributing to our project. All kinds of contributions are welcome.

Below are some steps to help you get started:

1. Join our [Discord channel](https://discord.gg/D6RCjA6sVT) to say hi!! 👋
2. Pick something you want to work on and let us know on Discord. You could
    * Tackle issues with the [`help wanted`](https://github.com/vwxyzjn/cleanrl/issues?q=is%3Aissue+is%3Aopen+label%3A%22help+wanted%22) flag
    * Make bug fixes and various improvements to existing algorithms
    * **Contribute to the Open RL Benchmark**
        * You could add new algorithms or new games to be featured in the [Open RL Benchmark](http://benchmark.cleanrl.dev/)
        * Feel free to contact me (Costa) directly on Discord. I will add you to [CleanRL's Team](https://wandb.ai/cleanrl) at Weights and Biases so your experiments can be featured on the [Open RL Benchmark](http://benchmark.cleanrl.dev/).
3. Submit a PR and get it merged! 🎇

Good luck and have fun!
