
Commit 1b6d817

vwxyzjn and clin0 authored

Prototype Documentation Site (vwxyzjn#64)

* Prototype documentation site
* add github docs workflow
* update
* update documentation site
* add domain
* push last changes
* modified theme
* update mkdocs.yml
* search content
* add layout
* update docs
* add new stuff
* update documentation
* update basic usage
* add back custom css and js
* update CSS
* create footer
* h2 heading for table of contents
* change for nav
* add image alignment and image captions
* hyperlink update
* Update docs
* add DQN docs
* add ppo docs
* add docs
* update docs for PPO
* add docs
* update docs
* Cloud Integration Table of contents
* Cloud Intergration Table of contents
* link cloud intergration
* update cloud intergration link
* contribution page links update
* each example
* update links
* finish documentation on save and resume
* add cloud integration docs
* fix
* quick fix
* fix docs dependencies

Co-authored-by: siyi lin <[email protected]>
1 parent cf1bb60 commit 1b6d817

35 files changed: +1558 −12 lines changed

.github/workflows/docs.yaml

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@

```yaml
name: ci
on:
  push:
    branches:
      - master
      - main
      - documentation-site
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: 3.x
      - run: pip install mkdocs-material
      - run: mkdocs gh-deploy --force
```

docs/CNAME

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@

docs.cleanrl.dev

docs/advanced/resume-training.md

Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@

# Resume Training

A common question we get asked is how to set up model checkpoints so that training can be resumed later. In this document, we use this [PPO example](https://github.com/vwxyzjn/gym-microrts/blob/master/experiments/ppo_gridnet.py) to answer that question.

## Save model checkpoints

The first step is to save models periodically. By default, we save the model to `wandb`.

```python linenums="1" hl_lines="3 4 6 9-14"
num_updates = args.total_timesteps // args.batch_size

CHECKPOINT_FREQUENCY = 50
starting_update = 1

for update in range(starting_update, num_updates + 1):
    # ... do rollouts and train models

    if args.track:
        # make sure to tune `CHECKPOINT_FREQUENCY`
        # so models are not saved too frequently
        if update % CHECKPOINT_FREQUENCY == 0:
            torch.save(agent.state_dict(), f"{wandb.run.dir}/agent.pt")
            wandb.save(f"{wandb.run.dir}/agent.pt", policy="now")
```
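
As the inline comment says, `CHECKPOINT_FREQUENCY` controls how often `agent.pt` is overwritten. A minimal sketch of that cadence (the update count here is hypothetical, just for illustration):

```python
CHECKPOINT_FREQUENCY = 50

# Hypothetical run length; in the script this is total_timesteps // batch_size.
num_updates = 200

# Updates at which `agent.pt` would be (over)written.
saves = [u for u in range(1, num_updates + 1) if u % CHECKPOINT_FREQUENCY == 0]
print(saves)  # [50, 100, 150, 200]
```

Raising `CHECKPOINT_FREQUENCY` trades checkpoint granularity for less I/O and W&B upload traffic.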

Then we can run the following to train our agents:

```bash
python ppo_gridnet.py --prod-mode --capture-video
```

If training is terminated early, we can still see the last saved model `agent.pt` in W&B, as in this URL [https://wandb.ai/costa-huang/cleanRL/runs/21421tda/files](https://wandb.ai/costa-huang/cleanRL/runs/21421tda/files) or as follows:

<iframe src="https://wandb.ai/costa-huang/cleanRL/runs/21421tda/files" style="width:100%; height:500px" title="CleanRL CartPole-v1 Example"></iframe>

## Resume training

The second step is to automatically download the `agent.pt` from the URL above and resume training, as follows:

```python linenums="1" hl_lines="6-16"
num_updates = args.total_timesteps // args.batch_size

CHECKPOINT_FREQUENCY = 50
starting_update = 1

if args.track and wandb.run.resumed:
    starting_update = run.summary.get("charts/update") + 1
    global_step = starting_update * args.batch_size
    api = wandb.Api()
    run = api.run(f"{run.entity}/{run.project}/{run.id}")
    model = run.file("agent.pt")
    model.download(f"models/{experiment_name}/")
    agent.load_state_dict(torch.load(
        f"models/{experiment_name}/agent.pt", map_location=device))
    agent.eval()
    print(f"resumed at update {starting_update}")

for update in range(starting_update, num_updates + 1):
    # ... do rollouts and train models

    if args.track:
        # make sure to tune `CHECKPOINT_FREQUENCY`
        # so models are not saved too frequently
        if update % CHECKPOINT_FREQUENCY == 0:
            torch.save(agent.state_dict(), f"{wandb.run.dir}/agent.pt")
            wandb.save(f"{wandb.run.dir}/agent.pt", policy="now")
```

To resume training, note that the ID of the experiment is `21421tda`, as in the URL [https://wandb.ai/costa-huang/cleanRL/runs/21421tda](https://wandb.ai/costa-huang/cleanRL/runs/21421tda), so we pass that ID in via environment variables to trigger the resume mode of W&B:

```bash
WANDB_RUN_ID=21421tda WANDB_RESUME=must python ppo_gridnet.py --prod-mode --capture-video
```
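
For intuition, the resume bookkeeping in the highlighted lines can be sketched without W&B at all: a plain dict stands in for `run.summary`, and the numbers below are hypothetical.

```python
def resume_point(summary, batch_size):
    """Sketch of the bookkeeping above: restart one update after the
    last logged one and recompute the matching global step."""
    starting_update = summary.get("charts/update") + 1
    global_step = starting_update * batch_size
    return starting_update, global_step

# e.g. a run that last logged update 100, with a batch size of 512
start, step = resume_point({"charts/update": 100}, batch_size=512)
print(start, step)  # 101 51712
```

This is why the script must log `charts/update` to the run summary: it is the only state needed to pick up where training left off.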

docs/cloud/aws_batch1.png

1.7 MB

docs/cloud/aws_batch2.png

1.78 MB

docs/cloud/installation.md

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@

# Installation

The rough idea behind the cloud integration is to package our code into a Docker container and use AWS Batch to run thousands of experiments concurrently.

We use Terraform to define our infrastructure with AWS Batch, which you can spin up as follows:

```bash
# assuming you are at the root of the CleanRL project
poetry install -E cloud
cd cleanrl/cloud
python -m awscli configure
terraform init
export AWS_DEFAULT_REGION=$(aws configure get region --profile default)
terraform apply
```

<script id="asciicast-445048" src="https://asciinema.org/a/445048.js" async></script>

!!! note
    Don't worry about the cost of spinning up these AWS Batch compute environments and job queues. They are completely free, and you are only charged when you submit experiments.

Then your AWS Batch console should look like this:

![aws_batch1.png](aws_batch1.png)

### Clean Up

Uninstalling/deleting the infrastructure is straightforward:

```bash
export AWS_DEFAULT_REGION=$(aws configure get region --profile default)
terraform destroy
```

docs/cloud/submit-experiments.md

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@

# Submit Experiments

### Inspection

Do a dry run to inspect the generated docker command:

```bash
poetry run python -m cleanrl_utils.submit_exp \
    --docker-tag vwxyzjn/cleanrl:latest \
    --command "poetry run python cleanrl/ppo.py --gym-id CartPole-v1 --total-timesteps 100000 --track --capture-video" \
    --num-seed 1 \
    --num-vcpu 1 \
    --num-memory 2000 \
    --num-hours 48.0
```

The generated docker command should look like:

```bash
docker run -d --cpuset-cpus="0" -e WANDB_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx vwxyzjn/cleanrl:latest /bin/bash -c "poetry run python cleanrl/ppo.py --gym-id CartPole-v1 --total-timesteps 100000 --track --capture-video --seed 1"
```
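
Note how the generated command appends `--seed 1` to the original `--command`. For illustration only, here is a rough sketch of how `--num-seed` could fan one command out into several docker commands like the one above, one per seed; the helper and its CPU-pinning scheme are hypothetical, not the actual `cleanrl_utils.submit_exp` implementation:

```python
def fan_out(command, num_seed, docker_tag):
    """Hypothetical sketch: one detached docker command per seed,
    each pinned to its own CPU core."""
    return [
        f'docker run -d --cpuset-cpus="{seed - 1}" {docker_tag} '
        f'/bin/bash -c "{command} --seed {seed}"'
        for seed in range(1, num_seed + 1)
    ]

cmds = fan_out("poetry run python cleanrl/ppo.py --gym-id CartPole-v1", 2, "vwxyzjn/cleanrl:latest")
print(len(cmds))  # 2
```

Each seed gets its own container, which is what lets AWS Batch run the variations concurrently.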

### Run on AWS

Submit a job using AWS's compute-optimized spot instances:

```bash
poetry run python -m cleanrl_utils.submit_exp \
    --docker-tag vwxyzjn/cleanrl:latest \
    --command "poetry run python cleanrl/ppo.py --gym-id CartPole-v1 --total-timesteps 100000 --track --capture-video" \
    --job-queue c5a-large-spot \
    --num-seed 1 \
    --num-vcpu 1 \
    --num-memory 2000 \
    --num-hours 48.0 \
    --provider aws
```

Submit a job using AWS's accelerated-computing spot instances:

```bash
poetry run python -m cleanrl_utils.submit_exp \
    --docker-tag vwxyzjn/cleanrl:latest \
    --command "poetry run python cleanrl/ppo_atari.py --gym-id BreakoutNoFrameskip-v4 --track --capture-video" \
    --job-queue g4dn-xlarge-spot \
    --num-seed 1 \
    --num-vcpu 1 \
    --num-gpu 1 \
    --num-memory 4000 \
    --num-hours 48.0 \
    --provider aws
```

Submit a job using AWS's compute-optimized on-demand instances:

```bash
poetry run python -m cleanrl_utils.submit_exp \
    --docker-tag vwxyzjn/cleanrl:latest \
    --command "poetry run python cleanrl/ppo.py --gym-id CartPole-v1 --total-timesteps 100000 --track --capture-video" \
    --job-queue c5a-large \
    --num-seed 1 \
    --num-vcpu 1 \
    --num-memory 2000 \
    --num-hours 48.0 \
    --provider aws
```

Submit a job using AWS's accelerated-computing on-demand instances:

```bash
poetry run python -m cleanrl_utils.submit_exp \
    --docker-tag vwxyzjn/cleanrl:latest \
    --command "poetry run python cleanrl/ppo_atari.py --gym-id BreakoutNoFrameskip-v4 --track --capture-video" \
    --job-queue g4dn-xlarge \
    --num-seed 1 \
    --num-vcpu 1 \
    --num-gpu 1 \
    --num-memory 4000 \
    --num-hours 48.0 \
    --provider aws
```

<script id="asciicast-445050" src="https://asciinema.org/a/445050.js" async></script>

Then you should see:

![aws_batch1.png](aws_batch1.png)
![aws_batch2.png](aws_batch2.png)

![wandb.png](wandb.png)

docs/cloud/wandb.png

810 KB

docs/community.md

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@

We have a [Discord Community](https://discord.gg/D6RCjA6sVT) for support. Feel free to ask questions there. Posting in [Github Issues](https://github.com/vwxyzjn/cleanrl/issues) and opening PRs are also welcome. Our past video recordings are also available on [YouTube](https://www.youtube.com/watch?v=dm4HdGujpPs&list=PLQpKd36nzSuMynZLU2soIpNSMeXMplnKP&index=2).

## Related Resources

- [Deep Reinforcement Learning With TensorFlow 2.1](http://inoryy.com/post/tensorflow2-deep-reinforcement-learning/)
- [minimalRL - PyTorch](https://github.com/seungeunrho/minimalRL)
- [Deep-Reinforcement-Learning-Hands-On](https://github.com/Shmuma/Deep-Reinforcement-Learning-Hands-On)
- [Stable Baselines3](https://github.com/DLR-RM/stable-baselines3)
- [PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL)](https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail)
- [Reinforcement-Implementation](https://github.com/zhangchuheng123/Reinforcement-Implementation/blob/master/code/ppo.py)

docs/contribution.md

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@

Thank you for your interest in contributing to our project. All kinds of contributions are welcome.

Below are some steps to help you get started:

1. Join our [Discord channel](https://discord.gg/D6RCjA6sVT) to say hi!! 👋
2. Pick something you want to work on and let us know on Discord. You could
    * Tackle issues with the [`help wanted`](https://github.com/vwxyzjn/cleanrl/issues?q=is%3Aissue+is%3Aopen+label%3A%22help+wanted%22) flag
    * Make bug fixes and various improvements to existing algorithms
    * **Contribute to the Open RL Benchmark**
        * You could add new algorithms or new games to be featured in the [Open RL Benchmark](http://benchmark.cleanrl.dev/)
        * Feel free to contact me (Costa) directly on Discord. I will add you to [CleanRL's Team](https://wandb.ai/cleanrl) at Weights and Biases so your experiments can be featured on the [Open RL Benchmark](http://benchmark.cleanrl.dev/).
3. Submit a PR and get it merged! 🎇

Good luck and have fun!
