# Feat/prepare twitch part3 #239

Draft — wants to merge 105 commits into base: `master`.

## Commits (105)
- `182e3d4` Add Offline-RL support (araffin, Oct 31, 2020)
- `a165b73` Deactivate logging for offline RL (araffin, Oct 31, 2020)
- `3ab448e` Merge branch 'master' into feat/offline-RL (araffin, Nov 29, 2020)
- `49f1c4e` Update d3rlpy wrapper (araffin, Nov 29, 2020)
- `c8cff9f` Merge branch 'master' into feat/offline-RL (araffin, Feb 7, 2021)
- `569d4b4` Update hyperparams (araffin, Feb 14, 2021)
- `b926f9d` Update buffer size (araffin, Feb 14, 2021)
- `fee7909` Add support for residual RL (araffin, Feb 14, 2021)
- `cd5e4b4` Update README (araffin, Feb 15, 2021)
- `a1d6094` Merge branch 'master' into feat/offline-RL (araffin, Feb 16, 2021)
- `2e8a072` test with ppo (araffin, Feb 22, 2021)
- `4fc0864` Update d3rlpy (araffin, Feb 23, 2021)
- `d335335` Fix for CRR (araffin, Feb 23, 2021)
- `49909ef` Fix pushed to d3rlpy (araffin, Feb 24, 2021)
- `fefc3bd` Handle keyboard interrupt (araffin, Feb 26, 2021)
- `a2dd644` Merge branch 'master' into feat/offline-RL (araffin, Feb 28, 2021)
- `03e682d` Upgrade SB3 + add d3rlpy as requirement (araffin, Feb 28, 2021)
- `87b65aa` Add CMAES (araffin, Feb 28, 2021)
- `0f7fd0d` Fixes (araffin, Feb 28, 2021)
- `d24842e` Copy to avoid modification by reference (araffin, Feb 28, 2021)
- `a9a45f4` Merge branch 'master' into feat/offline-RL (araffin, Mar 1, 2021)
- `c1dea20` Start support for d3rlpy (araffin, Mar 1, 2021)
- `df80a26` Add net_arch for BC (araffin, Mar 1, 2021)
- `a4d1c8b` Bug fixes (araffin, Mar 1, 2021)
- `8c98c3c` Use best candidate only once (araffin, Mar 1, 2021)
- `c963be2` Remove HER (araffin, Apr 5, 2021)
- `40ce4c5` Add basic support for refactored HER (araffin, Apr 5, 2021)
- `7fad104` Add TQC (araffin, Apr 7, 2021)
- `da56dff` Merge branch 'master' into feat/offline-RL (araffin, Apr 14, 2021)
- `7105a55` Merge branch 'master' into feat/offline-RL (araffin, Apr 19, 2021)
- `f6b9b43` Integrate RL Racing (araffin, Apr 20, 2021)
- `03af926` Use TQC (araffin, Apr 26, 2021)
- `b4d26c9` Add space engineer env (araffin, Apr 27, 2021)
- `76b40d6` Merge branch 'master' into refactor/her (araffin, Apr 29, 2021)
- `e7e9ee6` Update hyperparams (araffin, Apr 29, 2021)
- `ae706eb` Fix hyperparam (araffin, Apr 29, 2021)
- `d079811` Merge branch 'master' into refactor/her (araffin, May 3, 2021)
- `3dad1ed` Merge branch 'master' into feat/offline-RL (araffin, May 3, 2021)
- `73bea6e` Removed unused callback (araffin, May 4, 2021)
- `7e96e10` Update CI (araffin, May 4, 2021)
- `7c4f1bc` Add partial support for parallel training (araffin, May 4, 2021)
- `cfa30b5` Merge branch 'feat/parallel-train' into feat/offline-RL (araffin, May 4, 2021)
- `5f7980f` Try parallel training (araffin, May 5, 2021)
- `b635157` Cleanup (araffin, May 5, 2021)
- `55ad418` Merge branch 'feat/parallel-train' into feat/offline-RL (araffin, May 5, 2021)
- `76ea2ea` Add notes (araffin, May 5, 2021)
- `2966689` Avoid modify by reference + add sleep time (araffin, May 5, 2021)
- `5f9e044` Merge branch 'feat/parallel-train' into feat/offline-RL (araffin, May 5, 2021)
- `aa6d934` Take learning starts into account (araffin, May 6, 2021)
- `c2baee5` Merge branch 'feat/parallel-train' into feat/offline-RL (araffin, May 6, 2021)
- `76ecb24` Merge branch 'master' into misc/veridream (araffin, May 10, 2021)
- `54d5596` Merge branch 'feat/parallel-train' into misc/veridream (araffin, May 10, 2021)
- `563d853` Update hyperparams (araffin, May 11, 2021)
- `779c4c6` Add dict obs support (araffin, May 11, 2021)
- `43188ad` Update test env (araffin, May 12, 2021)
- `20c5dd8` Merge branch 'master' into refactor/her (araffin, May 12, 2021)
- `b461048` Version bump (araffin, May 12, 2021)
- `15c8fa1` Merge branch 'refactor/her' into misc/veridream (araffin, May 12, 2021)
- `4857bca` Merge branch 'master' into misc/veridream (araffin, May 12, 2021)
- `f73a65e` Update for symmetric control + catch zmq error (araffin, May 14, 2021)
- `da441b8` Save best model (araffin, May 15, 2021)
- `bf05e92` Fix parallel save (maybe issue with optimizer) (araffin, May 16, 2021)
- `a9d63db` Update hyperparams (araffin, May 16, 2021)
- `d395120` Update best params (araffin, May 17, 2021)
- `5634128` Update hyperparams (araffin, May 19, 2021)
- `c5b7f55` Prepare big network experiment (araffin, May 19, 2021)
- `976833d` Revert to normal net (araffin, May 19, 2021)
- `c6b2dce` Add exception for windows (araffin, May 19, 2021)
- `6703390` Update plot script: allow multiple envs (araffin, May 19, 2021)
- `a771249` Add bert params (araffin, May 21, 2021)
- `c966371` Save multirobot hyperparams (araffin, May 26, 2021)
- `f6f3dd0` Merge branch 'master' into misc/veridream (araffin, May 27, 2021)
- `7f24273` Merge branch 'master' into feat/offline-RL (araffin, May 29, 2021)
- `7dc1b3a` Remove CMAES (araffin, May 29, 2021)
- `1f75be3` Merge branch 'misc/veridream' into feat/offline-RL (araffin, May 29, 2021)
- `61925cb` Update params for newer machine (araffin, May 29, 2021)
- `fe58ab0` Re-add timelimit (araffin, May 29, 2021)
- `9b71984` Re-add monitor (araffin, May 29, 2021)
- `9065a2a` Add POC for VecEnvWrapper (araffin, May 30, 2021)
- `6cc8256` Update params (araffin, May 30, 2021)
- `fedbde7` Merge branch 'master' into feat/offline-RL (araffin, Jun 21, 2021)
- `80f4df1` Merge branch 'master' into feat/offline-RL (araffin, Jun 29, 2021)
- `a8d25b3` Fixes (araffin, Jun 29, 2021)
- `5530171` Format (araffin, Jun 29, 2021)
- `a2244f0` Fix loading (araffin, Jun 29, 2021)
- `ab98711` Cleanup hyperparams (araffin, Jan 4, 2022)
- `57c0d50` Merge branch 'master' into feat/offline-RL (araffin, Jan 4, 2022)
- `df89b55` Sync vec normalize for parallel training (araffin, Jan 7, 2022)
- `da26cb9` Update params (araffin, Jan 7, 2022)
- `3655b90` Add episode length plot (araffin, Jan 8, 2022)
- `24d3f89` Add hack when VecNormalize was not saved (araffin, Jan 10, 2022)
- `053f3b4` Merge branch 'feat/offline-RL' of github.com:DLR-RM/rl-baselines3-zoo… (araffin, Jan 10, 2022)
- `a5832e4` Add websocket wait (araffin, Jan 12, 2022)
- `54cc4d0` Fix bool conversion (araffin, Jan 15, 2022)
- `a4e37d1` Merge branch 'master' into feat/offline-RL (araffin, Jan 19, 2022)
- `3e2761e` Save edits from live (araffin, Apr 5, 2022)
- `988bf7b` Merge branch 'master' into feat/live-twitch (araffin, Apr 5, 2022)
- `6497088` Save new callback and wrapper (araffin, Apr 9, 2022)
- `6a2dc31` Disable continuity wrapper and add expln trick (araffin, Apr 11, 2022)
- `75c4054` Merge remote-tracking branch 'origin/master' into feat/prepare-twitch… (araffin, Apr 12, 2022)
- `5cb5e1d` Update default (araffin, Apr 17, 2022)
- `987e34d` Save edits from live (araffin, Apr 20, 2022)
- `c7f218f` Flush only when new lap is detected (araffin, Apr 21, 2022)
- `d415090` Update sac params (araffin, Apr 21, 2022)
- `de982ba` Merge branch 'master' into feat/prepare-twitch-part3 (araffin, Jun 3, 2022)
## Files changed
5 changes: 2 additions & 3 deletions — `.github/workflows/ci.yml`

```diff
@@ -17,7 +17,6 @@ jobs:
     strategy:
       matrix:
         python-version: [3.7, 3.8, 3.9]
-
     steps:
     - uses: actions/checkout@v2
       with:
@@ -30,13 +29,13 @@ jobs:
       run: |
         python -m pip install --upgrade pip
         # cpu version of pytorch - faster to download
-        pip install torch==1.8.1+cpu -f https://download.pytorch.org/whl/torch_stable.html
+        pip install torch==1.11+cpu -f https://download.pytorch.org/whl/torch_stable.html
         pip install pybullet==3.1.9
         pip install -r requirements.txt
         # Use headless version
         pip install opencv-python-headless
         # install parking-env to test HER (pinned so it works with gym 0.21)
-        pip install git+https://github.com/eleurent/highway-env@1a04c6a98be64632cf9683625022023e70ff1ab1
+        pip install highway-env==1.5.0
     - name: Type check
       run: |
         make type
```
4 changes: 2 additions & 2 deletions — `.github/workflows/trained_agents.yml`

```diff
@@ -29,13 +29,13 @@ jobs:
       run: |
         python -m pip install --upgrade pip
         # cpu version of pytorch - faster to download
-        pip install torch==1.8.1+cpu -f https://download.pytorch.org/whl/torch_stable.html
+        pip install torch==1.11+cpu -f https://download.pytorch.org/whl/torch_stable.html
         pip install pybullet==3.1.9
         pip install -r requirements.txt
         # Use headless version
         pip install opencv-python-headless
         # install parking-env to test HER (pinned so it works with gym 0.21)
-        pip install git+https://github.com/eleurent/highway-env@1a04c6a98be64632cf9683625022023e70ff1ab1
+        pip install highway-env==1.5.0
         # Add support for pickle5 protocol
         pip install pickle5
     - name: Check trained agents
```
4 changes: 4 additions & 0 deletions — `.gitignore`

```diff
@@ -15,3 +15,7 @@ git_rewrite_commit_history.sh
 .vscode/
 wandb
 runs
+hub
+*.mp4
+*.json
+*.csv
```
8 changes: 8 additions & 0 deletions — `.gitlab-ci.yml`

```diff
@@ -6,17 +6,25 @@ variables:
 
 type-check:
   script:
+    - pip install git+https://github.com/huggingface/huggingface_sb3
     - make type
 
 pytest:
   script:
     # MKL_THREADING_LAYER=GNU to avoid MKL_THREADING_LAYER=INTEL incompatibility error
+    # tmp fix to have RecurrentPPO, will be fixed with new image
+    - pip install git+https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
+    - pip install git+https://github.com/huggingface/huggingface_sb3
     - MKL_THREADING_LAYER=GNU make pytest
   coverage: '/^TOTAL.+?(\d+\%)$/'
 
check-trained-agents:
   script:
     # MKL_THREADING_LAYER=GNU to avoid MKL_THREADING_LAYER=INTEL incompatibility error
+    - pip install pickle5 # Add support for pickle5 protocol
+    - pip install git+https://github.com/huggingface/huggingface_sb3
+    # tmp fix to have RecurrentPPO, will be fixed with new image
+    - pip install git+https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
     - MKL_THREADING_LAYER=GNU make check-trained-agents
 
 lint:
```
14 changes: 12 additions & 2 deletions — `CHANGELOG.md`

```diff
@@ -1,18 +1,28 @@
-## Release 1.5.1a0 (WIP)
+## Release 1.5.1a8 (WIP)
 
 ### Breaking Changes
 - Change default value for number of hyperparameter optimization trials from 10 to 500. (@ernestum)
 - Derive number of intermediate pruning evaluations from number of time steps (1 evaluation per 100k time steps.) (@ernestum)
-- Updated default --eval-freq from 10k to 25k steps
+- Updated default --eval-freq from 10k to 25k steps
+- Update default horizon to 2 for the `HistoryWrapper`
 
 ### New Features
 - Support setting PyTorch's device with the `--device` flag (@gregwar)
 - Add `--max-total-trials` parameter to help with distributed optimization. (@ernestum)
+- Added `vec_env_wrapper` support in the config (works the same as `env_wrapper`)
+- Added Huggingface hub integration
+- Added `RecurrentPPO` support (aka `ppo_lstm`)
+- Added autodownload for "official" sb3 models from the hub
 
 ### Bug fixes
+- Fix `Reacher-v3` name in PPO hyperparameter file
+- Pinned ale-py==0.7.4 until new SB3 version is released
+- Fix enjoy / record videos with LSTM policy
 
 ### Documentation
 
 ### Other
+- When pruner is set to `"none"`, use `NopPruner` instead of diverted `MedianPruner` (@qgallouedec)
 
 ## Release 1.5.0 (2022-03-25)
```
56 changes: 53 additions & 3 deletions — `README.md`

Plot evaluation reward curve for TQC, SAC and TD3 on the HalfCheetah and Ant PyBullet environments:

```diff
-python scripts/all_plots.py -a sac td3 tqc --env HalfCheetah Ant -f rl-trained-agents/
+python3 scripts/all_plots.py -a sac td3 tqc --env HalfCheetahBullet AntBullet -f rl-trained-agents/
```

## Plot with the rliable library
To load a checkpoint (here the checkpoint name is `rl_model_10000_steps.zip`):

```
python enjoy.py --algo algo_name --env env_id -f logs/ --exp-id 1 --load-checkpoint 10000
```

To load the latest checkpoint:

```
python enjoy.py --algo algo_name --env env_id -f logs/ --exp-id 1 --load-last-checkpoint
```

## Offline Training with d3rlpy

```
python train.py --algo human --env donkey-generated-track-v0 --env-kwargs frame_skip:1 throttle_max:2.0 throttle_min:0.0 steering_min:-0.5 steering_max:0.5 level:6 max_cte:100000 test_mode:False --num-threads 2 --eval-freq -1 \
  -b logs/human/donkey-generated-track-v0_1/replay_buffer.pkl \
  --pretrain-params batch_size:256 n_eval_episodes:1 n_epochs:20 n_iterations:20 \
  --offline-algo bc
```

Parameters read from `--pretrain-params` (with their defaults):

```python
n_iterations = args.pretrain_params.get("n_iterations", 10)
n_epochs = args.pretrain_params.get("n_epochs", 1)
q_func_type = args.pretrain_params.get("q_func_type")
batch_size = args.pretrain_params.get("batch_size", 512)
# n_action_samples = args.pretrain_params.get("n_action_samples", 1)
n_eval_episodes = args.pretrain_params.get("n_eval_episodes", 5)
add_to_buffer = args.pretrain_params.get("add_to_buffer", False)
deterministic = args.pretrain_params.get("deterministic", True)
```
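The `--pretrain-params` (and `--env-kwargs`) values above are plain `key:value` pairs on the command line. A minimal sketch of how such pairs can be parsed into a typed dict — the helper name `parse_params` is illustrative, not the zoo's actual argument parser:

```python
import ast
from typing import Any, Dict, List


def parse_params(pairs: List[str]) -> Dict[str, Any]:
    """Parse CLI 'key:value' pairs (e.g. 'batch_size:256') into a dict.

    Each value is evaluated as a Python literal when possible, so
    '256' becomes an int, '2.0' a float and 'False' a bool; anything
    that is not a literal (e.g. 'bc') is kept as a plain string.
    """
    params: Dict[str, Any] = {}
    for pair in pairs:
        key, _, raw = pair.partition(":")
        try:
            params[key] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            params[key] = raw  # plain string, e.g. an algo name
    return params


pretrain_params = parse_params(["batch_size:256", "n_eval_episodes:1", "n_epochs:20"])
batch_size = pretrain_params.get("batch_size", 512)  # 256 from the CLI, 512 otherwise
```

The `dict.get(key, default)` calls in the snippet above then supply defaults for any key the user did not pass.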

## Human driving

```
python train.py --algo human --env donkey-generated-track-v0 --env-kwargs frame_skip:1 throttle_max:1.0 throttle_min:-1.0 steering_min:-1 steering_max:1 level:6 max_cte:100000 --num-threads 2 --eval-freq -1
```

## Huggingface Hub Integration

Upload model to hub (same syntax as for `enjoy.py`):
```
python -m utils.push_to_hub --algo ppo --env CartPole-v1 -f logs/ -orga sb3 -m "Initial commit"
```
You can choose a custom `repo-name` (default: `{algo}-{env_id}`) by passing a `--repo-name` argument.

Download model from hub:
```
python -m utils.load_from_hub --algo ppo --env CartPole-v1 -f logs/ -orga sb3
```

## Hyperparameter yaml syntax

Note that you can easily specify parameters too.

## VecEnvWrapper

You can specify which `VecEnvWrapper` to use in the config, the same way as for env wrappers (see above), using the `vec_env_wrapper` key.

For instance:
```yaml
vec_env_wrapper: stable_baselines3.common.vec_env.VecMonitor
```

Note: `VecNormalize` is supported separately via the `normalize` keyword, and `VecFrameStack` has a dedicated keyword, `frame_stack`.
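A string like the one in the config has to be resolved to an actual class at load time. A minimal sketch of that resolution — the helper name `resolve_class` is illustrative, not the zoo's actual wrapper-parsing helper:

```python
import importlib


def resolve_class(dotted_path: str):
    """Resolve a fully qualified name from the YAML config, e.g.
    'stable_baselines3.common.vec_env.VecMonitor', to the class itself."""
    module_name, _, class_name = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_name), class_name)


# Demonstrated with a stdlib class so the sketch runs without stable-baselines3:
ordered_dict_cls = resolve_class("collections.OrderedDict")
```

The resolved class can then be applied as `env = wrapper_class(env, **kwargs)`, mirroring how `env_wrapper` entries work.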

## Callbacks

Following the same syntax as env wrappers, you can also add custom callbacks to use during training.

Final performance of the trained agents can be found in [`benchmark.md`](./benchmark.md). To compute them, simply run `python -m utils.benchmark`.

List and videos of trained agents can be found on our Huggingface page: https://huggingface.co/sb3

*NOTE: this is not a quantitative benchmark as it corresponds to only one run (cf [issue #38](https://github.com/araffin/rl-baselines-zoo/issues/38)). This benchmark is meant to check algorithm (maximal) performance, find potential bugs and also allow users to have access to pretrained agents.*

### Atari Games