# Feat/prepare twitch part3 #239

Draft — wants to merge 105 commits into base: `master`.

## Commits (105)
- `182e3d4` Add Offline-RL support (araffin, Oct 31, 2020)
- `a165b73` Deactivate logging for offline RL (araffin, Oct 31, 2020)
- `3ab448e` Merge branch 'master' into feat/offline-RL (araffin, Nov 29, 2020)
- `49f1c4e` Update d3rlpy wrapper (araffin, Nov 29, 2020)
- `c8cff9f` Merge branch 'master' into feat/offline-RL (araffin, Feb 7, 2021)
- `569d4b4` Update hyperparams (araffin, Feb 14, 2021)
- `b926f9d` Update buffer size (araffin, Feb 14, 2021)
- `fee7909` Add support for residual RL (araffin, Feb 14, 2021)
- `cd5e4b4` Update README (araffin, Feb 15, 2021)
- `a1d6094` Merge branch 'master' into feat/offline-RL (araffin, Feb 16, 2021)
- `2e8a072` test with ppo (araffin, Feb 22, 2021)
- `4fc0864` Update d3rlpy (araffin, Feb 23, 2021)
- `d335335` Fix for CRR (araffin, Feb 23, 2021)
- `49909ef` Fix pushed to d3rlpy (araffin, Feb 24, 2021)
- `fefc3bd` Handle keyboard interrupt (araffin, Feb 26, 2021)
- `a2dd644` Merge branch 'master' into feat/offline-RL (araffin, Feb 28, 2021)
- `03e682d` Upgrade SB3 + add d3rlpy as requirement (araffin, Feb 28, 2021)
- `87b65aa` Add CMAES (araffin, Feb 28, 2021)
- `0f7fd0d` Fixes (araffin, Feb 28, 2021)
- `d24842e` Copy to avoid modification by reference (araffin, Feb 28, 2021)
- `a9a45f4` Merge branch 'master' into feat/offline-RL (araffin, Mar 1, 2021)
- `c1dea20` Start support for d3rlpy (araffin, Mar 1, 2021)
- `df80a26` Add net_arch for BC (araffin, Mar 1, 2021)
- `a4d1c8b` Bug fixes (araffin, Mar 1, 2021)
- `8c98c3c` Use best candidate only once (araffin, Mar 1, 2021)
- `c963be2` Remove HER (araffin, Apr 5, 2021)
- `40ce4c5` Add basic support for refactored HER (araffin, Apr 5, 2021)
- `7fad104` Add TQC (araffin, Apr 7, 2021)
- `da56dff` Merge branch 'master' into feat/offline-RL (araffin, Apr 14, 2021)
- `7105a55` Merge branch 'master' into feat/offline-RL (araffin, Apr 19, 2021)
- `f6b9b43` Integrate RL Racing (araffin, Apr 20, 2021)
- `03af926` Use TQC (araffin, Apr 26, 2021)
- `b4d26c9` Add space engineer env (araffin, Apr 27, 2021)
- `76b40d6` Merge branch 'master' into refactor/her (araffin, Apr 29, 2021)
- `e7e9ee6` Update hyperparams (araffin, Apr 29, 2021)
- `ae706eb` Fix hyperparam (araffin, Apr 29, 2021)
- `d079811` Merge branch 'master' into refactor/her (araffin, May 3, 2021)
- `3dad1ed` Merge branch 'master' into feat/offline-RL (araffin, May 3, 2021)
- `73bea6e` Removed unused callback (araffin, May 4, 2021)
- `7e96e10` Update CI (araffin, May 4, 2021)
- `7c4f1bc` Add partial support for parallel training (araffin, May 4, 2021)
- `cfa30b5` Merge branch 'feat/parallel-train' into feat/offline-RL (araffin, May 4, 2021)
- `5f7980f` Try parallel training (araffin, May 5, 2021)
- `b635157` Cleanup (araffin, May 5, 2021)
- `55ad418` Merge branch 'feat/parallel-train' into feat/offline-RL (araffin, May 5, 2021)
- `76ea2ea` Add notes (araffin, May 5, 2021)
- `2966689` Avoid modify by reference + add sleep time (araffin, May 5, 2021)
- `5f9e044` Merge branch 'feat/parallel-train' into feat/offline-RL (araffin, May 5, 2021)
- `aa6d934` Take learning starts into account (araffin, May 6, 2021)
- `c2baee5` Merge branch 'feat/parallel-train' into feat/offline-RL (araffin, May 6, 2021)
- `76ecb24` Merge branch 'master' into misc/veridream (araffin, May 10, 2021)
- `54d5596` Merge branch 'feat/parallel-train' into misc/veridream (araffin, May 10, 2021)
- `563d853` Update hyperparams (araffin, May 11, 2021)
- `779c4c6` Add dict obs support (araffin, May 11, 2021)
- `43188ad` Update test env (araffin, May 12, 2021)
- `20c5dd8` Merge branch 'master' into refactor/her (araffin, May 12, 2021)
- `b461048` Version bump (araffin, May 12, 2021)
- `15c8fa1` Merge branch 'refactor/her' into misc/veridream (araffin, May 12, 2021)
- `4857bca` Merge branch 'master' into misc/veridream (araffin, May 12, 2021)
- `f73a65e` Update for symmetric control + catch zmq error (araffin, May 14, 2021)
- `da441b8` Save best model (araffin, May 15, 2021)
- `bf05e92` Fix parallel save (maybe issue with optimizer) (araffin, May 16, 2021)
- `a9d63db` Update hyperparams (araffin, May 16, 2021)
- `d395120` Update best params (araffin, May 17, 2021)
- `5634128` Update hyperparams (araffin, May 19, 2021)
- `c5b7f55` Prepare big network experiment (araffin, May 19, 2021)
- `976833d` Revert to normal net (araffin, May 19, 2021)
- `c6b2dce` Add exception for windows (araffin, May 19, 2021)
- `6703390` Update plot script: allow multiple envs (araffin, May 19, 2021)
- `a771249` Add bert params (araffin, May 21, 2021)
- `c966371` Save multirobot hyperparams (araffin, May 26, 2021)
- `f6f3dd0` Merge branch 'master' into misc/veridream (araffin, May 27, 2021)
- `7f24273` Merge branch 'master' into feat/offline-RL (araffin, May 29, 2021)
- `7dc1b3a` Remove CMAES (araffin, May 29, 2021)
- `1f75be3` Merge branch 'misc/veridream' into feat/offline-RL (araffin, May 29, 2021)
- `61925cb` Update params for newer machine (araffin, May 29, 2021)
- `fe58ab0` Re-add timelimit (araffin, May 29, 2021)
- `9b71984` Re-add monitor (araffin, May 29, 2021)
- `9065a2a` Add POC for VecEnvWrapper (araffin, May 30, 2021)
- `6cc8256` Update params (araffin, May 30, 2021)
- `fedbde7` Merge branch 'master' into feat/offline-RL (araffin, Jun 21, 2021)
- `80f4df1` Merge branch 'master' into feat/offline-RL (araffin, Jun 29, 2021)
- `a8d25b3` Fixes (araffin, Jun 29, 2021)
- `5530171` Format (araffin, Jun 29, 2021)
- `a2244f0` Fix loading (araffin, Jun 29, 2021)
- `ab98711` Cleanup hyperparams (araffin, Jan 4, 2022)
- `57c0d50` Merge branch 'master' into feat/offline-RL (araffin, Jan 4, 2022)
- `df89b55` Sync vec normalize for parallel training (araffin, Jan 7, 2022)
- `da26cb9` Update params (araffin, Jan 7, 2022)
- `3655b90` Add episode length plot (araffin, Jan 8, 2022)
- `24d3f89` Add hack when VecNormalize was not saved (araffin, Jan 10, 2022)
- `053f3b4` Merge branch 'feat/offline-RL' of github.com:DLR-RM/rl-baselines3-zoo… (araffin, Jan 10, 2022)
- `a5832e4` Add websocket wait (araffin, Jan 12, 2022)
- `54cc4d0` Fix bool conversion (araffin, Jan 15, 2022)
- `a4e37d1` Merge branch 'master' into feat/offline-RL (araffin, Jan 19, 2022)
- `3e2761e` Save edits from live (araffin, Apr 5, 2022)
- `988bf7b` Merge branch 'master' into feat/live-twitch (araffin, Apr 5, 2022)
- `6497088` Save new callback and wrapper (araffin, Apr 9, 2022)
- `6a2dc31` Disable continuity wrapper and add expln trick (araffin, Apr 11, 2022)
- `75c4054` Merge remote-tracking branch 'origin/master' into feat/prepare-twitch… (araffin, Apr 12, 2022)
- `5cb5e1d` Update default (araffin, Apr 17, 2022)
- `987e34d` Save edits from live (araffin, Apr 20, 2022)
- `c7f218f` Flush only when new lap is detected (araffin, Apr 21, 2022)
- `d415090` Update sac params (araffin, Apr 21, 2022)
- `de982ba` Merge branch 'master' into feat/prepare-twitch-part3 (araffin, Jun 3, 2022)
## Files changed
5 changes: 2 additions & 3 deletions — `.github/workflows/ci.yml`

```diff
@@ -17,7 +17,6 @@ jobs:
     strategy:
       matrix:
         python-version: [3.7, 3.8, 3.9]
-
     steps:
     - uses: actions/checkout@v2
       with:
@@ -30,13 +29,13 @@ jobs:
       run: |
         python -m pip install --upgrade pip
         # cpu version of pytorch - faster to download
-        pip install torch==1.8.1+cpu -f https://download.pytorch.org/whl/torch_stable.html
+        pip install torch==1.11+cpu -f https://download.pytorch.org/whl/torch_stable.html
         pip install pybullet==3.1.9
         pip install -r requirements.txt
         # Use headless version
         pip install opencv-python-headless
         # install parking-env to test HER (pinned so it works with gym 0.21)
-        pip install git+https://github.com/eleurent/highway-env@1a04c6a98be64632cf9683625022023e70ff1ab1
+        pip install highway-env==1.5.0
     - name: Type check
       run: |
         make type
```
4 changes: 2 additions & 2 deletions — `.github/workflows/trained_agents.yml`

```diff
@@ -29,13 +29,13 @@ jobs:
       run: |
         python -m pip install --upgrade pip
         # cpu version of pytorch - faster to download
-        pip install torch==1.8.1+cpu -f https://download.pytorch.org/whl/torch_stable.html
+        pip install torch==1.11+cpu -f https://download.pytorch.org/whl/torch_stable.html
         pip install pybullet==3.1.9
         pip install -r requirements.txt
         # Use headless version
         pip install opencv-python-headless
         # install parking-env to test HER (pinned so it works with gym 0.21)
-        pip install git+https://github.com/eleurent/highway-env@1a04c6a98be64632cf9683625022023e70ff1ab1
+        pip install highway-env==1.5.0
         # Add support for pickle5 protocol
         pip install pickle5
     - name: Check trained agents
```
4 changes: 4 additions & 0 deletions — `.gitignore`

```diff
@@ -15,3 +15,7 @@ git_rewrite_commit_history.sh
 .vscode/
 wandb
 runs
+hub
+*.mp4
+*.json
+*.csv
```
8 changes: 8 additions & 0 deletions — `.gitlab-ci.yml`

```diff
@@ -6,17 +6,25 @@ variables:
 
 type-check:
   script:
+    - pip install git+https://github.com/huggingface/huggingface_sb3
     - make type
 
 pytest:
   script:
     # MKL_THREADING_LAYER=GNU to avoid MKL_THREADING_LAYER=INTEL incompatibility error
+    # tmp fix to have RecurrentPPO, will be fixed with new image
+    - pip install git+https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
+    - pip install git+https://github.com/huggingface/huggingface_sb3
     - MKL_THREADING_LAYER=GNU make pytest
   coverage: '/^TOTAL.+?(\d+\%)$/'
 
check-trained-agents:
   script:
     # MKL_THREADING_LAYER=GNU to avoid MKL_THREADING_LAYER=INTEL incompatibility error
+    - pip install pickle5 # Add support for pickle5 protocol
+    - pip install git+https://github.com/huggingface/huggingface_sb3
+    # tmp fix to have RecurrentPPO, will be fixed with new image
+    - pip install git+https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
     - MKL_THREADING_LAYER=GNU make check-trained-agents
 
 lint:
```
14 changes: 12 additions & 2 deletions — `CHANGELOG.md`

```diff
@@ -1,18 +1,28 @@
-## Release 1.5.1a0 (WIP)
+## Release 1.5.1a8 (WIP)
 
 ### Breaking Changes
 - Change default value for number of hyperparameter optimization trials from 10 to 500. (@ernestum)
 - Derive number of intermediate pruning evaluations from number of time steps (1 evaluation per 100k time steps.) (@ernestum)
-- Updated default --eval-freq from 10k to 25k steps
+- Updated default --eval-freq from 10k to 25k steps
+- Update default horizon to 2 for the `HistoryWrapper`
 
 ### New Features
 - Support setting PyTorch's device with the `--device` flag (@gregwar)
 - Add `--max-total-trials` parameter to help with distributed optimization. (@ernestum)
+- Added `vec_env_wrapper` support in the config (works the same as `env_wrapper`)
+- Added Huggingface hub integration
+- Added `RecurrentPPO` support (aka `ppo_lstm`)
+- Added autodownload for "official" sb3 models from the hub
 
 ### Bug fixes
+- Fix `Reacher-v3` name in PPO hyperparameter file
+- Pinned ale-py==0.7.4 until new SB3 version is released
+- Fix enjoy / record videos with LSTM policy
 
 ### Documentation
 
 ### Other
+- When pruner is set to `"none"`, use `NopPruner` instead of diverted `MedianPruner` (@qgallouedec)
 
 ## Release 1.5.0 (2022-03-25)
```
56 changes: 53 additions & 3 deletions — `README.md`

Plot evaluation reward curve for TQC, SAC and TD3 on the HalfCheetah and Ant PyBullet environments:

```diff
-python scripts/all_plots.py -a sac td3 tqc --env HalfCheetah Ant -f rl-trained-agents/
+python3 scripts/all_plots.py -a sac td3 tqc --env HalfCheetahBullet AntBullet -f rl-trained-agents/
```

## Plot with the rliable library
To load a checkpoint (here the checkpoint name is `rl_model_10000_steps.zip`):

```
python enjoy.py --algo algo_name --env env_id -f logs/ --exp-id 1 --load-checkpoint 10000
```

To load the latest checkpoint:

```
python enjoy.py --algo algo_name --env env_id -f logs/ --exp-id 1 --load-last-checkpoint
```

## Offline Training with d3rlpy

```
python train.py --algo human --env donkey-generated-track-v0 --env-kwargs frame_skip:1 throttle_max:2.0 throttle_min:0.0 steering_min:-0.5 steering_max:0.5 level:6 max_cte:100000 test_mode:False --num-threads 2 --eval-freq -1 \
  -b logs/human/donkey-generated-track-v0_1/replay_buffer.pkl \
  --pretrain-params batch_size:256 n_eval_episodes:1 n_epochs:20 n_iterations:20 \
  --offline-algo bc
```

Parameters read from `--pretrain-params` (with their defaults):

```python
n_iterations = args.pretrain_params.get("n_iterations", 10)
n_epochs = args.pretrain_params.get("n_epochs", 1)
q_func_type = args.pretrain_params.get("q_func_type")
batch_size = args.pretrain_params.get("batch_size", 512)
# n_action_samples = args.pretrain_params.get("n_action_samples", 1)
n_eval_episodes = args.pretrain_params.get("n_eval_episodes", 5)
add_to_buffer = args.pretrain_params.get("add_to_buffer", False)
deterministic = args.pretrain_params.get("deterministic", True)
```
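The `--pretrain-params` (and `--env-kwargs`) values above are plain `key:value` pairs on the command line. A minimal sketch of how such pairs can be parsed into a typed dict — the helper name `parse_params` is illustrative, not the zoo's actual argument parser:

```python
import ast
from typing import Any, Dict, List


def parse_params(pairs: List[str]) -> Dict[str, Any]:
    """Parse CLI 'key:value' pairs (e.g. 'batch_size:256') into a dict.

    Each value is evaluated as a Python literal when possible, so
    '256' becomes an int, '2.0' a float and 'False' a bool; anything
    that is not a literal (e.g. 'bc') is kept as a plain string.
    """
    params: Dict[str, Any] = {}
    for pair in pairs:
        key, _, raw = pair.partition(":")
        try:
            params[key] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            params[key] = raw  # plain string, e.g. an algo name
    return params


pretrain_params = parse_params(["batch_size:256", "n_eval_episodes:1", "n_epochs:20"])
batch_size = pretrain_params.get("batch_size", 512)  # 256 from the CLI, 512 otherwise
```

The `dict.get(key, default)` calls in the snippet above then supply defaults for any key the user did not pass.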

## Human driving

```
python train.py --algo human --env donkey-generated-track-v0 --env-kwargs frame_skip:1 throttle_max:1.0 throttle_min:-1.0 steering_min:-1 steering_max:1 level:6 max_cte:100000 --num-threads 2 --eval-freq -1
```

## Huggingface Hub Integration

Upload model to hub (same syntax as for `enjoy.py`):
```
python -m utils.push_to_hub --algo ppo --env CartPole-v1 -f logs/ -orga sb3 -m "Initial commit"
```
You can choose a custom `repo-name` (default: `{algo}-{env_id}`) by passing a `--repo-name` argument.

Download model from hub:
```
python -m utils.load_from_hub --algo ppo --env CartPole-v1 -f logs/ -orga sb3
```

## Hyperparameter yaml syntax

Note that you can easily specify parameters too.

## VecEnvWrapper

You can specify which `VecEnvWrapper` to use in the config, the same way as for env wrappers (see above), using the `vec_env_wrapper` key.

For instance:
```yaml
vec_env_wrapper: stable_baselines3.common.vec_env.VecMonitor
```

Note: `VecNormalize` is supported separately via the `normalize` keyword, and `VecFrameStack` has a dedicated keyword, `frame_stack`.
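A string like the one in the config has to be resolved to an actual class at load time. A minimal sketch of that resolution — the helper name `resolve_class` is illustrative, not the zoo's actual wrapper-parsing helper:

```python
import importlib


def resolve_class(dotted_path: str):
    """Resolve a fully qualified name from the YAML config, e.g.
    'stable_baselines3.common.vec_env.VecMonitor', to the class itself."""
    module_name, _, class_name = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_name), class_name)


# Demonstrated with a stdlib class so the sketch runs without stable-baselines3:
ordered_dict_cls = resolve_class("collections.OrderedDict")
```

The resolved class can then be applied as `env = wrapper_class(env, **kwargs)`, mirroring how `env_wrapper` entries work.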

## Callbacks

Following the same syntax as env wrappers, you can also add custom callbacks to use during training.

Final performance of the trained agents can be found in [`benchmark.md`](./benchmark.md). To compute them, simply run `python -m utils.benchmark`.

List and videos of trained agents can be found on our Huggingface page: https://huggingface.co/sb3

*NOTE: this is not a quantitative benchmark as it corresponds to only one run (cf [issue #38](https://github.com/araffin/rl-baselines-zoo/issues/38)). This benchmark is meant to check algorithm (maximal) performance, find potential bugs and also allow users to have access to pretrained agents.*

### Atari Games