This repo is based on the original paper: Kurutach, Thanard, et al. "Model-Ensemble Trust-Region Policy Optimization." arXiv preprint arXiv:1802.10592 (2018). https://arxiv.org/abs/1802.10592
We modified the repo to perform benchmarking as part of the Model Based Reinforcement Learning Benchmarking Library (MBBL). Please refer to the project page for more information.
We also recommend reading the original repo shared by the authors of METRPO.
You need a MuJoCo license and MuJoCo 1.31, which can be downloaded from https://www.roboti.us/. Useful information on installing MuJoCo can be found at https://github.com/openai/mujoco-py.
It is recommended to create a new Conda environment for this repo (Python 3.5 or 3.6 both work):

```shell
conda create -n <env_name> python=3.5
```

Then install the dependencies:

```shell
pip install -r requirements.txt
```
Then please go to MBBL to install the mbbl package for the environments.
To run the benchmarking environments, please refer to ./metrpo_gym_search_new.sh.
Run experiments using the following command:
```shell
python main.py --env <env_name> --exp_name <experiment_name> --sub_exp_name <exp_save_dir>
```

- `env_name`: one of (half_cheetah, ant, hopper, swimmer)
- `exp_name`: what you want to call your experiment
- `sub_exp_name`: partial path for saving experiment logs and results
Experiment results will be logged to `./experiments/<exp_save_dir>/<experiment_name>`. For example:

```shell
python main.py --env half_cheetah --exp_name test-exp --sub_exp_name test-exp-dir
```
You can modify the configuration parameters in `configs/params_<env_name>.json`.
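As a purely illustrative sketch of what such a config file might contain, a JSON file of this shape could hold per-environment hyperparameters. Note that every key name below is an assumption for illustration; check the actual `configs/params_<env_name>.json` shipped with the repo for the real parameter names and values before editing.

```json
{
  "env_name": "half_cheetah",
  "num_model_ensemble": 5,
  "horizon": 1000,
  "batch_size": 4000,
  "trpo_max_kl": 0.01,
  "model_hidden_sizes": [1024, 1024],
  "num_iterations": 100
}
```

Keeping one JSON file per environment lets you tune each task independently without touching the training code.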