forked from utiasDSL/safe-control-gym
Hyperparameter Optimization Module (utiasDSL#151)
* 1. Bug fixed. 2. Kernel extension. 3. Batch GP implementation.
* Update dependencies.
* Explicitly import scipy.linalg.
* Add cartpole configs for gpmpc.
* Add hyperparameter optimization module.
* Catch all exceptions in HPO for debugging purposes.
* Put cartpole configs for gpmpc under the gpmpc folder.
* Add HPO scripts.
* 1. Include pandas. 2. Change relative import in gpmpc_experiment.py. 3. Remove unnecessary config in cartpole_stab.yaml. 4. Add HPO module in test_build.py.
* Rename config to match the default algo name.
* Remove old configs.
* Add tests.
* Edit bash file with the correct arg name.
* Add another host in gpmpc_hpo.sh.
* Change to the new dir in gpmpc_hpo.sh.
* 1. Fix a small bug. 2. Add test_train_gpmpc_cartpole.
* Add an HPO parallelism test.
* Saving before running HPO.
* I think the bug is that it reaches the goal in the first step.
* 1. PPO configs. 2. Make cartpole init states harder. 3. First version of JSRL on PPO.
* Re-organize a bit (file names, remove __init__.py in test folders).
* 1. HPO strategies. 2. Test HPO for PPO. 3. Another way to save checkpoints in ppo.py. 4. Boolean var in ppo_sampler.
* Update .gitignore.
* Change configs.
* Update bash for HPO on gpmpc.
* Add prior arg in gpmpc_sampler.
* 1. HPO effort evaluations. 2. Bash file for HPO strategy evaluation.
* Update dependencies.
* Add the freedom to choose between the random sampler and the TPE sampler.
* 1. Add strategy 5. 2. Add unit test accordingly.
* 1. Prior configs. 2. Update eval.py, sen.sh, and .gitignore.
* gpmpc HPO strategy study.
* Refactor the code.
* 1. HPO on SAC. 2. Add activation arg in SAC and fix a small bug.
* Fix typos.
* Change to two jobs.
* Change the number of repetitions to make sure it has at least as many samples as s2.
* Reduce the budget.
* Toy example.
* Consider 4 versions of noisy functions.
* Include var study.
* Improve visualization in toy examples.
* Update visualization improvements in toy examples.
* Change naming.
* Final experiment setup.
* Final experiment setup.
* Modify seeding.
* Ignore runtime errors for HPO.
* Merge from SAC.
* Fix a bug in hpo_sampler.py.
* Final design to show possible lower compute time.
* 1. HPO on DDPG. 2. Fix a small bug in ddpg_utils.
* Relax the threshold.
* Relax the threshold.
* Make rl_hpo_strategy_eval.sh automatic.
* Fix a bug in rl_hpo_strategy_eval.sh.
* Add gpmpc_hpo_strategy_eval.sh.
* Fix a small bug.
* Fix the budget (trial) bug in configs.
* Prepare comparison of HPO strategies on gpmpc.
* Fix a bug in gpmpc_hpo_strategy.sh.
* Fix bugs in bash files.
* Fix the trial bug in config.
* Fix a function bug in eval.py.
* 1. Add HPO resume functionality. 2. Make the eval function more general.
* Update configs.
* Make main.sh general.
* Resume previous config with trials increased.
* Fix the sorting bug.
* Fix sorting bug.
* A small bug fixed.
* Fix a bug in computing reward.
* Add resume functionality.
* Edit main bash file and fix some typos.
* Simply assign zero if numerical issues happen during HPO.
* Adjust eval.
* Change to boxenplot.
* Fix typo.
* Add reliable_stats.
* Update outdated configs.
* Update Jupyter notebooks.
* Update Jupyter notebooks.
* Final update for appendix.
* Update README.
* Fix typo.
* 1. Clean up code for the PPO controller and hyperparameter module. 2. Test out package dependencies and the MySQL database.
* Test training with given optimized hp files.
* 1. Test HPO with and without MySQL. 2. Update README.
* Remove discrepancies in the README.
* Update README.
* 1. Remove 'pandas' and 'seaborn' from package dependencies. 2. Move tests to tests/. 3. Write comments for batch GP in the GPMPC controller. 4. Move experiments to examples/.
* Update config_overrides in the RL examples.
* Run pre-commit hooks to improve linting.
* 1. Ignore W503 and W504 as they conflict in pre-commit-config. 2. Run and pass this version of pre-commit hooks.
* Add activation config to the examples that use RL.
* 1. Standardize the HPO template in the examples. 2. Remove _learn(). 3. Add an example of HPO for gpmpc.
* Run pre-commit hooks.
* Add gpmpc HPO test without using MySQL.
* 1. Update config of the cartpole task. 2. Add max_steps and exponentiated avg return in base_experiment.py. 3. Use the BaseExperiment class in the HPO example. 4. Add hp study bash script and Jupyter notebook for gpmpc.
* 1. Add bash files to automate the HPO pipeline for gpmpc. 2. Update gpmpc config. 3. Add done_on_max_steps in base_experiment.py. 4. Remove _run() and use BaseExperiment in HPO.
* Match .gitignore to upstream/main.
* Update for review.
* Update based on the review comments.
* Fix typo in README.
1 parent dd1b293 · commit f90ac22
Showing 43 changed files with 2,223 additions and 123 deletions.
examples/hpo/gp_mpc/config_overrides/cartpole/cartpole_stab.yaml (67 additions, 0 deletions)
task_config:
  constraints:
  - constraint_form: default_constraint
    constrained_variable: input
  - constraint_form: default_constraint
    constrained_variable: state
    upper_bounds:
    - 100
    - 100
    - 100
    - 100
    lower_bounds:
    - -100
    - -100
    - -100
    - -100
  cost: quadratic
  ctrl_freq: 15
  disturbances:
    observation:
    - disturbance_func: white_noise
      std: 0.0001
  done_on_violation: false
  episode_len_sec: 10
  gui: false
  inertial_prop:
    cart_mass: 1.0
    pole_length: 0.5
    pole_mass: 0.1
  inertial_prop_randomization_info: null
  info_in_reset: false
  init_state:
    init_x: 0.0
    init_x_dot: 0.0
    init_theta: 0.0
    init_theta_dot: 0.0
  init_state_randomization_info:
    init_x:
      distrib: 'uniform'
      low: -0.1
      high: 0.1
    init_x_dot:
      distrib: 'uniform'
      low: -0.1
      high: 0.1
    init_theta:
      distrib: 'uniform'
      low: -0.2
      high: 0.2
    init_theta_dot:
      distrib: 'uniform'
      low: -0.1
      high: 0.1
  normalized_rl_action_space: false
  prior_prop:
    cart_mass: 1.0
    pole_length: 0.5
    pole_mass: 0.1
  pyb_freq: 750
  randomized_inertial_prop: false
  randomized_init: true
  task: stabilization
  task_info:
    stabilization_goal: [0]
    stabilization_goal_tolerance: 0.005
  use_constraint_penalty: false
  verbose: false
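As a point of reference, here is a minimal sketch (not part of the commit) of how a task_config override like this one can be consumed; the 'cartpole' registry id and the direct yaml.safe_load usage are assumptions, and the template script later in this commit does the same thing through ConfigFactory instead.

# Hedged sketch: build the cartpole env directly from this override file.
import yaml

from safe_control_gym.utils.registration import make

with open('examples/hpo/gp_mpc/config_overrides/cartpole/cartpole_stab.yaml') as f:
    task_config = yaml.safe_load(f)['task_config']

env = make('cartpole', **task_config)
obs = env.reset()  # info_in_reset is false in this config, so only obs is returned
env.close()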
examples/hpo/gp_mpc/config_overrides/cartpole/gp_mpc_cartpole_150.yaml (66 additions, 0 deletions)
algo: gp_mpc
algo_config:
  additional_constraints: null
  deque_size: 10
  eval_batch_size: 10
  gp_approx: mean_eq
  gp_model_path: null
  horizon: 20
  prior_info:
    prior_prop:
      cart_mass: 1.0
      pole_length: 0.5
      pole_mass: 0.1
  initial_rollout_std: 0.0
  input_mask: null
  learing_rate: null
  learning_rate:
  - 0.01
  - 0.01
  - 0.01
  - 0.01
  normalize_training_data: false
  online_learning: false
  optimization_iterations:
  - 3000
  - 3000
  - 3000
  - 3000
  overwrite_saved_data: false
  prior_param_coeff: 1.5
  prob: 0.95
  q_mpc:
  - 1
  - 1
  - 1
  - 1
  r_mpc:
  - 0.1
  kernel: Matern
  sparse_gp: True
  n_ind_points: 40
  inducing_point_selection_method: 'kmeans'
  recalc_inducing_points_at_every_step: false
  soft_constraints:
    gp_soft_constraints: false
    gp_soft_constraints_coeff: 0
    prior_soft_constraints: true
    prior_soft_constraints_coeff: 10
  target_mask: null
  train_iterations: null
  test_data_ratio: 0.2
  use_prev_start: true
  warmstart: true
  num_epochs: 5
  num_samples: 75
  num_test_episodes_per_epoch: 2
  num_train_episodes_per_epoch: 2
  same_test_initial_state: true
  same_train_initial_state: false
  rand_data_selection: false
  terminate_train_on_done: True
  terminate_test_on_done: False
  parallel: True

device: cpu
restore: null
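The four-entry learning_rate and optimization_iterations lists appear to map one value to each GP output dimension; the following sketch simply pairs them up. The one-GP-per-state-dimension reading is an assumption based on the list lengths, not a quote of the controller code.

# Hedged sketch: pair up the per-dimension GP training settings from the config above.
import yaml

with open('examples/hpo/gp_mpc/config_overrides/cartpole/gp_mpc_cartpole_150.yaml') as f:
    algo_config = yaml.safe_load(f)['algo_config']

for dim, (lr, iters) in enumerate(zip(algo_config['learning_rate'],
                                      algo_config['optimization_iterations'])):
    print(f'GP output dim {dim}: learning rate {lr}, {iters} optimization iterations')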
examples/hpo/gp_mpc/config_overrides/cartpole/gp_mpc_cartpole_hpo.yaml (36 additions, 0 deletions)
hpo_config:

  hpo: True  # perform hyperparameter optimization
  load_if_exists: True  # this should be set to True if HPO is run in parallel
  use_database: False  # set to True if a MySQL database is used
  objective: [exponentiated_avg_return]  # metrics defined in base_experiment.py
  direction: [maximize]  # one direction per objective
  dynamical_runs: False  # if True, dynamically increase the number of runs
  warm_trials: 20  # number of trials to run before dynamical runs
  approximation_threshold: 5  # only used when dynamical_runs is True
  repetitions: 5  # number of performance samples for each objective query
  alpha: 1  # significance level for CVaR
  use_gpu: True
  dashboard: False
  seed: 24
  save_n_best_hps: 3
  # budget
  trials: 40

# hyperparameters
hps_config:
  horizon: 20
  learning_rate:
  - 0.01
  - 0.01
  - 0.01
  - 0.01
  optimization_iterations:
  - 3000
  - 3000
  - 3000
  - 3000
  kernel: Matern
  n_ind_points: 35
  num_epochs: 5
  num_samples: 75
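For orientation, here is a hedged Optuna sketch of how these hpo_config fields could translate into a study. The authoritative logic lives in safe_control_gym/hyperparameters/hpo.py; the study name, search ranges, and objective body below are placeholders.

# Hedged sketch: an Optuna study wired up from the hpo_config values above.
import optuna

sampler = optuna.samplers.TPESampler(seed=24)  # seed: 24; the --sampler flag selects this

def objective(trial):
    # Placeholder search space loosely mirroring hps_config; the real ranges
    # are defined in the HPO module's samplers, which are not shown in this diff.
    horizon = trial.suggest_int('horizon', 10, 40)
    trial.suggest_categorical('kernel', ['Matern', 'RBF'])
    # repetitions: 5 -- each query would average several training runs and
    # return exponentiated_avg_return; a placeholder value stands in here.
    return float(horizon)

study = optuna.create_study(direction='maximize',   # direction: [maximize]
                            sampler=sampler,
                            study_name='gp_mpc_hpo',  # hypothetical name
                            load_if_exists=False)     # True with a MySQL storage backend
study.optimize(objective, n_trials=40)  # budget: trials: 40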
examples/hpo/gp_mpc/config_overrides/cartpole/optimized_hyperparameters.yaml (7 additions, 0 deletions)
horizon: 35
kernel: 'RBF'
n_ind_points: 40
num_epochs: 5
num_samples: 75
optimization_iterations: [2800, 2800, 2800, 2800]
learning_rate: [0.023172075157730145, 0.023172075157730145, 0.023172075157730145, 0.023172075157730145]
New file (119 additions, 0 deletions):
"""Template hyperparameter optimization/hyperparameter evaluation script. | ||
""" | ||
import os | ||
from functools import partial | ||
|
||
import yaml | ||
|
||
import matplotlib.pyplot as plt | ||
import numpy as np | ||
|
||
from safe_control_gym.envs.benchmark_env import Environment, Task | ||
|
||
from safe_control_gym.hyperparameters.hpo import HPO | ||
from safe_control_gym.experiments.base_experiment import BaseExperiment | ||
from safe_control_gym.utils.configuration import ConfigFactory | ||
from safe_control_gym.utils.registration import make | ||
from safe_control_gym.utils.utils import set_device_from_config, set_dir_from_config, set_seed_from_config | ||
|
||
|
||
def hpo(config): | ||
"""Hyperparameter optimization. | ||
Usage: | ||
* to start HPO, use with `--func hpo`. | ||
""" | ||
|
||
# Experiment setup. | ||
if config.hpo_config.hpo: | ||
set_dir_from_config(config) | ||
set_seed_from_config(config) | ||
set_device_from_config(config) | ||
|
||
# HPO | ||
hpo = HPO(config.algo, | ||
config.task, | ||
config.sampler, | ||
config.load_study, | ||
config.output_dir, | ||
config.task_config, | ||
config.hpo_config, | ||
**config.algo_config) | ||
|
||
if config.hpo_config.hpo: | ||
hpo.hyperparameter_optimization() | ||
print('Hyperparameter optimization done.') | ||
|
||
|
||
def train(config): | ||
"""Training for a given set of hyperparameters. | ||
Usage: | ||
* to start training, use with `--func train`. | ||
""" | ||
# Override algo_config with given yaml file | ||
if config.opt_hps == '': | ||
# if no opt_hps file is given | ||
pass | ||
else: | ||
# if opt_hps file is given | ||
with open(config.opt_hps, 'r') as f: | ||
opt_hps = yaml.load(f, Loader=yaml.FullLoader) | ||
for hp in opt_hps: | ||
if isinstance(config.algo_config[hp], list) and not isinstance(opt_hps[hp], list): | ||
config.algo_config[hp] = [opt_hps[hp]] * len(config.algo_config[hp]) | ||
else: | ||
config.algo_config[hp] = opt_hps[hp] | ||
# Experiment setup. | ||
set_dir_from_config(config) | ||
set_seed_from_config(config) | ||
set_device_from_config(config) | ||
|
||
# Define function to create task/env. | ||
env_func = partial(make, config.task, output_dir=config.output_dir, **config.task_config) | ||
# Create the controller/control_agent. | ||
# Note: | ||
# eval_env will take config.seed * 111 as its seed | ||
# env will take config.seed as its seed | ||
control_agent = make(config.algo, | ||
env_func, | ||
training=True, | ||
checkpoint_path=os.path.join(config.output_dir, 'model_latest.pt'), | ||
output_dir=config.output_dir, | ||
use_gpu=config.use_gpu, | ||
seed=config.seed, | ||
**config.algo_config) | ||
control_agent.reset() | ||
|
||
eval_env = env_func(seed=config.seed * 111) | ||
# Run experiment | ||
experiment = BaseExperiment(eval_env, control_agent) | ||
experiment.launch_training() | ||
results, metrics = experiment.run_evaluation(n_episodes=1, n_steps=None, done_on_max_steps=True) | ||
control_agent.close() | ||
|
||
return eval_env.X_GOAL, results, metrics | ||
|
||
|
||
MAIN_FUNCS = {'hpo': hpo, 'train': train} | ||
|
||
|
||
if __name__ == '__main__': | ||
|
||
# Make config. | ||
fac = ConfigFactory() | ||
fac.add_argument('--func', type=str, default='train', help='main function to run.') | ||
fac.add_argument('--opt_hps', type=str, default='', help='yaml file as a result of HPO.') | ||
fac.add_argument('--load_study', type=bool, default=False, help='whether to load study from a previous HPO.') | ||
fac.add_argument('--sampler', type=str, default='TPESampler', help='which sampler to use in HPO.') | ||
# merge config | ||
config = fac.merge() | ||
|
||
# Execute. | ||
func = MAIN_FUNCS.get(config.func, None) | ||
if func is None: | ||
raise Exception('Main function {} not supported.'.format(config.func)) | ||
func(config) |
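One step of train() worth isolating is the scalar-to-list broadcast used when applying an HPO result file such as optimized_hyperparameters.yaml; here is a self-contained sketch of just that step, with made-up values.

# Hedged sketch of the override step in train(): scalars coming out of HPO
# are broadcast to lists when the algo config expects per-dimension values.
algo_config = {'learning_rate': [0.01, 0.01, 0.01, 0.01], 'horizon': 20}
opt_hps = {'learning_rate': 0.0232, 'horizon': 35}  # made-up HPO output

for hp, value in opt_hps.items():
    if isinstance(algo_config[hp], list) and not isinstance(value, list):
        algo_config[hp] = [value] * len(algo_config[hp])  # broadcast the scalar
    else:
        algo_config[hp] = value

print(algo_config)
# {'learning_rate': [0.0232, 0.0232, 0.0232, 0.0232], 'horizon': 35}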
examples/hpo/rl/config_overrides/cartpole/cartpole_stab.yaml (61 additions, 0 deletions)
task_config:
  info_in_reset: True
  ctrl_freq: 15
  pyb_freq: 750
  physics: pyb

  # state initialization
  init_state:
    init_x: 0.0
    init_x_dot: 0.0
    init_theta: 0.0
    init_theta_dot: 0.0
  randomized_init: True
  randomized_inertial_prop: False
  normalized_rl_action_space: True

  init_state_randomization_info:
    init_x:
      distrib: 'uniform'
      low: -0.1
      high: 0.1
    init_x_dot:
      distrib: 'uniform'
      low: -0.1
      high: 0.1
    init_theta:
      distrib: 'uniform'
      low: -0.2
      high: 0.2
    init_theta_dot:
      distrib: 'uniform'
      low: -0.1
      high: 0.1

  task: stabilization
  task_info:
    stabilization_goal: [0]
    stabilization_goal_tolerance: 0.005

  inertial_prop:
    pole_length: 0.5
    cart_mass: 1
    pole_mass: 0.1

  episode_len_sec: 10
  cost: rl_reward
  obs_goal_horizon: 1

  # RL reward
  rew_state_weight: [1, 1, 1, 1]
  rew_act_weight: 0.1
  rew_exponential: True

  # constraints
  constraints:
  - constraint_form: default_constraint
    constrained_variable: state
  - constraint_form: default_constraint
    constrained_variable: input
  done_on_out_of_bound: True
  done_on_violation: False
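Under cost: rl_reward with rew_exponential: True, the reward is presumably a weighted quadratic penalty on state error and action mapped through an exponential, so it lies in (0, 1]; the sketch below shows that assumed form. The actual cost code lives in the benchmark env, not in this diff.

# Hedged sketch of the exponential RL reward implied by the config above.
import numpy as np

def rl_reward(state_err, action, state_weight, act_weight):
    # Weighted quadratic penalty, passed through exp() per rew_exponential: True.
    cost = np.sum(state_weight * state_err ** 2) + np.sum(act_weight * action ** 2)
    return np.exp(-cost)

print(rl_reward(np.zeros(4), np.zeros(1), np.array([1, 1, 1, 1]), 0.1))  # 1.0 at the goal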
examples/hpo/rl/ppo/config_overrides/cartpole/optimized_hyperparameters.yaml (13 additions, 0 deletions)
activation: leaky_relu
actor_lr: 0.0007948148615930024
clip_param: 0.1
critic_lr: 0.007497368468753617
entropy_coef: 0.00010753631441212628
gae_lambda: 0.8
gamma: 0.98
hidden_dim: 32
max_env_steps: 72000
mini_batch_size: 128
opt_epochs: 5
rollout_steps: 150
target_kl: 1.587713889686473e-07