
Commit

Hyperparameter Optimization Module (utiasDSL#151)
* 1. bug fixes. 2. kernel extension. 3. batch GP implementation.

* update dependencies

* explicitly import scipy.linalg

* add cartpole configs for gpmpc

* add hyperparameter optimization module

* catch all exceptions in hpo for debugging purposes.

* put cartpole configs for gpmpc under the gpmpc folder

* add hpo scripts

* 1. include pandas. 2. change relative import in gpmpc_experiment.py. 3. remove unnecessary config in cartpole_stab.yaml. 4. add hpo module in test_build.py.

* rename config to match default algo name.

* remove old configs

* add tests

* edit bash file with correct arg name

* add another host in gpmpc_hpo.sh

* change to new dir in gpmpc_hpo.sh

* 1. fix a small bug 2. add test_train_gpmpc_cartpole

* add a hpo parallelism test

* saving before running hpo

* I think the bug is that it reaches the goal in the first step.

* 1. PPO configs. 2. Make cartpole init states harder. 3. First version of JSRL on PPO.

* Re-organize a bit (file name, remove __init__.py in test folders).

* 1. HPO strategies. 2. test on hpo for ppo. 3. another way to save checkpoint in ppo.py. 4. Boolean var in ppo_sampler.

* update gitignore

* change configs

* update bash for hpo on gpmpc

* add prior arg in gpmpc_sampler

* 1. HPO effort evaluations. 2. Bash file for hpo strategy evaluation.

* update dependencies

* add the option to choose between the random sampler and the TPE sampler.

* 1. add strategy 5. 2. add unit test accordingly.

* 1. prior configs. 2. update eval.py, sen.sh, and .gitignore.

* gpmpc hpo strategy study

* refactor the code

* 1. hpo on sac. 2. add activation arg in sac and fix a small bug.

* fix typos

* change to two jobs

* change the number of repetitions to make sure it has at least as many samples as s2.

* reduce the budget

* toy example

* consider 4 versions of noisy functions.

* include var study

* improve visualization in toy examples

* update visualization improvements in toy examples.

* change naming

* final experiment setup

* final experiment setup

* modify seeding

* Ignore runtime error for hpo

* merge from sac

* fix a bug in hpo_sampler.py

* final design to show potentially lower compute time.

* 1. hpo on ddpg. 2. fix a small bug in ddpg_utils.

* relax the threshold

* relax the threshold

* make rl_hpo_strategy_eval.sh automatic.

* fix a bug in rl_hpo_strategy_eval.sh

* add gpmpc_hpo_strategy_eval.sh

* fix a small bug

* fix the budget (trial) bug in configs.

* prepare comparing hpo strategy on gpmpc

* fix a bug in gpmpc_hpo_strategy.sh

* fix bugs in bash files

* fix the trial bug in config

* fix a function bug in eval.py

* 1. add hpo resume functionality. 2. make eval function more general.

* update configs

* make main.sh general

* resume previous config with the trial count increased.

* fix the sorting bug.

* fix sorting bug

* a small bug fixed

* fix a bug on computing reward

* add resume functionality

* edit main bash file and fix some typos

* simply assign zero if numerical issues happen during HPO

* adjust eval

* change to boxenplot

* fix typo

* add reliable_stats

* update outdated configs

* update jupyter notebooks

* update jupyter notebooks.

* final update for appendix

* update readme

* fix typo

* 1. clean up code for ppo controller, hyperparameter module. 2. Test out package dependencies and MySQL database.

* test training with given optimized hp files.

* 1. test hpo with and without MySQL. 2. update README.

* remove discrepancies in the readme.

* update readme

* 1. remove 'pandas' and 'seaborn' from package dependencies. 2. move tests into the tests folder. 3. add comments to the batch GP in the GPMPC controller. 4. move experiments into examples.

* -

* update config_overrides in examples of rl

* run pre-commit hooks to improve linting

* 1. ignore W503 and W504 as they conflict in pre-commit-config. 2. run and pass this version of pre-commit hooks.

* add activation config to the examples that use RL.

* 1. standardize hpo template in the examples. 2. remove _learn(). 3. add an example of hpo for gpmpc.

* run pre-commit hooks.

* add gpmpc hpo test without using mysql

* 1. update config of cartpole task. 2. add max_steps and exponentiated avg return in base_experiment.py. 3. use BaseExperiment class in hpo example. 4. add hp study bash script and jupyter notebook for gpmpc.

* 1. add bash files to automate hpo pipeline for gpmpc. 2. update gpmpc config. 3. add done_on_max_steps in base_experiment.py. 4. remove _run() and use BaseExperiment in hpo.

* match .gitignore to upstream/main.

* update for review

* update based on the review comments.

* fix typo in readme.
middleyuan authored May 14, 2024
1 parent dd1b293 commit f90ac22
Showing 43 changed files with 2,223 additions and 123 deletions.
1 change: 1 addition & 0 deletions examples/cbf/config_overrides/ppo_config.yaml
@@ -2,6 +2,7 @@ algo: ppo
algo_config:
  # model args
  hidden_dim: 64
  activation: "relu"
  norm_obs: False
  norm_reward: False
  clip_obs: 10.0
1 change: 1 addition & 0 deletions examples/cbf/config_overrides/sac_config.yaml
@@ -2,6 +2,7 @@ algo: sac
algo_config:
  # model args
  hidden_dim: 256
  activation: "relu"
  use_entropy_tuning: False

  # optim args
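The new activation key selects the nonlinearity used by the RL policy and value networks. A minimal sketch of how such a config string could be mapped to a torch.nn module; the helper name and mapping below are illustrative assumptions, not the repository's actual resolution code.

import torch.nn as nn

# Hypothetical mapping from the config string to a torch activation module;
# the actual controllers may resolve the name differently.
ACTIVATIONS = {
    'relu': nn.ReLU,
    'tanh': nn.Tanh,
    'leaky_relu': nn.LeakyReLU,
}


def make_activation(name):
    """Return an activation module for a config string such as 'relu'."""
    if name not in ACTIVATIONS:
        raise ValueError(f'Unknown activation: {name}')
    return ACTIVATIONS[name]()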
67 changes: 67 additions & 0 deletions examples/hpo/gp_mpc/config_overrides/cartpole/cartpole_stab.yaml
@@ -0,0 +1,67 @@
task_config:
  constraints:
    - constraint_form: default_constraint
      constrained_variable: input
    - constraint_form: default_constraint
      constrained_variable: state
      upper_bounds:
        - 100
        - 100
        - 100
        - 100
      lower_bounds:
        - -100
        - -100
        - -100
        - -100
  cost: quadratic
  ctrl_freq: 15
  disturbances:
    observation:
      - disturbance_func: white_noise
        std: 0.0001
  done_on_violation: false
  episode_len_sec: 10
  gui: false
  inertial_prop:
    cart_mass: 1.0
    pole_length: 0.5
    pole_mass: 0.1
  inertial_prop_randomization_info: null
  info_in_reset: false
  init_state:
    init_x: 0.0
    init_x_dot: 0.0
    init_theta: 0.0
    init_theta_dot: 0.0
  init_state_randomization_info:
    init_x:
      distrib: 'uniform'
      low: -0.1
      high: 0.1
    init_x_dot:
      distrib: 'uniform'
      low: -0.1
      high: 0.1
    init_theta:
      distrib: 'uniform'
      low: -0.2
      high: 0.2
    init_theta_dot:
      distrib: 'uniform'
      low: -0.1
      high: 0.1
  normalized_rl_action_space: false
  prior_prop:
    cart_mass: 1.0
    pole_length: 0.5
    pole_mass: 0.1
  pyb_freq: 750
  randomized_inertial_prop: false
  randomized_init: true
  task: stabilization
  task_info:
    stabilization_goal: [0]
    stabilization_goal_tolerance: 0.005
  use_constraint_penalty: false
  verbose: false
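Each entry of init_state_randomization_info draws the corresponding initial-state component from an independent uniform distribution at every environment reset. A minimal sketch of the equivalent sampling, for illustration only (the benchmark environment performs this internally):

import numpy as np

rng = np.random.default_rng(seed=24)

# Bounds copied from init_state_randomization_info above.
bounds = {
    'init_x': (-0.1, 0.1),
    'init_x_dot': (-0.1, 0.1),
    'init_theta': (-0.2, 0.2),
    'init_theta_dot': (-0.1, 0.1),
}

# One randomized initial state, as sampled on each reset.
init_state = {k: rng.uniform(low, high) for k, (low, high) in bounds.items()}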
@@ -0,0 +1,66 @@
algo: gp_mpc
algo_config:
  additional_constraints: null
  deque_size: 10
  eval_batch_size: 10
  gp_approx: mean_eq
  gp_model_path: null
  horizon: 20
  prior_info:
    prior_prop:
      cart_mass: 1.0
      pole_length: 0.5
      pole_mass: 0.1
  initial_rollout_std: 0.0
  input_mask: null
  learing_rate: null
  learning_rate:
    - 0.01
    - 0.01
    - 0.01
    - 0.01
  normalize_training_data: false
  online_learning: false
  optimization_iterations:
    - 3000
    - 3000
    - 3000
    - 3000
  overwrite_saved_data: false
  prior_param_coeff: 1.5
  prob: 0.95
  q_mpc:
    - 1
    - 1
    - 1
    - 1
  r_mpc:
    - 0.1
  kernel: Matern
  sparse_gp: True
  n_ind_points: 40
  inducing_point_selection_method: 'kmeans'
  recalc_inducing_points_at_every_step: false
  soft_constraints:
    gp_soft_constraints: false
    gp_soft_constraints_coeff: 0
    prior_soft_constraints: true
    prior_soft_constraints_coeff: 10
  target_mask: null
  train_iterations: null
  test_data_ratio: 0.2
  use_prev_start: true
  warmstart: true
  num_epochs: 5
  num_samples: 75
  num_test_episodes_per_epoch: 2
  num_train_episodes_per_epoch: 2
  same_test_initial_state: true
  same_train_initial_state: false
  rand_data_selection: false
  terminate_train_on_done: True
  terminate_test_on_done: False
  parallel: True

device: cpu
restore: null
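The kernel, sparse_gp, n_ind_points, and inducing_point_selection_method options shape the GP dynamics models learned inside the controller, with per-dimension training settings (hence the four-element learning_rate and optimization_iterations lists). A minimal, generic GPyTorch sketch of a Matern-kernel GP with an inducing-point approximation, assuming GPyTorch as the GP backend; the class and toy data below are illustrative, not the controller's actual model code.

import gpytorch
import torch


class SparseMaternGP(gpytorch.models.ExactGP):
    """Minimal exact GP with a Matern kernel and an inducing-point (SGPR) approximation."""

    def __init__(self, train_x, train_y, likelihood, n_ind_points=40):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ZeroMean()
        base_kernel = gpytorch.kernels.ScaleKernel(gpytorch.kernels.MaternKernel(nu=2.5))
        # Initialize inducing points from a subset of the training inputs
        # (the config above selects them with k-means instead).
        inducing = train_x[:n_ind_points, :].clone()
        self.covar_module = gpytorch.kernels.InducingPointKernel(
            base_kernel, inducing_points=inducing, likelihood=likelihood)

    def forward(self, x):
        mean = self.mean_module(x)
        covar = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean, covar)


# Example usage on toy data (4 input dims, as for the cartpole state).
train_x = torch.rand(200, 4)
train_y = torch.sin(train_x.sum(dim=-1))
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = SparseMaternGP(train_x, train_y, likelihood, n_ind_points=40)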
@@ -0,0 +1,36 @@
hpo_config:

  hpo: True                             # do hyperparameter optimization
  load_if_exists: True                  # this should be set to True if HPO is run in parallel
  use_database: False                   # set to True if MySQL is used
  objective: [exponentiated_avg_return] # [other metrics defined in base_experiment.py]
  direction: [maximize]                 # [maximize, maximize]
  dynamical_runs: False                 # if True, dynamically increase the number of runs
  warm_trials: 20                       # number of trials to run before dynamical runs
  approximation_threshold: 5            # only used when dynamical_runs is True
  repetitions: 5                        # number of performance samples for each objective query
  alpha: 1                              # significance level for CVaR
  use_gpu: True
  dashboard: False
  seed: 24
  save_n_best_hps: 3
  # budget
  trials: 40

  # hyperparameters
  hps_config:
    horizon: 20
    learning_rate:
      - 0.01
      - 0.01
      - 0.01
      - 0.01
    optimization_iterations:
      - 3000
      - 3000
      - 3000
      - 3000
    kernel: Matern
    n_ind_points: 35
    num_epochs: 5
    num_samples: 75
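The study-level options (trials, load_if_exists, use_database, dashboard, and the TPE/random sampler choice mentioned in the commit messages) map closely onto Optuna concepts. A minimal sketch of driving such a study with Optuna; the objective, study name, and storage URL below are placeholders for illustration, not the module's actual implementation.

import optuna


def objective(trial):
    # Placeholder objective: the HPO module would instead train and evaluate the
    # controller 'repetitions' times and aggregate exponentiated_avg_return.
    horizon = trial.suggest_int('horizon', 10, 40)
    learning_rate = trial.suggest_float('learning_rate', 1e-3, 1e-1, log=True)
    return float(horizon) * learning_rate  # stand-in for the real metric


sampler = optuna.samplers.TPESampler(seed=24)  # or optuna.samplers.RandomSampler(seed=24)
study = optuna.create_study(
    study_name='gp_mpc_hpo_demo',              # placeholder name
    direction='maximize',                      # matches direction: [maximize]
    sampler=sampler,
    storage='sqlite:///gp_mpc_hpo_demo.db',    # a MySQL URL would be used with use_database: True
    load_if_exists=True,                       # lets parallel workers share one study
)
study.optimize(objective, n_trials=40)         # trials: 40 budget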
@@ -0,0 +1,7 @@
horizon: 35
kernel: 'RBF'
n_ind_points: 40
num_epochs: 5
num_samples: 75
optimization_iterations: [2800, 2800, 2800, 2800]
learning_rate: [0.023172075157730145, 0.023172075157730145, 0.023172075157730145, 0.023172075157730145]
119 changes: 119 additions & 0 deletions examples/hpo/hpo_experiment.py
@@ -0,0 +1,119 @@
"""Template hyperparameter optimization/hyperparameter evaluation script.
"""
import os
from functools import partial

import yaml

import matplotlib.pyplot as plt
import numpy as np

from safe_control_gym.envs.benchmark_env import Environment, Task

from safe_control_gym.hyperparameters.hpo import HPO
from safe_control_gym.experiments.base_experiment import BaseExperiment
from safe_control_gym.utils.configuration import ConfigFactory
from safe_control_gym.utils.registration import make
from safe_control_gym.utils.utils import set_device_from_config, set_dir_from_config, set_seed_from_config


def hpo(config):
"""Hyperparameter optimization.
Usage:
* to start HPO, use with `--func hpo`.
"""

# Experiment setup.
if config.hpo_config.hpo:
set_dir_from_config(config)
set_seed_from_config(config)
set_device_from_config(config)

# HPO
hpo = HPO(config.algo,
config.task,
config.sampler,
config.load_study,
config.output_dir,
config.task_config,
config.hpo_config,
**config.algo_config)

if config.hpo_config.hpo:
hpo.hyperparameter_optimization()
print('Hyperparameter optimization done.')


def train(config):
"""Training for a given set of hyperparameters.
Usage:
* to start training, use with `--func train`.
"""
# Override algo_config with given yaml file
if config.opt_hps == '':
# if no opt_hps file is given
pass
else:
# if opt_hps file is given
with open(config.opt_hps, 'r') as f:
opt_hps = yaml.load(f, Loader=yaml.FullLoader)
for hp in opt_hps:
if isinstance(config.algo_config[hp], list) and not isinstance(opt_hps[hp], list):
config.algo_config[hp] = [opt_hps[hp]] * len(config.algo_config[hp])
else:
config.algo_config[hp] = opt_hps[hp]
# Experiment setup.
set_dir_from_config(config)
set_seed_from_config(config)
set_device_from_config(config)

# Define function to create task/env.
env_func = partial(make, config.task, output_dir=config.output_dir, **config.task_config)
# Create the controller/control_agent.
# Note:
# eval_env will take config.seed * 111 as its seed
# env will take config.seed as its seed
control_agent = make(config.algo,
env_func,
training=True,
checkpoint_path=os.path.join(config.output_dir, 'model_latest.pt'),
output_dir=config.output_dir,
use_gpu=config.use_gpu,
seed=config.seed,
**config.algo_config)
control_agent.reset()

eval_env = env_func(seed=config.seed * 111)
# Run experiment
experiment = BaseExperiment(eval_env, control_agent)
experiment.launch_training()
results, metrics = experiment.run_evaluation(n_episodes=1, n_steps=None, done_on_max_steps=True)
control_agent.close()

return eval_env.X_GOAL, results, metrics


MAIN_FUNCS = {'hpo': hpo, 'train': train}


if __name__ == '__main__':

# Make config.
fac = ConfigFactory()
fac.add_argument('--func', type=str, default='train', help='main function to run.')
fac.add_argument('--opt_hps', type=str, default='', help='yaml file as a result of HPO.')
fac.add_argument('--load_study', type=bool, default=False, help='whether to load study from a previous HPO.')
fac.add_argument('--sampler', type=str, default='TPESampler', help='which sampler to use in HPO.')
# merge config
config = fac.merge()

# Execute.
func = MAIN_FUNCS.get(config.func, None)
if func is None:
raise Exception('Main function {} not supported.'.format(config.func))
func(config)
61 changes: 61 additions & 0 deletions examples/hpo/rl/config_overrides/cartpole/cartpole_stab.yaml
@@ -0,0 +1,61 @@
task_config:
  info_in_reset: True
  ctrl_freq: 15
  pyb_freq: 750
  physics: pyb

  # state initialization
  init_state:
    init_x: 0.0
    init_x_dot: 0.0
    init_theta: 0.0
    init_theta_dot: 0.0
  randomized_init: True
  randomized_inertial_prop: False
  normalized_rl_action_space: True

  init_state_randomization_info:
    init_x:
      distrib: 'uniform'
      low: -0.1
      high: 0.1
    init_x_dot:
      distrib: 'uniform'
      low: -0.1
      high: 0.1
    init_theta:
      distrib: 'uniform'
      low: -0.2
      high: 0.2
    init_theta_dot:
      distrib: 'uniform'
      low: -0.1
      high: 0.1

  task: stabilization
  task_info:
    stabilization_goal: [0]
    stabilization_goal_tolerance: 0.005

  inertial_prop:
    pole_length: 0.5
    cart_mass: 1
    pole_mass: 0.1

  episode_len_sec: 10
  cost: rl_reward
  obs_goal_horizon: 1

  # RL Reward
  rew_state_weight: [1, 1, 1, 1]
  rew_act_weight: 0.1
  rew_exponential: True

  # constraints
  constraints:
    - constraint_form: default_constraint
      constrained_variable: state
    - constraint_form: default_constraint
      constrained_variable: input
  done_on_out_of_bound: True
  done_on_violation: False
@@ -0,0 +1,13 @@
activation: leaky_relu
actor_lr: 0.0007948148615930024
clip_param: 0.1
critic_lr: 0.007497368468753617
entropy_coef: 0.00010753631441212628
gae_lambda: 0.8
gamma: 0.98
hidden_dim: 32
max_env_steps: 72000
mini_batch_size: 128
opt_epochs: 5
rollout_steps: 150
target_kl: 1.587713889686473e-07