This repository contains code for the results in *Probably Approximately Correct Vision-Based Planning using Motion Primitives*:
- Quadrotor navigating an obstacle field using a depth map from an onboard RGB-D camera
- Quadruped (Minitaur, Ghost Robotics) traversing rough terrain using proprioceptive and exteroceptive (depth map from onboard RGB-D camera) feedback
The results are demonstrated on a few test environments in this video.
- PyBullet 2.6.5 (`pip install pybullet==2.6.5`)
- PyTorch
- Tensorboard
- CVXPY
- MOSEK
- Relevant parameters for each example are provided in a config JSON file located in the `configs` folder.
- Before training, ensure that the `num_cpu` and `num_gpu` parameters in the config file reflect your system specs.
- The environments for each training example are drawn from a distribution; they are generated by varying the random seed, which lets us index environments by seed with ease. In the config file, `num_trials` is the number of environments to train on and `start_seed` is the starting index: we train on environments from `start_seed` to `start_seed + num_trials`, as sketched below.
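As a minimal illustration of this seed-based indexing (a sketch only; it assumes `start_seed` and `num_trials` are top-level keys of the config JSON), the training seeds can be enumerated as:

```python
import json

# Load an example config; the key names below match the parameters
# described above, but their exact location in the JSON is an assumption.
with open("configs/config_quadrotor.json") as f:
    config = json.load(f)

# Training environments are indexed by random seed:
# start_seed, start_seed + 1, ..., start_seed + num_trials - 1.
train_seeds = range(config["start_seed"],
                    config["start_seed"] + config["num_trials"])
print(list(train_seeds)[:5])  # first few training seeds
```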
- Train a Prior using Evolutionary Strategies:
- Quadrotor:
python train_ES.py --config_file configs/config_quadrotor.json
- Minitaur:
python train_ES.py --config_file configs/config_minitaur.json
Note: Training the prior is computationally demanding. The quadrotor was trained on 480 environments (seeds 100-579) on an AWS g3.16xlarge instance with 60 CPU workers and 4 GPUs, while the Minitaur was trained on 10 environments (seeds 0-9) with 10 CPU workers and 1 GPU (Titan XP, 12 GB). For convenience, we have shared the prior trained for the results in the paper, so this step can be skipped; running with the relevant config file will automatically load the corresponding weights from the `Weights` folder.
- Draw `m` policies i.i.d. from the prior trained above and compute the cost of each policy on `N` new environments:
- Quadrotor:
python compute_policy_costs.py --config_file configs/config_quadrotor.json --start_seed 580 --num_envs N --num_policies m
- Minitaur:
python compute_policy_costs.py --config_file configs/config_minitaur.json --start_seed 10 --num_envs N --num_policies m
Note: We have shared the computed cost matrices, with 4000 environments and 50 policies for the quadrotor and 2000 environments and 50 policies for the Minitaur; see `Weights/C_quadrotor.npy` and `Weights/C_minitaur.npy`.
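As a quick sanity check, the shared cost matrices can be inspected as below. This is a sketch only; the assumption that rows index environments and columns index policies matches the counts above but is not stated explicitly in the repository:

```python
import numpy as np

# Load the precomputed cost matrix for the quadrotor.
# Assumed shape: (num_envs, num_policies) = (4000, 50).
C = np.load("Weights/C_quadrotor.npy")
print(C.shape)

# Average cost of each policy across all environments.
mean_cost_per_policy = C.mean(axis=0)
print(mean_cost_per_policy.min(), mean_cost_per_policy.max())
```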
- Perform PAC-Bayes optimization with the parametric REP in the paper on `N_pac` (<= `N`) environments and `m_pac` (<= `m`) policies using the costs computed above (a sketch of the bound's form follows the commands):
- Quadrotor:
python PAC_Bayes_opt.py --config_file configs/config_quadrotor.json --num_envs N_pac --num_policies m_pac
- Minitaur:
python PAC_Bayes_opt.py --config_file configs/config_minitaur.json --num_envs N_pac --num_policies m_pac
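For intuition, the certificate being optimized has the standard PAC-Bayes-kl form; the sketch below computes such a bound by bisection on the KL inverse. The exact formulation and its REP-based optimization are given in the paper, and all numbers here are illustrative only:

```python
import math

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_inverse(emp_cost, c):
    """Largest q with kl(emp_cost || q) <= c, found by bisection."""
    lo, hi = emp_cost, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if kl_bernoulli(emp_cost, mid) <= c:
            lo = mid
        else:
            hi = mid
    return lo

# Example: empirical cost 0.1 on N = 4000 environments, delta = 0.01,
# KL(posterior || prior) = 0.5 (all values illustrative only).
N, delta, kl_post_prior = 4000, 0.01, 0.5
bound = kl_inverse(0.1, (kl_post_prior + math.log(2 * math.sqrt(N) / delta)) / N)
print(bound)  # upper bound on the true expected cost, w.p. >= 1 - delta
```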
Note: Additionally, we provide the matrices `Weights/C_quadrotor_emp_test.npy` and `Weights/C_minitaur_emp_test.npy`, computed on 5000 environments (seeds 5000-9999) and 50 policies each, to empirically estimate the true cost of the posterior.
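A minimal sketch of that empirical estimate, assuming rows index environments and columns index policies, and substituting a placeholder uniform posterior for the optimized one:

```python
import numpy as np

# Held-out cost matrix: assumed shape (5000 environments, 50 policies).
C_test = np.load("Weights/C_quadrotor_emp_test.npy")

# Placeholder: a uniform posterior over the 50 policies; in practice,
# use the posterior produced by PAC_Bayes_opt.py.
p = np.ones(C_test.shape[1]) / C_test.shape[1]

expected_cost = C_test @ p    # expected cost of the posterior per environment
print(expected_cost.mean())   # empirical estimate of the true expected cost
```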
To visualize the trained posterior for `m_test` policy draws per environment on environments from `N0` (default `10000`) to `N0 + N`:
- Quadrotor:
python quad_test.py --config_file configs/config_quadrotor.json --start_seed N0 --num_envs N --num_draws m_test
- Minitaur:
python minitaur_test.py --config_file configs/config_minitaur.json --start_seed N0 --num_envs N --num_draws m_test
Note: Choose `m_test` to be sufficiently large (at least 5).