A modular JAX-based implementation of Model Predictive Path Integral (MPPI) and POLO (Plan Online, Learn Offline) algorithms for continuous control tasks using MuJoCo.
Clone the repo and install dependencies with Pipenv:
```bash
git clone https://github.com/mateuszkor/mppi_implementation.git
cd mppi_implementation
pipenv install
pipenv shell
```
Make sure you have MuJoCo and its dependencies properly set up on your system.
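As a quick sanity check that the MuJoCo Python bindings are available (assuming the standard `mujoco` package):

```python
# Verify the MuJoCo Python bindings import correctly
import mujoco
print(mujoco.__version__)
```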
You can run one of the following algorithms:
- `mppi`: Standard Model Predictive Path Integral control (see the sketch below).
- `polo`: POLO with trajectory optimization and value function learning.
- `polo_td`: POLO with temporal-difference value learning.
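For intuition, here is a minimal sketch of the core MPPI update in JAX. This is illustrative only, not the repo's implementation; `dynamics`, `cost`, and the hyperparameters are stand-ins:

```python
import jax
import jax.numpy as jnp

def mppi_step(key, u, x0, dynamics, cost, num_samples=64, lam=1.0, sigma=0.3):
    """One MPPI update: sample noisy control sequences, roll them out,
    and return the exponentially cost-weighted average control sequence."""
    horizon, act_dim = u.shape
    eps = sigma * jax.random.normal(key, (num_samples, horizon, act_dim))

    def rollout(noise):
        def step(x, u_t):
            x_next = dynamics(x, u_t)
            return x_next, cost(x_next, u_t)
        _, costs = jax.lax.scan(step, x0, u + noise)
        return costs.sum()

    total_costs = jax.vmap(rollout)(eps)          # (num_samples,)
    weights = jax.nn.softmax(-total_costs / lam)  # low cost -> high weight
    return u + jnp.einsum("k,kha->ha", weights, eps)
```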
Available simulation environments:
- `swingup`
- `hand_fixed`
- `hand_free`
Each task has its own configuration file under `config/{algorithm}/`.
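As a purely illustrative example (the key names below are assumptions, not the repo's actual schema), a task config might look like:

```yaml
# config/mppi/swingup.yaml -- hypothetical contents
num_samples: 64    # number of sampled control sequences
horizon: 30        # planning horizon in steps
lambda: 1.0        # MPPI temperature
noise_sigma: 0.3   # std of control perturbations
```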
To run a simulation:
```bash
python runner.py
```

If running on macOS, use:

```bash
mjpython runner.py
```
You can modify the algorithm and task at the bottom of `runner.py`:

```python
algorithm = "vanilla_mppi"   # Options: "vanilla_mppi", "polo", "polo_td"
simulation = "swingup"       # Options: "swingup", "hand_fixed", "hand_free"
```
Configurations are loaded from `config/{algorithm}/{task}.yaml`.
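A minimal sketch of how such a file could be loaded (assuming PyYAML; the repo's actual loader may differ):

```python
import yaml

algorithm, task = "mppi", "swingup"  # hypothetical selection
with open(f"config/{algorithm}/{task}.yaml") as f:
    cfg = yaml.safe_load(f)  # cfg is a plain dict of config values
```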
Logging to Weights & Biases is optional and can be toggled with:
```python
use_wandb = 1
```
Project structure:

```
├── runner.py       # Entry point for running simulations
├── config/         # YAML configs for algorithms/tasks
├── models/         # MPPI, POLO implementations
├── nn/             # Neural network modules
├── simulations/    # Simulation constructors and cost functions
└── utils/          # Replay buffer, helpers
```
To enable Weights & Biases tracking:
- Set `use_wandb = 1` in `runner.py`.
- Add your API key or configure W&B locally.
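For reference, the standard W&B pattern looks like this (the project name and logged keys here are placeholders, not necessarily how this repo wires it up):

```python
import wandb

use_wandb = 1
if use_wandb:
    # Authenticate once beforehand with `wandb login` (or set WANDB_API_KEY)
    wandb.init(project="mppi_implementation", config={"algorithm": "polo"})
    wandb.log({"total_cost": 123.4})  # log metrics during the run
```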