This repository extends Isaac Lab with environments and training pipelines for Robotic World Model (RWM) and related model-based reinforcement learning methods.
It enables:
- joint training of policies and neural dynamics models in Isaac Lab (online),
- training of policies with learned neural network dynamics without any simulator (offline),
- evaluation of model-based vs. model-free policies,
- visualization of autoregressive imagination rollouts from learned dynamics,
- visualization of trained policies in Isaac Lab.
Authors: Chenhao Li, Andreas Krause, Marco Hutter
Affiliation: ETH AI Center, Learning & Adaptive Systems Group and Robotic Systems Lab, ETH Zurich
- Install Isaac Lab (not needed for offline policy training)
Follow the official installation guide. We recommend using the Conda installation as it simplifies calling Python scripts from the terminal.
- Install model-based RSL RL
Follow the official installation guide of model-based RSL RL to replace the rsl_rl_lib that ships with Isaac Lab.
- Clone this repository (outside your Isaac Lab directory)
```bash
git clone git@github.com:leggedrobotics/robotic_world_model.git
```
- Install the extension using the Python environment where Isaac Lab is installed
```bash
python -m pip install -e source/mbrl
```
- Verify the installation (not needed for offline policy training)
```bash
python scripts/reinforcement_learning/rsl_rl/train.py --task Template-Isaac-Velocity-Flat-Anymal-D-Init-v0 --headless
```
Robotic World Model is a model-based reinforcement learning algorithm that learns a dynamics model and a policy concurrently.
You can configure the model inputs and outputs under ObservationsCfg_PRETRAIN in AnymalDFlatEnvCfg_PRETRAIN.
Available components:
- SystemStateCfg: state input and output head
- SystemActionCfg: action input
- SystemExtensionCfg: continuous privileged output head (e.g. rewards)
- SystemContactCfg: binary privileged output head (e.g. contacts)
- SystemTerminationCfg: binary privileged output head (e.g. terminations)
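For orientation, the sketch below shows how these components might be composed into a configuration like ObservationsCfg_PRETRAIN. It is a minimal, hypothetical illustration assuming dataclass-style configs; the field and term names are placeholders, not the actual API of this extension.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for the component configs listed above; the real
# classes live in the mbrl extension and their exact fields may differ.
@dataclass
class SystemStateCfg:        # state input and output head
    terms: tuple = ("base_lin_vel", "base_ang_vel", "joint_pos", "joint_vel")

@dataclass
class SystemActionCfg:       # action input
    terms: tuple = ("joint_pos_target",)

@dataclass
class SystemContactCfg:      # binary privileged output head (e.g. foot contacts)
    terms: tuple = ("feet_contact",)

@dataclass
class ObservationsCfgSketch:
    # The dynamics model maps the stacked state/action history to the next
    # state plus any configured privileged output heads.
    state: SystemStateCfg = field(default_factory=SystemStateCfg)
    action: SystemActionCfg = field(default_factory=SystemActionCfg)
    contact: SystemContactCfg = field(default_factory=SystemContactCfg)
```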
You can configure the model architecture and training hyperparameters under RslRlSystemDynamicsCfg and RslRlMbrlPpoAlgorithmCfg in AnymalDFlatPPOPretrainRunnerCfg.
Available options:
- ensemble_size: ensemble size for uncertainty estimation
- history_horizon: stacked history horizon
- architecture_config: architecture configuration
- system_dynamics_forecast_horizon: autoregressive prediction steps
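A minimal sketch of what such a dynamics configuration could look like is shown below; the values are illustrative only, not the defaults shipped in RslRlSystemDynamicsCfg.

```python
from dataclasses import dataclass, field

# Illustrative values only; consult RslRlSystemDynamicsCfg and
# AnymalDFlatPPOPretrainRunnerCfg in this repository for the real defaults.
@dataclass
class SystemDynamicsCfgSketch:
    ensemble_size: int = 5                      # ensemble members used for uncertainty estimation
    history_horizon: int = 8                    # number of stacked past state/action steps
    architecture_config: dict = field(default_factory=lambda: {"type": "rnn", "hidden_dims": [512, 512]})
    system_dynamics_forecast_horizon: int = 16  # autoregressive prediction steps during model training
```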
```bash
python scripts/reinforcement_learning/rsl_rl/train.py \
    --task Template-Isaac-Velocity-Flat-Anymal-D-Pretrain-v0 \
    --headless
```
This trains a PPO policy from scratch, while the experience collected during training is used to train the dynamics model.
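Conceptually, each training iteration interleaves on-policy PPO updates with supervised updates of the dynamics model on the same experience. The sketch below illustrates this loop under assumed interfaces; env, policy, ppo, dynamics_model, and replay_buffer are hypothetical stand-ins, not the repository's classes.

```python
def joint_training_iteration(env, policy, ppo, dynamics_model, replay_buffer, rollout_length=24):
    """Sketch of one outer iteration: collect experience with the current
    policy, update PPO on it, and reuse it to fit the dynamics model."""
    obs = env.reset()
    for _ in range(rollout_length):
        action = policy(obs)
        next_obs, reward, done, info = env.step(action)
        ppo.store_transition(obs, action, reward, done)         # on-policy batch for PPO
        replay_buffer.add(obs, action, next_obs, reward, done)  # same data reused for the model
        obs = next_obs

    ppo.update(policy)                          # model-free PPO update
    dynamics_model.fit(replay_buffer.sample())  # supervised dynamics-model update
```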
```bash
python scripts/reinforcement_learning/rsl_rl/visualize.py \
    --task Template-Isaac-Velocity-Flat-Anymal-D-Visualize-v0 \
    --checkpoint <checkpoint_path> \
    --system_dynamics_load_path <dynamics_model_path>
```
This visualizes the learned dynamics model by rolling out the model autoregressively in imagination, conditioned on the actions from the learned policy.
The dynamics_model_path should point to the pretrained dynamics model checkpoint (e.g. model_<iteration>.pt) inside the saved run directory.
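In essence, the rollout feeds the model's own predictions back as inputs while querying the policy for actions. Below is a minimal PyTorch-style sketch under assumed single-step interfaces, not the repository's visualize implementation.

```python
import torch

@torch.no_grad()
def imagine_rollout(dynamics_model, policy, init_state, horizon=100):
    """Autoregressive rollout in imagination: the learned dynamics model
    replaces the simulator, conditioned on actions from the learned policy."""
    state = init_state
    states, actions = [state], []
    for _ in range(horizon):
        action = policy(state)                 # action from the learned policy
        state = dynamics_model(state, action)  # predicted next state is fed back in
        states.append(state)
        actions.append(action)
    return torch.stack(states), torch.stack(actions)
```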
Once a dynamics model is pretrained, you can train a model-based policy purely from imagined rollouts generated by the learned dynamics.
There are two options:
- Option 1: Train policy in imagination online, where additional environment interactions are continually collected using the latest policy to update the dynamics model (as implemented with RWM and MBPO-PPO in Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics).
- Option 2: Train policy in imagination offline, where no additional environment interactions are collected and the policy has to rely on the static dynamics model (as implemented with RWM-U and MOPO-PPO in Uncertainty-Aware Robotic World Model Makes Offline Model-Based Reinforcement Learning Work on Real Robots).
The online data collection relies on interactions with the environment and thus brings up the simulator.
```bash
python scripts/reinforcement_learning/rsl_rl/train.py --task Template-Isaac-Velocity-Flat-Anymal-D-Finetune-v0 --headless --checkpoint <checkpoint_path> --system_dynamics_load_path <dynamics_model_path>
```
You can start the policy either from a pretrained checkpoint or from scratch by simply omitting the --checkpoint argument.
The offline policy training does not collect any new data and thus relies solely on the static dynamics model.
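Because an offline policy can exploit model errors that are never corrected by new data, the MOPO-style approach cited above penalizes imagined rewards by the model's uncertainty, typically estimated from ensemble disagreement. A minimal sketch of that idea is shown below; it is illustrative and not necessarily how this repository computes the penalty.

```python
import torch

def uncertainty_penalized_reward(ensemble_next_states, reward, penalty_coef=1.0):
    """MOPO-style penalty: subtract a measure of ensemble disagreement so the
    policy avoids regions where the static dynamics model is unreliable.

    ensemble_next_states: (ensemble_size, batch, state_dim) predictions.
    reward:               (batch,) imagined reward.
    """
    disagreement = ensemble_next_states.std(dim=0).norm(dim=-1)  # (batch,)
    return reward - penalty_coef * disagreement
```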
Align the model architecture and specify the model load path under ModelArchitectureConfig in AnymalDFlatConfig.
Additionally, the offline imagination needs to branch off from some initial states. Specify the data path under DataConfig in AnymalDFlatConfig.
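A rough sketch of how these two configuration pieces might fit together follows; apart from the class names mentioned above, the field names, values, and paths are placeholders rather than the actual API.

```python
from dataclasses import dataclass, field

# Placeholder sketch; see AnymalDFlatConfig in this repository for the real fields.
@dataclass
class ModelArchitectureConfig:
    load_path: str = "<run_dir>/pretrain_rnn_ens.pt"   # pretrained RWM-U dynamics checkpoint
    ensemble_size: int = 5                             # must match the pretrained model
    history_horizon: int = 8

@dataclass
class DataConfig:
    data_path: str = "<data_dir>/state_action_data_0.csv"  # initial states to branch imagination from

@dataclass
class AnymalDFlatConfig:
    model: ModelArchitectureConfig = field(default_factory=ModelArchitectureConfig)
    data: DataConfig = field(default_factory=DataConfig)
```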
```bash
python scripts/reinforcement_learning/model_based/train.py --task anymal_d_flat
```
You can play the learned policies with the original Isaac Lab task registry.
```bash
python scripts/reinforcement_learning/rsl_rl/play.py --task Isaac-Velocity-Flat-Anymal-D-Play-v0 --checkpoint <checkpoint_path>
```
We provide a reference pipeline that enables RWM and RWM-U on ANYmal D.
Key files:
Online
- Environment configurations + dynamics model setup: flat_env_cfg.py
- Algorithm configuration + training parameters: rsl_rl_ppo_cfg.py
- Imagination rollout logic (constructs policy observations & rewards from model outputs): anymal_d_manager_based_mbrl_env
- Visualization environment + rollout reset: anymal_d_manager_based_visualize_env.py
Offline
- Environment configurations + imagination rollout logic (constructs policy observations & rewards from model outputs): anymal_d_flat.py
- Algorithm configuration + training parameters: anymal_d_flat_cfg.py
- Pretrained RWM-U checkpoint: pretrain_rnn_ens.pt
- Initial states for imagination rollout: state_action_data_0.csv
If you find this repository useful for your research, please consider citing:
```bibtex
@article{li2025robotic,
  title={Robotic world model: A neural network simulator for robust policy optimization in robotics},
  author={Li, Chenhao and Krause, Andreas and Hutter, Marco},
  journal={arXiv preprint arXiv:2501.10100},
  year={2025}
}

@article{li2025offline,
  title={Offline Robotic World Model: Learning Robotic Policies without a Physics Simulator},
  author={Li, Chenhao and Krause, Andreas and Hutter, Marco},
  journal={arXiv preprint arXiv:2504.16680},
  year={2025}
}
```

