- [2025/10/28] We released our real-world deployment code based on Pi_0! Everyone is welcome to use it! 🎉
- [2025/10/24] 🏆 Congratulations to Jialong! He and our SF won second place in the Agibot World Challenge, along with a $5,000 prize 💰!
- [2025/10/18] Our paper ranked 🥇 first on the Hugging Face daily papers list and 🥉 third on the weekly list! ⭐
- [2025/10/12] We released our paper on arXiv.
- Universality: SF is a plug-and-play 3D fine-tuning strategy that can be seamlessly integrated into any VLA training process, requiring only 30 lines of code modifications. It substantially enhances spatial reasoning and manipulation capabilities. We provide implementations based on OpenVLA and Pi0, along with a quick-start guide for adapting SF to other VLA models.
- Strong Performance: SF achieves state-of-the-art (SOTA) results on both the LIBERO and RoboTwin benchmarks. In real-world experiments involving complex spatial structures, SF improves task success rates by up to 50%.
- Efficient Training: SF requires only 3% of the training steps or 5% of the training data to reach a 66% success rate on LIBERO-Long. Moreover, it achieves strong real-world performance with as few as 20 demonstrations.
Our Spatial-Forcing (SF) strategy aligns the intermediate visual embeddings of VLAs with geometric representations produced by pretrained 3D foundation models. This alignment improves performance, training efficiency, and data efficiency.
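As a rough illustration of the idea, the minimal sketch below aligns a VLA's intermediate visual tokens with features from a frozen 3D foundation model (e.g., VGGT) through a small projection head and a cosine-similarity objective. The class name `SFAlignmentHead`, the layer sizes, and the assumption that the two token sequences have matching lengths are ours for illustration only; see the openvla-SF folder for the actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SFAlignmentHead(nn.Module):
    """Illustrative projection head: maps VLA visual token embeddings into the
    feature space of a frozen 3D foundation model (e.g., VGGT)."""

    def __init__(self, vla_dim: int, geo_dim: int, hidden_dim: int = 2048):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vla_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, geo_dim),
        )

    def forward(self, vla_tokens: torch.Tensor, geo_tokens: torch.Tensor) -> torch.Tensor:
        # vla_tokens: (B, N, vla_dim) intermediate visual embeddings from the VLA backbone
        # geo_tokens: (B, N, geo_dim) geometric features from the frozen 3D model
        pred = self.proj(vla_tokens)
        # Negative cosine similarity averaged over tokens: minimizing it pulls the
        # VLA's visual representation toward the 3D model's geometric representation.
        return -F.cosine_similarity(pred, geo_tokens.detach(), dim=-1).mean()
```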
- To reproduce our simulation results, refer to our openvla-SF folder.
- To deploy the policy on real-world robots, refer to our openpi-SF folder.
- To integrate the Spatial-Forcing strategy into your own VLA model, refer to the Simulation Training Scripts, Line373-Line400; a minimal sketch of the integration is shown below.
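To show where such an auxiliary term slots into an existing pipeline, here is a hypothetical training-step sketch continuing the `SFAlignmentHead` example above. The `return_visual_tokens` keyword, the `geo_model` call, the dimensions, and the weighting factor `lambda_sf` are illustrative assumptions, not the repository's API; the actual hook points are in the training script referenced above.

```python
import torch

# Assumes SFAlignmentHead from the sketch above; dimensions are placeholders.
sf_head = SFAlignmentHead(vla_dim=4096, geo_dim=1024)

def training_step(batch, vla_model, geo_model, lambda_sf: float = 0.5):
    # Base VLA forward pass; assume it can also return intermediate visual tokens.
    action_loss, vla_visual_tokens = vla_model(batch, return_visual_tokens=True)

    with torch.no_grad():  # the 3D foundation model stays frozen
        geo_tokens = geo_model(batch["images"])

    sf_loss = sf_head(vla_visual_tokens, geo_tokens)
    # Total objective: the usual action loss plus the weighted spatial-alignment term.
    return action_loss + lambda_sf * sf_loss
```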
✅ Training and inference code on LIBERO (Base model: OpenVLA)
✅ Checkpoints on LIBERO (Base model: OpenVLA)
✅ Deployment code for real-world robots (Base model: Pi_0, PyTorch version)
For further discussion and collaboration, please feel free to contact us via email or WeChat:
| Author | Email | WeChat |
|---|---|---|
| Fuhao Li | [email protected] | haofuly |
| Wenxuan Song | [email protected] | swx0757 |
We thank these great works and open-source codebases: OpenVLA-OFT & OpenPI & VGGT & REPA
If you find this work useful, please cite:
@article{spatialforcing2025,
author = {Li, Fuhao and Song, Wenxuan and Zhao, Han and Wang, Jingbo and Ding, Pengxiang and Wang, Donglin and Zeng, Long and Li, Haoang},
title = {Spatial Forcing: Implicit Spatial Representation Alignment For Vision-Language-Action Model},
journal = {arXiv preprint arXiv:2510.12276},
year = {2025},
}