
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-Language-Action Model

Paper Page · Hugging Face Collection · WeChat

📢 News!

  • [2025/10/28] We released our real-world code based on Pi_0! Everyone is welcome to use it! 🎉
  • [2025/10/24] 🏆 Congratulations to Jialong! He and SF took second place in the AgiBot World Challenge, along with a $5,000 prize 💰!
  • [2025/10/18] Our paper ranked 🥇 first on the Hugging Face daily papers list and 🥉 third on the weekly list! ⭐
  • [2025/10/12] We released our paper on arXiv.

🌟 Key Features of Spatial-Forcing (SF)

  1. Universality: SF is a plug-and-play 3D fine-tuning strategy that can be seamlessly integrated into any VLA training pipeline, requiring only 30 lines of code modifications (see the minimal sketch after this list). It substantially enhances spatial reasoning and manipulation capabilities. We provide implementations based on OpenVLA and Pi0, along with a quick-start guide for adapting SF to other VLA models.

  2. Strong Performance: SF achieves state-of-the-art (SOTA) results on both the LIBERO and RoboTwin benchmarks. In real-world experiments involving complex spatial structures, SF improves task success rates by up to 50%.

  3. Efficient Training: SF requires only 3% of the training steps or 5% of the training data to reach a 66% success rate on LIBERO-Long. Moreover, it achieves strong real-world performance with as few as 20 demonstrations.
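
A minimal, hypothetical sketch of what such a drop-in alignment objective could look like in PyTorch. The class name, projector architecture, and loss form (negative cosine similarity, in the spirit of REPA) are our assumptions for illustration, not the repository's actual code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAlignmentLoss(nn.Module):
    """Illustrative SF-style auxiliary loss (names and shapes are assumptions).

    Projects the VLA's intermediate visual tokens into the feature space of a
    frozen 3D foundation model (e.g. VGGT) and pulls the two together with a
    negative-cosine-similarity objective, in the spirit of REPA.
    """

    def __init__(self, vla_dim: int, geo_dim: int):
        super().__init__()
        # Small MLP projector from the VLA hidden size to the 3D feature size.
        self.proj = nn.Sequential(
            nn.Linear(vla_dim, geo_dim),
            nn.GELU(),
            nn.Linear(geo_dim, geo_dim),
        )

    def forward(self, vla_tokens: torch.Tensor, geo_tokens: torch.Tensor) -> torch.Tensor:
        # vla_tokens: (B, N, vla_dim) intermediate visual embeddings from the VLA.
        # geo_tokens: (B, N, geo_dim) features from the frozen 3D foundation model.
        pred = F.normalize(self.proj(vla_tokens), dim=-1)
        target = F.normalize(geo_tokens.detach(), dim=-1)  # no gradient into the 3D model
        return 1.0 - (pred * target).sum(dim=-1).mean()
```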

📃 Overview

[Teaser figure]

Our Spatial-Forcing (SF) method aligns the intermediate visual embeddings of VLAs with geometric representations produced by pretrained 3D foundation models. This alignment improves performance, training efficiency, and data efficiency.
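
Continuing the sketch above, here is a hedged illustration of how the auxiliary alignment term would slot into an existing training step. `compute_action_loss`, `encode`, and the weight `lambda_sf` are hypothetical names for this illustration, not APIs from this repository:

```python
# Hypothetical training step (all names are illustrative).
sf_loss = SpatialAlignmentLoss(vla_dim=4096, geo_dim=1024)

def training_step(vla, vggt, batch, lambda_sf: float = 0.5):
    # Assumed to return the imitation loss plus the intermediate visual tokens.
    action_loss, vla_tokens = vla.compute_action_loss(batch)
    with torch.no_grad():
        geo_tokens = vggt.encode(batch["images"])  # frozen 3D foundation model
    align_loss = sf_loss(vla_tokens, geo_tokens)
    # The alignment term is simply added to the usual VLA objective.
    loss = action_loss + lambda_sf * align_loss
    loss.backward()
    return loss
```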

🚀 Get Started

🔥 TODO List

✅ Training and inference code on LIBERO (Base model: OpenVLA)
✅ Checkpoints on LIBERO (Base model: OpenVLA)
✅ Real-world deployment code (Base model: Pi_0, PyTorch version)

🌏 Contact

For further discussion and collaboration, please feel free to contact us via email or WeChat:

| Author | Email | WeChat |
| --- | --- | --- |
| Fuhao Li | [email protected] | haofuly |
| Wenxuan Song | [email protected] | swx0757 |

❤️ Acknowledgement

We thank these great works and open-source codebases: OpenVLA-OFT, OpenPI, VGGT, and REPA.

🖊 Citation

If you find this work useful, please cite:

@article{spatialforcing2025,
  author    = {Li, Fuhao and Song, Wenxuan and Zhao, Han and Wang, Jingbo and Ding, Pengxiang and Wang, Donglin and Zeng, Long and Li, Haoang},
  title     = {Spatial Forcing: Implicit Spatial Representation Alignment for Vision-Language-Action Model},
  journal   = {arXiv preprint arXiv:2510.12276},
  year      = {2025},
}
