
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-Language-Action Model

Paper Page · Hugging Face Collection · WeChat

📢 News!

  • [2025/10/28] We released our real-world code based on Pi_0! Everyone is welcome to use it! 🎉
  • [2025/10/24] 🏆 Congratulations to Jialong! He and SF took second place in the AgiBot World Challenge, along with a $5,000 prize 💰!
  • [2025/10/18] Our paper ranked 🥇 first on the Hugging Face daily papers list and 🥉 third on the weekly list! ⭐
  • [2025/10/12] We released our paper on arXiv.

🌟 Key Features of Spatial-Forcing (SF)

  1. Universality: SF is a plug-and-play 3D fine-tuning strategy that can be seamlessly integrated into any VLA training pipeline, requiring only 30 lines of code modifications (see the minimal sketch after this list). It substantially enhances spatial reasoning and manipulation capabilities. We provide implementations based on OpenVLA and Pi0, along with a quick-start guide for adapting SF to other VLA models.

  2. Strong Performance: SF achieves state-of-the-art (SOTA) results on both the LIBERO and RoboTwin benchmarks. In real-world experiments involving complex spatial structures, SF improves task success rates by up to 50%.

  3. Efficient Training: SF requires only 3% of the training steps or 5% of the training data to reach a 66% success rate on LIBERO-Long. Moreover, it achieves strong real-world performance with as few as 20 demonstrations.
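
A minimal, hypothetical sketch of what such a drop-in alignment objective could look like in PyTorch. The class name, projector architecture, and loss form (negative cosine similarity, in the spirit of REPA) are our assumptions for illustration, not the repository's actual code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAlignmentLoss(nn.Module):
    """Illustrative SF-style auxiliary loss (names and shapes are assumptions).

    Projects the VLA's intermediate visual tokens into the feature space of a
    frozen 3D foundation model (e.g. VGGT) and pulls the two together with a
    negative-cosine-similarity objective, in the spirit of REPA.
    """

    def __init__(self, vla_dim: int, geo_dim: int):
        super().__init__()
        # Small MLP projector from the VLA hidden size to the 3D feature size.
        self.proj = nn.Sequential(
            nn.Linear(vla_dim, geo_dim),
            nn.GELU(),
            nn.Linear(geo_dim, geo_dim),
        )

    def forward(self, vla_tokens: torch.Tensor, geo_tokens: torch.Tensor) -> torch.Tensor:
        # vla_tokens: (B, N, vla_dim) intermediate visual embeddings from the VLA.
        # geo_tokens: (B, N, geo_dim) features from the frozen 3D foundation model.
        pred = F.normalize(self.proj(vla_tokens), dim=-1)
        target = F.normalize(geo_tokens.detach(), dim=-1)  # no gradient into the 3D model
        return 1.0 - (pred * target).sum(dim=-1).mean()
```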

📃 Overview

[Teaser figure]

Our Spatial-Forcing (SF) method aligns the intermediate visual embeddings of VLAs with geometric representations produced by pretrained 3D foundation models. This alignment improves performance, training efficiency, and data efficiency.
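
Continuing the sketch above, here is a hedged illustration of how the auxiliary alignment term would slot into an existing training step. `compute_action_loss`, `encode`, and the weight `lambda_sf` are hypothetical names for this illustration, not APIs from this repository:

```python
# Hypothetical training step (all names are illustrative).
sf_loss = SpatialAlignmentLoss(vla_dim=4096, geo_dim=1024)

def training_step(vla, vggt, batch, lambda_sf: float = 0.5):
    # Assumed to return the imitation loss plus the intermediate visual tokens.
    action_loss, vla_tokens = vla.compute_action_loss(batch)
    with torch.no_grad():
        geo_tokens = vggt.encode(batch["images"])  # frozen 3D foundation model
    align_loss = sf_loss(vla_tokens, geo_tokens)
    # The alignment term is simply added to the usual VLA objective.
    loss = action_loss + lambda_sf * align_loss
    loss.backward()
    return loss
```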

🚀 Get Started

🔥 TODO List

✅ Training and inference code on LIBERO (Base model: OpenVLA)
✅ Checkpoints on LIBERO (Base model: OpenVLA)
✅ Real-world deployment code (Base model: Pi_0, PyTorch version)

🌏 Contact

For further discussion and collaboration, please feel free to contact us via email or WeChat:

| Author | Email | WeChat |
| --- | --- | --- |
| Fuhao Li | [email protected] | haofuly |
| Wenxuan Song | [email protected] | swx0757 |

❤️ Acknowledgement

We thank these great works and open-source codebases: OpenVLA-OFT, OpenPI, VGGT, and REPA.

🖊 Citation

If you find this work useful, please cite:

@article{spatialforcing2025,
  author    = {Li, Fuhao and Song, Wenxuan and Zhao, Han and Wang, Jingbo and Ding, Pengxiang and Wang, Donglin and Zeng, Long and Li, Haoang},
  title     = {Spatial Forcing: Implicit Spatial Representation Alignment for Vision-Language-Action Model},
  journal   = {arXiv preprint arXiv:2510.12276},
  year      = {2025},
}
