Shuai Yang*, Hao Li*, Yilun Chen, Bin Wang, Yang Tian, Tai Wang,
Hanqing Wang, Feng Zhao, Yiyi Liao, Jiangmiao Pang
* Equal contribution
University of Science and Technology of China, Zhejiang University,
Shanghai Artificial Intelligence Laboratory
- We propose InstructVLA, a VLA architecture and training pipeline that emphasizes the importance of language capability in VLAs by efficiently preserving pretrained vision-language knowledge from VLMs while integrating manipulation as a component of instruction following.
- We design a practical data and evaluation pipeline for vision-language-action instruction following, supported by 650K tailored VLA-IT annotations and a manually curated benchmark suite, enabling evaluation of VLAs' instruction generalization capabilities.
- InstructVLA achieves leading performance across robotic manipulation tasks, multimodal benchmarks, and real-world deployments, enabling intuitive and controllable human-robot interaction.
- Release the VLA-IT dataset.
- Release SimplerEnv-Instruct.
- Release the checkpoints and training code for post-training and finetuning.
- Release a more powerful InstructVLA v2.0.
If you find our work helpful, please cite:
@misc{yang2025instructvlavisionlanguageactioninstructiontuning,
title={InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation},
author={Shuai Yang and Hao Li and Yilun Chen and Bin Wang and Yang Tian and Tai Wang and Hanqing Wang and Feng Zhao and Yiyi Liao and Jiangmiao Pang},
year={2025},
eprint={2507.17520},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2507.17520},
}
@misc{li2025cronusvlatransferringlatentmotion,
title={CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation},
author={Hao Li and Shuai Yang and Yilun Chen and Yang Tian and Xiaoda Yang and Xinyi Chen and Hanqing Wang and Tai Wang and Feng Zhao and Dahua Lin and Jiangmiao Pang},
year={2025},
eprint={2506.19816},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2506.19816},
}