
VLA-RFT: Vision-Language-Action Models with Reinforcement Fine-Tuning

License: MIT | Python 3.10+ | arXiv: 2510.00406 | Project Page

Vision-Language-Action (VLA) models enable embodied decision-making but rely heavily on imitation learning, leading to compounding errors and poor robustness under distribution shift. Reinforcement learning (RL) can mitigate these issues yet typically demands costly real-world interactions or suffers from sim-to-real gaps. We introduce VLA-RFT, a reinforcement fine-tuning framework that leverages a data-driven world model as a controllable simulator. Trained from real interaction data, the simulator predicts future visual observations conditioned on actions, allowing policy rollouts with dense, trajectory-level rewards derived from goal-achieving references. This design delivers an efficient and action-aligned learning signal, drastically lowering sample requirements. With fewer than 400 fine-tuning steps, VLA-RFT surpasses strong supervised baselines and achieves greater efficiency than simulator-based RL. Moreover, it exhibits strong robustness under perturbed conditions, sustaining stable task execution. Our results establish world-model-based RFT as a practical post-training paradigm to enhance the generalization and robustness of VLA models.
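At its core, the fine-tuning loop rolls the VLA policy out inside the learned world model and scores each imagined trajectory against a goal-achieving reference. The sketch below only illustrates that idea; every name in it (world_model.predict, policy.sample_action, reference.score) is a hypothetical placeholder, not the repository's actual API.

# Illustrative sketch of world-model-based reinforcement fine-tuning.
# All function and attribute names below are hypothetical placeholders.
def rollout_in_world_model(policy, world_model, obs, instruction, reference, horizon=16):
    """Roll the policy out inside the learned simulator and collect dense rewards."""
    trajectory, rewards = [], []
    for t in range(horizon):
        action = policy.sample_action(obs, instruction)   # VLA proposes an action
        obs = world_model.predict(obs, action)            # simulator imagines the next observation
        # Dense, trajectory-level reward: how well the rollout tracks a
        # goal-achieving reference trajectory (placeholder scoring function).
        rewards.append(reference.score(obs, action, step=t))
        trajectory.append((obs, action))
    return trajectory, rewards

# The (trajectory, rewards) pairs then drive an RL update of the policy,
# replacing costly real-robot or simulator rollouts.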

🚀 Quick Start

Prerequisites

  • Python 3.10+
  • CUDA 12.2+
  • PyTorch 2.4+
  • UV package manager
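
A quick way to confirm the CUDA and PyTorch requirements are met (standard PyTorch introspection, not a project-specific script):

import torch

print("PyTorch:", torch.__version__)           # expect 2.4+
print("CUDA (build):", torch.version.cuda)     # expect 12.2+
print("GPU available:", torch.cuda.is_available())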

Clone the repository

# Clone the repository
git clone https://github.com/OpenHelix-Team/VLA-RFT.git
cd VLA-RFT

Installation (if your network is unrestricted)

# 1) Set up the environment
git submodule update --init --recursive
uv venv --seed -p 3.10
source .venv/bin/activate

# 2) Install dependencies
uv pip install -e train/verl/".[gpu]"
uv pip install 'https://github.com/Dao-AILab/flash-attention/releases/download/v2.6.0.post1/flash_attn-2.6.0.post1+cu122torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl'
uv pip install -e train/verl/".[vllm]"
uv pip install -r train/verl/requirements.txt

# 3) Install vla-adapter
uv pip install git+https://github.com/moojink/dlimp_openvla.git
uv pip install -e train/verl/vla-adapter/openvla-oft

# 4) Install LIBERO requirements
uv pip install -e third_party/LIBERO

Installation (if your network is restricted)

Please refer to the instructions at third_party/README.md.

Data Preparation

Please refer to the instructions at data/README.md.

Basic Usage

LIBERO Evaluation Example

# Run evaluation with LIBERO tasks
cd scripts/libero
bash eval_libero.sh

When using LIBERO, you may encounter an error such as AttributeError: 'NoneType' object has no attribute 'eglQueryString'. Installing the Mesa OpenGL/EGL development packages resolves it:

sudo apt-get update
sudo apt-get install libgl1-mesa-dev libegl1-mesa-dev libgles2-mesa-dev libglew-dev

Training Example

# Run training with LIBERO dataset
cd scripts/libero
bash post_train_rlvr.sh

📊 Supported Tasks & Benchmarks

LIBERO Benchmark

  • LIBERO-Spatial: Spatial reasoning tasks
  • LIBERO-Object: Object manipulation tasks
  • LIBERO-Goal: Goal-conditioned tasks
  • LIBERO-10: 10-task long-horizon suite
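
These suites can also be loaded programmatically through LIBERO's benchmark registry. A minimal sketch based on LIBERO's documented Python API (the lowercase suite keys are the ones the package registers; attributes such as n_tasks and task.language follow LIBERO's getting-started examples):

from libero.libero import benchmark

# Map of suite name -> benchmark class, as registered by LIBERO
benchmark_dict = benchmark.get_benchmark_dict()
task_suite = benchmark_dict["libero_spatial"]()   # or "libero_object", "libero_goal", "libero_10"

print(task_suite.n_tasks)        # number of tasks in the suite
task = task_suite.get_task(0)    # first task in the suite
print(task.language)             # its natural-language instruction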

📈 Performance

With fewer than 400 fine-tuning steps, VLA-RFT surpasses strong supervised baselines and achieves greater efficiency than simulator-based RL.

Please refer to our paper for detailed benchmark results.

📝 TODO

  • Init codebase
  • Release pre-trained and RFT-tuned VLA (policy) weights
  • Release pre-trained World Model weights
  • Support real-world deployment

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📚 Citation

If you use VLA-RFT in your research, please cite:

@article{wang2025vlaadapter,
  author={Wang, Yihao and Ding, Pengxiang and Li, Lingxiao and Cui, Can and Ge, Zirui and Tong, Xinyang and Song, Wenxuan and Zhao, Han and Zhao, Wei and Hou, Pengxu and Huang, Siteng and Tang, Yifan and Wang, Wenhui and Zhang, Ru and Liu, Jianyi and Wang, Donglin},
  title={VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model},
  journal={arXiv preprint arXiv:2509.09372},
  year={2025}
}
@article{li2025vla,
  title={VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators},
  author={Li, Hengtao and Ding, Pengxiang and Suo, Runze and Wang, Yihao and Ge, Zirui and Zang, Dongyuan and Yu, Kexian and Sun, Mingyang and Zhang, Hongyin and Wang, Donglin and others},
  journal={arXiv preprint arXiv:2510.00406},
  year={2025}
}

🙏 Acknowledgments

This work builds upon several excellent open-source projects:

  • VLA-Adapter: Foundation vision-language-action adapter model
  • VERL: Volcano Engine Reinforcement Learning framework
  • LIBERO: Lifelong robot learning benchmark
  • RLVR-World: Training world models with verified rewards

⭐ Star this repository if you find it helpful!
