SSRL: Self-Search Reinforcement Learning

🎉News

  • [2025-08-18] We are honored to be featured as 🤗 HuggingFace Daily Paper #2.
  • [2025-08-15] We present SSRL (Self-Search Reinforcement Learning), an investigation of agentic search RL that does not rely on external search engines.

📖Introduction

We investigate reinforcement learning (RL) on agentic search tasks without explicitly gathering information from external search engines such as LLMs or web search engines. Previous work relies on external search engines during training, which is expensive and time-consuming and introduces training instability. We introduce SSRL, a novel approach that enables RL on agentic search tasks without an explicit search engine while achieving performance comparable to previous methods. Although trained entirely offline, SSRL can be applied seamlessly to online search engines, which further boosts its performance.

Performance and settings of SSRL.

📊Main Results

We first show the high upper bound of Self-Search when using a structured prompt, with the LLM serving as both the search engine and the policy.

TTS.

We then train models with SSRL to teach LLMs to use their self-search capabilities effectively. Our results demonstrate that SSRL consistently improves performance across a variety of tasks and models.

Furthermore, although SSRL is trained offline, it can be seamlessly applied to online search engines, further boosting its performance.
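The two ideas above (the model answering its own search queries, and a real engine being swapped in later) can be illustrated with a minimal sketch. This is not the SSRL implementation: the tag names, the `rollout` function, the prompt wording, and the toy model are all hypothetical and exist only to make the control flow concrete. In particular, issuing a separate generation call to produce the "retrieved" text is a simplifying assumption of this sketch.

```python
import re
from typing import Callable, Optional

# Hypothetical tag names, chosen for illustration only.
SEARCH_RE = re.compile(r"<search>(.*?)</search>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)

Generate = Callable[[str], str]  # prompt -> model continuation
Retrieve = Callable[[str], str]  # query -> retrieved text


def rollout(question: str, generate: Generate,
            retrieve: Optional[Retrieve] = None,
            max_turns: int = 4) -> Optional[str]:
    """Run a search-then-answer loop.

    With retrieve=None the model answers its own queries (self-search).
    Passing a retrieve function swaps in an external engine instead.
    """
    context = f"Question: {question}\n"
    for _ in range(max_turns):
        step = generate(context)
        context += step
        answer = ANSWER_RE.search(step)
        if answer:
            return answer.group(1).strip()
        query = SEARCH_RE.search(step)
        if query is None:
            break  # neither a query nor an answer: give up
        q = query.group(1).strip()
        if retrieve is None:
            # Self-search: the same model plays the search engine.
            info = generate(f"Provide information for the query: {q}\n")
        else:
            info = retrieve(q)
        context += f"\n<information>{info}</information>\n"
    return None


def toy_generate(prompt: str) -> str:
    """Stand-in for an LLM call; returns canned text so the sketch runs."""
    if prompt.startswith("Provide information"):
        return "Paris is the capital of France."
    if "<information>" in prompt:
        return "<answer>Paris</answer>"
    return "<search>capital of France</search>"


def toy_retrieve(query: str) -> str:
    """Stand-in for an external search engine."""
    return f"[external result for: {query}]"


if __name__ == "__main__":
    question = "What is the capital of France?"
    print(rollout(question, toy_generate))                # self-search
    print(rollout(question, toy_generate, toy_retrieve))  # external engine
```

Because the information source is just a function argument in this sketch, the same loop covers both settings: leaving `retrieve` unset corresponds to the offline, self-search case, and passing a real search client corresponds to applying the model to an online engine.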

Main results of SSRL.

✨Getting Started

You can reproduce the results of SSRL with the following commands:

git clone https://github.com/TsinghuaC3I/SSRL
cd verl

pip install -r requirements.txt

huggingface-cli download --repo-type dataset --resume-download TsinghuaC3I/SSRL --local-dir SSRL_dataset # download the dataset

bash examples/ssrl/example.sh

To evaluate the trained model with Sim2Real generalization, you can run:

bash examples/ssrl/sim2real.sh

To try entropy-guided Sim2Real generalization, turn on the trainer.use_entropy flag in the sim2real.sh script.

All experiments were conducted on 8 x NVIDIA A800 80GB GPUs.

📨Contact

🎈Citation

If you find SSRL helpful, please cite us.

@misc{fan2025ssrlselfsearchreinforcementlearning,
      title={SSRL: Self-Search Reinforcement Learning}, 
      author={Yuchen Fan and Kaiyan Zhang and Heng Zhou and Yuxin Zuo and Yanxu Chen and Yu Fu and Xinwei Long and Xuekai Zhu and Che Jiang and Yuchen Zhang and Li Kang and Gang Chen and Cheng Huang and Zhizhou He and Bingning Wang and Lei Bai and Ning Ding and Bowen Zhou},
      year={2025},
      eprint={2508.10874},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.10874}, 
}
