SSRL: Self-Search Reinforcement Learning

🎉 News • 📖 Introduction • 📊 Main Results

✨ Getting Started • 📨 Contact • 🎈 Citation • 🌟 Star History

🎉News

[2025-08-18] We are honored to be featured as 🤗 HuggingFace Daily Paper #2.
[2025-08-15] We present SSRL (Self-Search Reinforcement Learning), an investigation of agentic search RL without reliance on external search engines.

📖Introduction

We investigate Reinforcement Learning (RL) on agentic search tasks without explicitly gathering information from external search engines (e.g., other LLMs or web search engines). Previous work leverages external search engines during training, which is expensive and time-consuming and can also introduce training instability. We introduce SSRL, a novel approach that enables RL on agentic search tasks without explicit search engines while achieving performance comparable to previous methods. Although SSRL is trained entirely offline, it can be seamlessly applied with online search engines to further boost its performance.

📊Main Results

We first show that Self-Search has a high performance upper bound when using a structured prompt, with the LLM serving simultaneously as both the search engine and the policy.
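The structured self-search format can be sketched as a loop in which a single model alternately acts as the policy (issuing queries and the final answer) and as the search engine (writing the "retrieved" information itself). This is a minimal illustration under stated assumptions: the `<search>`, `<information>`, and `<answer>` tag names, the `llm` stub, and the `self_search_rollout` helper are hypothetical and are not taken from the SSRL code base.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a self-search rollout: ONE model plays both the
# policy and the "search engine". Tag names and the stub are assumptions,
# not SSRL's actual prompt format or implementation.
set -euo pipefail

# Stub standing in for a single LLM. A real setup would query the model;
# here the output is chosen deterministically from what the context holds.
llm() {
  local context="$1"
  if [[ "$context" != *"<search>"* ]]; then
    # Policy role: issue a search query.
    echo "<search>capital of France</search>"
  elif [[ "$context" != *"<information>"* ]]; then
    # Search-engine role: the SAME model writes the information block
    # instead of calling an external retriever.
    echo "<information>Paris is the capital of France.</information>"
  else
    # Policy role again: answer using the self-generated information.
    echo "<answer>Paris</answer>"
  fi
}

# Appends each model turn to the context until an <answer> tag appears
# or the turn budget (default 6) runs out. Prints the extracted answer.
self_search_rollout() {
  local context="$1"
  local max_turns="${2:-6}"
  local answer_re='<answer>(.*)</answer>'
  local turn out
  for ((turn = 0; turn < max_turns; turn++)); do
    out="$(llm "$context")"
    context+=$'\n'"$out"
    if [[ "$out" =~ $answer_re ]]; then
      printf '%s\n' "${BASH_REMATCH[1]}"
      return 0
    fi
  done
  return 1   # no answer within the turn budget
}
```

Running `self_search_rollout "Question: What is the capital of France?"` prints `Paris` after three turns. Replacing the stub with real model calls keeps the same control flow; the point of the design is that no external search engine is ever called during the rollout.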

We then train LLMs with SSRL to teach them to leverage their self-search capabilities effectively. Our results demonstrate that SSRL consistently improves performance across a variety of tasks and models.

Furthermore, although SSRL is trained offline, it can be seamlessly applied to online search engines, further boosting its performance.

✨Getting Started

You can reproduce the results of SSRL with the following commands:

git clone https://github.com/TsinghuaC3I/SSRL
cd SSRL

pip install -r requirements.txt

huggingface-cli download --repo-type dataset --resume-download TsinghuaC3I/SSRL --local-dir SSRL_dataset # download the dataset

bash examples/ssrl/example.sh
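Before launching training, a small preflight check can catch a missing dataset download or a machine that exposes fewer GPUs than expected. This is a hypothetical sketch: `dataset_ready` and `gpu_count` are illustrative helpers, and the `SSRL_dataset` default simply mirrors the `--local-dir` used in the download command above.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers to run before examples/ssrl/example.sh.
# Not part of the SSRL repository; names and defaults are assumptions.
set -euo pipefail

# Succeeds only if the directory exists and contains at least one entry.
# Default path mirrors the --local-dir of the huggingface-cli download step.
dataset_ready() {
  local dir="${1:-SSRL_dataset}"
  [[ -d "$dir" ]] && [[ -n "$(ls -A "$dir")" ]]
}

# Prints the number of GPUs listed by `nvidia-smi -L`,
# or 0 if nvidia-smi is missing or fails (e.g. no driver loaded).
gpu_count() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L 2>/dev/null | grep -c '^GPU' || true
  else
    echo 0
  fi
}
```

For example, `dataset_ready SSRL_dataset || echo "run the huggingface-cli download step first"` and `echo "GPUs visible: $(gpu_count)"` can be placed at the top of a launch wrapper.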

To evaluate the trained model with Sim2Real generalization, you can run:

bash examples/ssrl/sim2real.sh

To try entropy-guided Sim2Real generalization, enable the trainer.use_entropy flag in the examples/ssrl/sim2real.sh script.
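The flag can also be switched on from the command line. This sketch assumes that `examples/ssrl/sim2real.sh` passes the flag as a Hydra-style override of the form `trainer.use_entropy=<value>`; if the script sets it some other way, edit it by hand. `enable_entropy_flag` is a hypothetical helper that leaves the file untouched when that pattern is absent and otherwise keeps a `.bak` copy of the original.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flips trainer.use_entropy to True in a launch script.
# ASSUMPTION: the script contains a Hydra-style override such as
#   trainer.use_entropy=False
set -euo pipefail

enable_entropy_flag() {
  local script="${1:-examples/ssrl/sim2real.sh}"
  if grep -q 'trainer\.use_entropy=' "$script"; then
    # -i.bak works with both GNU and BSD sed and keeps a backup copy.
    sed -i.bak -E 's/trainer\.use_entropy=[A-Za-z]+/trainer.use_entropy=True/' "$script"
  else
    echo "trainer.use_entropy not found in $script; edit it by hand" >&2
    return 1
  fi
}
```

Run `enable_entropy_flag` from the repository root, then `bash examples/ssrl/sim2real.sh`.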

All experiments were conducted on 8 x NVIDIA A800 80GB GPUs.

📨Contact

Kaiyan Zhang: [email protected]
Ning Ding: [email protected]

🎈Citation

If you find SSRL helpful, please cite us.

@misc{fan2025ssrlselfsearchreinforcementlearning,
      title={SSRL: Self-Search Reinforcement Learning}, 
      author={Yuchen Fan and Kaiyan Zhang and Heng Zhou and Yuxin Zuo and Yanxu Chen and Yu Fu and Xinwei Long and Xuekai Zhu and Che Jiang and Yuchen Zhang and Li Kang and Gang Chen and Cheng Huang and Zhizhou He and Bingning Wang and Lei Bai and Ning Ding and Bowen Zhou},
      year={2025},
      eprint={2508.10874},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.10874}, 
}
