# 🌐 Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents
- **Explore to Evolve** aims to generate diverse, high-quality training data for web agent foundation models, enhancing their capabilities in multi-tool usage, information seeking, and information aggregation.
- **WebAggregator**, the model fine-tuned on WebAggregatorQA, demonstrates strong performance on GAIA-text and the WebAggregatorQA test set.
- 🤖 Fully Automated and Verifiable QA Construction
- 😄 Open Source: Complete codebase including QA construction engine, queries, trajectories, and models.
- 👍 Highly Customizable: Collect data tailored to your needs with minimal human effort, and easily customize your own agent!
## Getting Started

Follow these steps to get started:

```bash
git clone https://github.com/Tencent/WebAggregator
```
This project builds upon smolagents' "open deep research" example and its dependencies 👉 smolagents open_deep_research. Thanks for their great work, and please cite them!
Install this project's requirements:

```bash
pip install -r requirements.txt
```

Please note: the implementation must use the bundled `./smolagents`, which provides the trajectory-collection functionality we added. Alternatively, you can directly replace `smolagents/agents.py` in your original library.
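As a quick sanity check (a minimal sketch, not part of the repository), you can confirm that Python resolves `smolagents` to the bundled copy rather than a site-packages install:

```python
# Sanity check: confirm the bundled ./smolagents is the one being imported.
# (Illustrative snippet; not part of the repository.)
import os
import smolagents

print(os.path.dirname(smolagents.__file__))
# Expected: a path inside this repo (e.g., .../WebAggregator/smolagents),
# not your environment's site-packages.
```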
Set the configuration in the following files:
- `./config.py`: contains settings for your agent's foundation LLM, the LLMs for specific tools, and dataset paths.
- `./model_list.py`: implements the method for calling your foundation models (e.g., via vLLM, LiteLLM, or Azure), using the models configured in `./config.py`. We provide an example implementation (see the sketch after this list); for more details, please refer to the smolagents repository.
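For illustration, here is a minimal sketch of what a `model_list.py`-style entry point might look like when the foundation model is served with vLLM behind an OpenAI-compatible endpoint. The function name `get_model`, the model ID, and the endpoint URL are assumptions rather than the repo's actual values; `OpenAIServerModel` is smolagents' client for OpenAI-compatible servers.

```python
# Hypothetical model_list.py-style sketch (names and endpoint are assumptions).
from smolagents import OpenAIServerModel

def get_model(model_id: str = "Qwen/Qwen2.5-72B-Instruct"):
    # vLLM serves an OpenAI-compatible chat API at /v1 by default,
    # so smolagents' OpenAIServerModel can talk to it directly.
    return OpenAIServerModel(
        model_id=model_id,
        api_base="http://localhost:8000/v1",  # your vLLM server
        api_key="EMPTY",                      # vLLM accepts any key by default
    )
```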
The functions of the other files:

- `./web_tools.py`: tools for the agent; modify them to suit your needs.
- `./run_agent.py`: the implemented agent (a minimal sketch of such an entry point follows this list).
- `./run`: scripts for running the agent.
- `./data`: input data for QA construction (URLs), evaluation (benchmarks), and trajectory sampling (QAs).
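To give a feel for the structure, here is a minimal smolagents-style agent entry point. This is not the repo's `run_agent.py` (which wires in the tools from `./web_tools.py`); the `get_model` import refers to the hypothetical helper sketched above, and the example question is arbitrary.

```python
# Minimal smolagents-style agent sketch (illustrative; the repository's
# run_agent.py is more elaborate and uses the tools from ./web_tools.py).
from smolagents import CodeAgent, DuckDuckGoSearchTool

from model_list import get_model  # hypothetical helper from the sketch above

agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],  # swap in the tools from web_tools.py
    model=get_model(),
)
print(agent.run("Who won the 2012 Nobel Prize in Physics?"))
```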
Note: Before running any scripts, ensure all paths, model checkpoints, and other necessary parameters are properly set in the source files.
## Evaluation

To evaluate your agent, serve your tuned checkpoint and update the corresponding settings in `config.py`. Make sure the correct `model_id` is set in the evaluation script `test.sh`, then run:

```bash
bash run/test.sh
```

This command evaluates the specified model on the specified benchmark. After evaluation, it uses an LLM-as-judge to assess the answers and prints the accuracy.
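To make the LLM-as-judge step concrete, here is a minimal sketch of how such scoring typically works. This is not the repository's actual judge: `judge_model` stands for any chat-completion callable you configure, and the prompt wording is an assumption.

```python
# Minimal LLM-as-judge sketch (assumption: NOT the repository's actual judge).
# `judge_model` is any callable that takes a prompt string and returns a string.

def judge(question: str, prediction: str, reference: str, judge_model) -> bool:
    prompt = (
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Model answer: {prediction}\n"
        "Does the model answer convey the same final answer as the reference? "
        "Reply with exactly 'yes' or 'no'."
    )
    return judge_model(prompt).strip().lower().startswith("yes")

def accuracy(examples, judge_model) -> float:
    # examples: iterable of (question, prediction, reference) triples
    examples = list(examples)
    if not examples:
        return 0.0
    hits = sum(judge(q, p, r, judge_model) for q, p, r in examples)
    return hits / len(examples)
```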
## Building Web Agent Data

Start building web agent data automatically:

- Download our collected URLs 👉 URLs, or gather URLs related to your domains of interest.
- Then, run the following command to collect the data:

```bash
bash run/QA_building.sh
```

Training trajectories for fine-tuning your agent foundation models are available at 👉 WebAggregatorQA. Sample data can be found in `./data/train-samples` for initial testing purposes.
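For a first look at the data before downloading anything, here is a hypothetical sketch of a QA record stored as JSONL. The field names are assumptions, so check the released files in `./data/train-samples` for the actual schema.

```python
# Hypothetical QA record shape (field names are assumptions; see
# ./data/train-samples for the actual schema).
import json

record = {
    "question": "An aggregation question grounded in several web pages ...",
    "answer": "a short, verifiable answer",
    "source_urls": [
        "https://example.com/page-a",
        "https://example.com/page-b",
    ],
}

# Append one record as a JSONL line.
with open("sample.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```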
To sample trajectories on these QAs, run:

```bash
bash run/traj_sampling.sh
```

## Acknowledgments

- Deep Research Agent framework: Cognitive Kernel-Pro
- Agent self-evolving research, including WebEvolver, WebCoT, WebVoyager, and OpenWebVoyager.
## Citation

If you find this work helpful, please cite:

```bibtex
@misc{wang2025exploreevolvescalingevolved,
      title={Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents},
      author={Rui Wang and Ce Zhang and Jun-Yu Ma and Jianshu Zhang and Hongru Wang and Yi Chen and Boyang Xue and Tianqing Fang and Zhisong Zhang and Hongming Zhang and Haitao Mi and Dong Yu and Kam-Fai Wong},
      year={2025},
      eprint={2510.14438},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.14438},
}

@misc{fang2025cognitivekernelpro,
      title={Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training},
      author={Tianqing Fang and Zhisong Zhang and Xiaoyang Wang and Rui Wang and Can Qin and Yuxuan Wan and Jun-Yu Ma and Ce Zhang and Jiaqi Chen and Xiyun Li and Hongming Zhang and Haitao Mi and Dong Yu},
      year={2025},
      eprint={2508.00414},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2508.00414},
}
```