Informatica is a comprehensive collection of systematic research projects focused on deep research systems. Our mission is to provide open-source, scalable frameworks, datasets, data synthesis methods, models, and demonstrations for the deep research community.
We are committed to advancing deep research along several directions, including:
- Scalable Data Synthesis: Advanced frameworks for generating high-quality, complexity-controllable research datasets
- Deep Research Models: State-of-the-art models trained on structured research tasks
- Open Datasets: Publicly available datasets designed for training and evaluating deep research capabilities
- Research Tools: Complete toolchains for constructing, training, and deploying deep research systems
- Interactive Demonstrations: User-friendly demos showcasing the capabilities of our research systems
Our team continuously explores various aspects of deep research problems, from fundamental question decomposition and reasoning to practical applications in knowledge discovery and information synthesis. Through Informatica, we aim to democratize access to deep research technologies and foster innovation in the broader research community.
[2025/09/19]🎉 Our paper InForage has been accepted by NeurIPS 2025 as a Spotlight paper! Code will be released here soon.
[2025/09/17]🔥 We have released a large-scale dataset for deep research tasks, named InfoSeek.
[2025/05/14]🔥 We have released our initial research on agentic search, named InForage.
- Technical Report: InForage - Agentic Search Framework
- NeurIPS 2025 Spotlight Paper Acceptance
- Open Dataset: InfoSeek
- Data Construction Pipeline
- Scalable Synthesis Framework
- Quality Control Mechanisms
- SFT Training Code
- RL Training Code
- InfoSeeker Model Release
- Model Evaluation Framework
- Knowledge Discovery Tools
- Information Synthesis Systems
- Research Assistant Applications
- Interactive Demo Platform
- API Integration
- User Interface Development
We are building a demo page that showcases different agentic search methods and lets users explore their capabilities hands-on. Each demo will be integrated into a standardized retrieval and web-browsing interface with comparable settings, enabling comprehensive and fair comparisons across approaches. This systematic evaluation will help identify the strengths and limitations of each method and advance the state of the art in agentic search.
InfoSeek:
@misc{xia2025opendatasynthesisdeep,
  title={Open Data Synthesis For Deep Research},
  author={Ziyi Xia and Kun Luo and Hongjin Qian and Zheng Liu},
  year={2025},
  url={https://arxiv.org/abs/2509.00375},
}
InForage:
@misc{qian2025scentknowledgeoptimizingsearchenhanced,
  title={Scent of Knowledge: Optimizing Search-Enhanced Reasoning with Information Foraging},
  author={Hongjin Qian and Zheng Liu},
  year={2025},
  url={https://arxiv.org/abs/2505.09316},
}