🔍📚 Informatica: Open and Scalable Foundations for Deep Research System

Overview | News | Roadmap | Demo | Misc

🔆 Overview

Informatica is a comprehensive collection of systematic research projects focused on deep research systems. Our mission is to provide open-source, scalable frameworks, datasets, data synthesis methods, models, and demonstrations for the deep research community.

We are committed to advancing the field of deep research through multi-dimensional investigations, including:

Scalable Data Synthesis: Advanced frameworks for generating high-quality, complexity-controllable research datasets
Deep Research Models: State-of-the-art models trained on structured research tasks
Open Datasets: Publicly available datasets designed for training and evaluating deep research capabilities
Research Tools: Complete toolchains for constructing, training, and deploying deep research systems
Interactive Demonstrations: User-friendly demos showcasing the capabilities of our research systems

Our team continuously explores various aspects of deep research problems, from fundamental question decomposition and reasoning to practical applications in knowledge discovery and information synthesis. Through Informatica, we aim to democratize access to deep research technologies and foster innovation in the broader research community.

📰 News

[2025/09/19]🎉 Our paper InForage has been accepted to NeurIPS 2025 as a Spotlight paper! The code will be released here soon.

[2025/09/17]🔥 We have released a large-scale dataset for deep research tasks, named InfoSeek.

[2025/05/14]🔥 We have released our initial research on agentic search, named InForage.

🗺️ Roadmap

Initial Research

Technical Report: InForage - Agentic Search Framework
NeurIPS 2025 Spotlight Paper Acceptance

Open and Scalable Data Synthesis

Open Dataset: InfoSeek
Data Construction Pipeline
Scalable Synthesis Framework
Quality Control Mechanisms

Model Development

SFT Training Code
RL Training Code
InfoSeeker Model Release
Model Evaluation Framework

Applications

Knowledge Discovery Tools
Information Synthesis Systems
Research Assistant Applications

Demo and Deployment

Interactive Demo Platform
API Integration
User Interface Development

🎯 Demo

We are building a demo page that showcases different agentic search methods and lets users explore them hands-on. Each demo will run behind the same standardized retrieval and web browser interface with comparable settings, so that approaches can be compared fairly. This side-by-side evaluation should help identify the strengths and limitations of each method and advance the state of the art in agentic search.

🌟 Misc

📄 Citation

InfoSeek:

@misc{xia2025opendatasynthesisdeep,
      title={Open Data Synthesis For Deep Research}, 
      author={Ziyi Xia and Kun Luo and Hongjin Qian and Zheng Liu},
      year={2025},
      url={https://arxiv.org/abs/2509.00375}, 
}

InForage:

@misc{qian2025scentknowledgeoptimizingsearchenhanced,
      title={Scent of Knowledge: Optimizing Search-Enhanced Reasoning with Information Foraging}, 
      author={Hongjin Qian and Zheng Liu},
      year={2025},
      url={https://arxiv.org/abs/2505.09316}, 
}
