VectorSpaceLab/Infomatica

🔍📚 Informatica: Open and Scalable Foundations for Deep Research System

🔆 Overview

Informatica is a comprehensive collection of systematic research projects focused on deep research systems. Our mission is to provide open-source, scalable frameworks, datasets, data synthesis methods, models, and demonstrations for the deep research community.

We are committed to advancing the field of deep research through multi-dimensional investigations, including:

  • Scalable Data Synthesis: Advanced frameworks for generating high-quality, complexity-controllable research datasets
  • Deep Research Models: State-of-the-art models trained on structured research tasks
  • Open Datasets: Publicly available datasets designed for training and evaluating deep research capabilities
  • Research Tools: Complete toolchains for constructing, training, and deploying deep research systems
  • Interactive Demonstrations: User-friendly demos showcasing the capabilities of our research systems

Our team continuously explores various aspects of deep research problems, from fundamental question decomposition and reasoning to practical applications in knowledge discovery and information synthesis. Through Informatica, we aim to democratize access to deep research technologies and foster innovation in the broader research community.

📰 News

[2025/09/19]🎉 Our paper InForage has been accepted to NeurIPS 2025 as a Spotlight paper! The code will be released here soon.

[2025/09/17]🔥 We have released a large-scale dataset for deep research tasks, named InfoSeek.

[2025/05/14]🔥 We have released our initial research on agentic search, named InForage.

🗺️ Roadmap

Initial Research

  • Technical Report: InForage - Agentic Search Framework
  • NeurIPS 2025 Spotlight Paper Acceptance

Open and Scalable Data Synthesis

  • Open Dataset: InfoSeek
  • Data Construction Pipeline
  • Scalable Synthesis Framework
  • Quality Control Mechanisms

Model Development

  • SFT Training Code
  • RL Training Code
  • InfoSeeker Model Release
  • Model Evaluation Framework

Applications

  • Knowledge Discovery Tools
  • Information Synthesis Systems
  • Research Assistant Applications

Demo and Deployment

  • Interactive Demo Platform
  • API Integration
  • User Interface Development

🎯 Demo

We are building a demo page that showcases different agentic search methods and lets users explore their capabilities hands-on. Each demo will be integrated into a standardized retrieval and web browser interface with comparable settings, so that the approaches can be compared comprehensively and fairly. This systematic evaluation will help identify the strengths and limitations of each method and advance the state of the art in agentic search.

📄 Citation

InfoSeek:

@misc{xia2025opendatasynthesisdeep,
      title={Open Data Synthesis For Deep Research}, 
      author={Ziyi Xia and Kun Luo and Hongjin Qian and Zheng Liu},
      year={2025},
      url={https://arxiv.org/abs/2509.00375}, 
}

InForage:

@misc{qian2025scentknowledgeoptimizingsearchenhanced,
      title={Scent of Knowledge: Optimizing Search-Enhanced Reasoning with Information Foraging}, 
      author={Hongjin Qian and Zheng Liu},
      year={2025},
      url={https://arxiv.org/abs/2505.09316}, 
}

About

Data Synthesis for Deep Research Based on Semi-Structured Data
