CocoIndex is an ETL framework that transforms data for AI with real-time incremental processing: it keeps the index up to date with low latency whenever the source changes. Custom logic snaps together like LEGO bricks, so you can plug in the modules that best suit your project.
In this example, we walk you through building an embedding index from local files, using Google Document AI as the parser.
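To make the idea of incremental processing concrete, here is a minimal sketch of one common approach: fingerprint each source file and reprocess only the files whose content changed. This is an illustration under stated assumptions, not CocoIndex's actual implementation; the function names `fingerprint` and `plan_incremental_update` are hypothetical.

```python
import hashlib
from pathlib import Path


def fingerprint(path):
    """Return a SHA-256 hex digest of the file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def plan_incremental_update(source_dir, known):
    """Compare files under source_dir against previously seen fingerprints.

    `known` maps relative file names to the digest recorded on the last run.
    Returns (to_process, to_delete, new_state):
      - to_process: new or modified files that need (re)embedding
      - to_delete: files that disappeared and whose index rows can be dropped
      - new_state: the fingerprint map to persist for the next run
    """
    root = Path(source_dir)
    current = {
        str(p.relative_to(root)): fingerprint(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    to_process = [name for name, digest in current.items() if known.get(name) != digest]
    to_delete = [name for name in known if name not in current]
    return to_process, to_delete, current
```

On the first run, pass an empty dict as `known` so every file is processed; on later runs, pass the `new_state` returned by the previous call so that unchanged files are skipped.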
🥥 🌴 We are constantly improving, with more blogs and examples coming soon. Stay tuned 👀 and drop a star at CocoIndex on GitHub for the latest updates!
- Install Postgres if you don't have one.
- Configure the Project ID and Processor ID for the Document AI API:
  - Reference: the official Google Document AI API.
  - Sign in to Google Cloud Console, create or open a project, and enable the Document AI API.
  - Create a processor in Document AI.
  - Update '.env' with GOOGLE_CLOUD_PROJECT_ID and GOOGLE_CLOUD_PROCESSOR_ID.
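As a rough illustration of how these two values might be consumed, the sketch below parses a '.env' file and builds the fully qualified processor resource name that the Document AI API expects (`projects/.../locations/.../processors/...`). The helper names `load_env_file` and `processor_resource_name` are hypothetical, and the default location `us` is an assumption; use the region where your processor was actually created.

```python
import os

REQUIRED_KEYS = ("GOOGLE_CLOUD_PROJECT_ID", "GOOGLE_CLOUD_PROCESSOR_ID")


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from a .env file into a dict.

    Blank lines, comment lines, and lines without '=' are skipped.
    Returns an empty dict if the file does not exist.
    """
    values = {}
    if not os.path.exists(path):
        return values
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def processor_resource_name(env, location="us"):
    """Build the Document AI processor resource name from env values.

    `location` defaults to "us" here as an assumption, not a project default.
    """
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise ValueError("Missing in .env: " + ", ".join(missing))
    return (
        f"projects/{env['GOOGLE_CLOUD_PROJECT_ID']}"
        f"/locations/{location}"
        f"/processors/{env['GOOGLE_CLOUD_PROCESSOR_ID']}"
    )
```

Calling `processor_resource_name(load_env_file())` before running the pipeline surfaces a missing or empty variable early, rather than as an API error at parse time.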
Install dependencies:
pip install -e .
Setup:
cocoindex setup main.py
Update index:
cocoindex update main.py
Run:
cocoindex server -ci main
CocoInsight is in Early Access now (Free) 😊 You found us! A quick 3-minute video tutorial about CocoInsight: Watch on YouTube.
Run CocoInsight to understand your RAG data pipeline:
python main.py cocoindex server -c https://cocoindex.io
Then open the CocoInsight UI at https://cocoindex.io/cocoinsight.
