🥥 CocoIndex ETL with Document AI

CocoIndex is an ETL framework for transforming data for AI, with real-time incremental processing: it keeps the index up to date with low latency when the source changes. It supports custom logic composed like LEGO blocks, making it easy to plug in the modules that best suit your project.

In this example, we walk through how to build an embedding index from local files, using Google Document AI as the parser.
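To make the parsing step concrete, here is a minimal sketch of sending one local PDF to a Document AI processor with the `google-cloud-documentai` client and getting its text back. It is not taken from this example's `main.py`: the function names `docai_endpoint` and `parse_pdf` are hypothetical, and the project ID, location, and processor ID are values you would supply from your own Google Cloud setup.

```python
from pathlib import Path


def docai_endpoint(location: str) -> str:
    """Return the regional Document AI endpoint for a location such as 'us' or 'eu'."""
    return f"{location}-documentai.googleapis.com"


def parse_pdf(path: str, project_id: str, location: str, processor_id: str) -> str:
    """Send one local PDF to a Document AI processor and return the extracted text.

    Hypothetical helper for illustration; requires google-cloud-documentai
    and application default credentials.
    """
    # Imported lazily so this module loads even without the client library.
    from google.api_core.client_options import ClientOptions
    from google.cloud import documentai

    client = documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(api_endpoint=docai_endpoint(location))
    )
    # Full resource name of the processor that will parse the file.
    name = client.processor_path(project_id, location, processor_id)
    raw_document = documentai.RawDocument(
        content=Path(path).read_bytes(), mime_type="application/pdf"
    )
    request = documentai.ProcessRequest(name=name, raw_document=raw_document)
    result = client.process_document(request=request)
    return result.document.text
```

The returned text is what a pipeline would then chunk and embed. Keeping the parser behind a single function like this is one way to make it swappable for another parser later.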

🥥 🌴 We are constantly improving, with more blogs and examples coming soon. Stay tuned 👀 and drop a star at CocoIndex on GitHub for the latest updates!

Use Document AI to parse PDF files in CocoIndex

Prerequisite

Run

Install dependencies:

pip install -e .

Setup:

python main.py cocoindex setup

Update index:

python main.py cocoindex update

Run:

python main.py
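On a fresh checkout, the install, setup, and update commands above can be chained in a small script so that a failure in one step stops the rest. This is only a convenience sketch: the names `STEPS` and `run_steps` are hypothetical and not part of the example, and the commands are copied from the steps above.

```python
import subprocess
import sys

# Commands copied from the steps above, run with the current interpreter.
STEPS = [
    [sys.executable, "-m", "pip", "install", "-e", "."],
    [sys.executable, "main.py", "cocoindex", "setup"],
    [sys.executable, "main.py", "cocoindex", "update"],
]


def run_steps(steps=STEPS, runner=subprocess.run):
    """Run each command in order; check=True raises on the first failure."""
    for cmd in steps:
        runner(cmd, check=True)


if __name__ == "__main__":
    run_steps()
```

Once it finishes, `python main.py` runs the example as described above.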

CocoInsight

CocoInsight is in Early Access now (Free) 😊 You found us! A quick 3-minute video tutorial about CocoInsight: Watch on YouTube.

Run CocoInsight to understand your RAG data pipeline:

python main.py cocoindex server -c https://cocoindex.io

Then open the CocoInsight UI at https://cocoindex.io/cocoinsight.