This is the first iteration of the Archon project: no LangGraph, just a single AI agent, to keep things simple and introductory.
An intelligent documentation crawler and RAG (Retrieval-Augmented Generation) agent, built with Pydantic AI and Supabase, that is capable of building other Pydantic AI agents. The agent crawls the Pydantic AI documentation, stores the content in a vector database, and produces Pydantic AI agent code by retrieving and analyzing relevant documentation chunks.
- Pydantic AI documentation crawling and chunking
- Vector database storage with Supabase
- Semantic search using OpenAI embeddings
- RAG-based question answering
- Support for code block preservation
- Streamlit UI for interactive querying
- Python 3.11+
- Supabase account and database
- OpenAI API key
- Streamlit (for web interface)
- Clone the repository:
```bash
git clone https://github.com/coleam00/archon.git
cd archon/iterations/v1-single-agent
```
- Install dependencies (recommended to use a Python virtual environment):
```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```
- Set up environment variables:
- Rename `.env.example` to `.env`
- Edit `.env` with your API keys and preferences:
```env
OPENAI_API_KEY=your_openai_api_key
SUPABASE_URL=your_supabase_url
SUPABASE_SERVICE_KEY=your_supabase_service_key
LLM_MODEL=gpt-4o-mini  # or your preferred OpenAI model
```
- Execute the SQL commands in `site_pages.sql` to:
- Create the necessary tables
- Enable vector similarity search
- Set up Row Level Security policies
In Supabase, you can do this from the "SQL Editor" tab: paste the SQL into the editor and click "Run".
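With `.env` in place and the database created, the scripts can read their configuration from the environment at startup. A minimal sketch of that loading step, assuming `python-dotenv` is used (a common choice; check `requirements.txt`), with the variable names from the example above:

```python
import os

from dotenv import load_dotenv  # python-dotenv

# Load OPENAI_API_KEY, SUPABASE_URL, etc. from the .env file
load_dotenv()

openai_api_key = os.getenv("OPENAI_API_KEY")
supabase_url = os.getenv("SUPABASE_URL")
supabase_service_key = os.getenv("SUPABASE_SERVICE_KEY")
llm_model = os.getenv("LLM_MODEL", "gpt-4o-mini")
```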
To crawl and store documentation in the vector database:
```bash
python crawl_pydantic_ai_docs.py
```
This will:
- Fetch URLs from the documentation sitemap
- Crawl each page and split into chunks
- Generate embeddings and store in Supabase
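In simplified form, the pipeline looks like the sketch below. The sitemap URL, embedding model (`text-embedding-3-small`, which yields 1536-dimensional vectors, matching the schema shown later), and function names are illustrative assumptions; see `crawl_pydantic_ai_docs.py` for the actual implementation:

```python
import os
from xml.etree import ElementTree

import requests
from openai import OpenAI
from supabase import create_client

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_SERVICE_KEY"])

def get_sitemap_urls(sitemap_url: str = "https://ai.pydantic.dev/sitemap.xml") -> list[str]:
    """Fetch page URLs from the documentation sitemap."""
    root = ElementTree.fromstring(requests.get(sitemap_url).content)
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    return [loc.text for loc in root.findall(".//sm:loc", ns)]

def embed(text: str) -> list[float]:
    """Generate a 1536-dimensional embedding, matching VECTOR(1536) in the schema."""
    response = openai_client.embeddings.create(model="text-embedding-3-small", input=text)
    return response.data[0].embedding

def store_chunk(url: str, chunk_number: int, content: str) -> None:
    """Insert one chunk into Supabase (title/summary/metadata omitted for brevity)."""
    supabase.table("site_pages").insert({
        "url": url,
        "chunk_number": chunk_number,
        "content": content,
        "embedding": embed(content),
    }).execute()
```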
For an interactive web interface to query the documentation:
```bash
streamlit run streamlit_ui.py
```
The interface will be available at http://localhost:8501
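If you prefer to skip the UI, a Pydantic AI agent can also be invoked directly from Python. This is a hedged sketch: it assumes `pydantic_ai_expert.py` exposes an agent object (here called `pydantic_ai_expert`); the real agent may require dependencies such as Supabase and OpenAI clients passed via `deps=`, so check that module for the actual names:

```python
import asyncio

# Name assumed for illustration; see pydantic_ai_expert.py for the real export
from pydantic_ai_expert import pydantic_ai_expert

async def main() -> None:
    result = await pydantic_ai_expert.run("How do I define tools for a Pydantic AI agent?")
    # Early pydantic-ai releases expose the answer as .data; newer ones use .output
    print(result.data)

asyncio.run(main())
```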
The Supabase database uses the following schema:
```sql
CREATE TABLE site_pages (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    url TEXT,
    chunk_number INTEGER,
    title TEXT,
    summary TEXT,
    content TEXT,
    metadata JSONB,
    embedding VECTOR(1536)
);
```
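Semantic search against this table relies on pgvector. As an illustration (not necessarily the exact query the agent issues), the chunks nearest to a query embedding can be retrieved with the cosine-distance operator `<=>`:

```sql
-- :query_embedding is a 1536-dimensional vector computed from the user's question
SELECT url, chunk_number, content,
       1 - (embedding <=> :query_embedding) AS similarity
FROM site_pages
ORDER BY embedding <=> :query_embedding
LIMIT 5;
```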
You can configure chunking parameters in `crawl_pydantic_ai_docs.py`:
```python
chunk_size = 5000  # Characters per chunk
```
The chunker intelligently preserves:
- Code blocks
- Paragraph boundaries
- Sentence boundaries
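A minimal sketch of this kind of boundary-aware chunking, simplified relative to the actual implementation in `crawl_pydantic_ai_docs.py`:

```python
def chunk_text(text: str, chunk_size: int = 5000) -> list[str]:
    """Split text into ~chunk_size-character chunks, preferring natural boundaries."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        window = text[start:end]
        if end < len(text):
            # Prefer to break at a code-block fence, then a paragraph, then a sentence
            for boundary in ("```", "\n\n", ". "):
                pos = window.rfind(boundary)
                if pos > chunk_size * 0.3:  # avoid degenerate, tiny chunks
                    end = start + pos + len(boundary)
                    break
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        start = end
    return chunks
```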
- `crawl_pydantic_ai_docs.py`: Documentation crawler and processor
- `pydantic_ai_expert.py`: RAG agent implementation
- `streamlit_ui.py`: Web interface
- `site_pages.sql`: Database setup commands
- `requirements.txt`: Project dependencies
Contributions are welcome! Please feel free to submit a Pull Request.