diff --git a/CLAUDE.md b/CLAUDE.md
index 69486e3..ac1a05d 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -1,69 +1,83 @@
 # CLAUDE.md
 
-This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+This file provides guidance to Claude Code (claude.ai/code) when working with
+code in this repository.
 
 ## Commands
 
 ### Development
+
 - `bun dev` - Start hot-reloading development server (watches TypeScript files)
 - `bun start` - Start production server
-- `bun test platform/test` - Run TypeScript tests
+- `bun test` - Run TypeScript tests
 - `bun py <service>` - Run Python service directly via entry.py
+  (sets up env)
 
 ### Python Setup
+
 - `poetry install` - Install main Python dependencies
-- `poetry install --with ft` - Install with fine-tuning dependencies (includes large ML models)
-- `poetry install --with dev` - Install with development dependencies
 
 ### Testing Python Services
-- `pytest services/<service>/tests/` - Run tests for specific service
-- `pytest` - Run all Python tests
+
+There are no Python tests in the repo at this time.
 
 ### Code Quality
+
 - `black services/` - Format Python code (line length: 120)
 - `ruff check services/` - Lint Python code with comprehensive rule set
 
 ## Architecture
 
-This is a hybrid TypeScript/Python platform that provides AI and data services for the OpenFn toolchain.
+This is a hybrid TypeScript/Python platform that provides AI and data services
+for the OpenFn platform and toolchain.
 
-### Server Structure (TypeScript - Bun + Elysia)
+### Server Structure (TypeScript + Bun + Elysia)
+
 - **Entry**: `platform/src/index.ts` → `platform/src/server.ts`
 - **Framework**: Elysia web framework running on Bun runtime
 - **Middleware**: Health checks, directory listing, Python service bridging
-- **Services Bridge**: `/services/` endpoints invoke Python modules via child processes
+- **Services Bridge**: `/services/` endpoints invoke Python modules via
+  child processes
 
 ### Python Services Architecture
-- **Entry Point**: `services/entry.py` - All Python services invoked through this module
-- **Service Structure**: Each service is a Python module in `services/<service>/` with a `main()` function
-- **Invocation Pattern**: `services/<service>/<service>.py` with `main(data: dict) -> dict`
-- **Context Isolation**: Each service call runs in its own process context
+
+- **Entry Point**: `services/entry.py` - All Python services invoked through
+  this module
+- **Service Structure**: Each service is a Python module in `services/<service>/`
+  with a `main()` function
+- **Invocation Pattern**: `services/<service>/<service>.py` with
+  `main(data: dict) -> dict`
 
 ### Key Python Services
 
 #### AI & Generation Services
+
 - `job_chat/` - AI chatbot for OpenFn job assistance with RAG
 - `workflow_chat/` - AI assistant for OpenFn workflow creation
-- `adaptor_gen/` - AI-powered OpenFn adaptor generation
-- `code_generator/` - General-purpose code generation service
-- `gen_job/` - Generates OpenFn job code
-- `describe_adaptor/` - Generates descriptions for OpenFn adaptors
-- `signature_generator/` - Generates function signatures for adaptors
-- `inference/` - ML model inference (supports multiple models)
+- `vocab_mapper/` - Maps medical vocabularies (LOINC/SNOMED) using embeddings
 
 #### Embeddings & Search Services
-- `embeddings/` - Vector embeddings with Pinecone (production index: "apollo-mappings")
+
+Pinecone is used in production to store embeddings. OpenAI is used to generate
+vectors.
+
+The OpenFn docsite docs.openfn.org is embedded and used in search_docsite,
+which is utilised by job_chat to dynamically add context to user questions.
+
 - `search_docsite/` - Searches OpenFn documentation using Pinecone vector store
-- `vocab_mapper/` - Maps medical vocabularies (LOINC/SNOMED) using embeddings
 - `embed_docsite/` - Indexes OpenFn documentation for search
+- `embeddings/` - Vector embeddings with Pinecone (production index:
+  "apollo-mappings")
 - `embed_loinc_dataset/` - Preprocesses and embeds LOINC medical codes
 - `embed_snomed_dataset/` - Preprocesses and embeds SNOMED medical terminology
 
 #### Utility Services
+
 - `latest_adaptors/` - Retrieves latest adaptor versions
 - `status/` - System health and status checks
 
 ### Communication Protocols
+
 - **HTTP**: POST requests with JSON payloads
 - **WebSocket**: Same URLs as HTTP, provides live log streaming
   - `start` event: Client sends JSON payload
@@ -71,32 +85,37 @@ This is a hybrid TypeScript/Python platform that provides AI and data services f
   - `complete` event: Server sends final JSON result
 
 ### Python Environment
+
 - **Python Version**: 3.11 (exact requirement)
 - **Dependency Management**: Poetry with in-project `.venv`
 - **Environment Loading**: Uses `.env` file for API keys and configuration
 - **Error Handling**: Custom `ApolloError` class with Sentry integration
 
 ### Development Patterns
-- TypeScript services are minimal - primarily routing and Python process management
-- Python services follow common patterns but have no strict interface beyond `main(data) -> dict`
-- All services expect and return JSON
+
+- TypeScript services are minimal - primarily routing and Python process
+  management
+- Python services follow common patterns but have no strict interface beyond
+  `main(data) -> dict`
+- When running Python services, either call a running server through curl or use
+  entry.py to exercise the environment
+- All services expect and return JSON through main
 - Logger output (not print statements) streams to WebSocket clients
 - API keys loaded from `.env` rather than embedded in payloads
 
 ## Important Notes
 
 ### Environment Requirements
-- **Required API Keys**: OpenAI API key, Pinecone API key (for embedding services)
+
+- **Required API Keys**: OpenAI API key, Pinecone API key (for embedding
+  services)
 - **Python Environment**: Must use Python 3.11 exactly
-- **Vector Store**: Production uses Pinecone index "apollo-mappings" with namespace-based collections
+- **Vector Store**: Production uses Pinecone index "apollo-mappings" with
+  namespace-based collections
 
 ### Service Dependencies
-- Embedding services use `loinc_store.connect_loinc()` and `snomed_store.connect_snomed()` for production data
+
+- Embedding services use `loinc_store.connect_loinc()` and
+  `snomed_store.connect_snomed()` for production data
 - Both connect to Pinecone with "apollo-mappings" index
 - Medical vocab services depend on pre-embedded LOINC/SNOMED datasets
-
-### Development Guidelines
-- Use `logger` for output that should stream to WebSocket clients (not `print`)
-- Service entry points must be named `<service>.py` with `main(data: dict) -> dict`
-- Test services locally with `bun py <service>`
-- Vector store supports both Pinecone and Zilliz but production uses Pinecone only
\ No newline at end of file
diff --git a/README.md b/README.md
index 8f7b841..74c5f03 100644
--- a/README.md
+++ b/README.md
@@ -73,15 +73,22 @@ from a node_modules. None of this affects python.
 
 See [bun auto-install]() for more details.
 
-## Finetuning and poetry dependency groups
+## Python Setup
 
-`poetry install` will only install the main dependencies - the stuff used in the
-docker image.
+This repo uses `poetry` to manage dependencies.
 
-Dependencies for finetuning (which include huge models) are in a special
-optional `ft` group in the `pyproject.toml`.
+We use an "in-project" venv, which means a `.venv` folder will be created when
+you run `poetry install`.
 
-To install these, do `poetry install --with ft`
+All Python is invoked through `entry.py`, which loads the environment properly
+so that relative imports work.
+
+You can invoke entry.py directly (ie, without HTTP or any intermediary js)
+through bun from the root:
+
+```
+bun py echo tmp/payload.json
+```
 
 ## CLI
 
@@ -175,23 +182,6 @@ process.
 
 Note that `print()` statements do not get send out to the web socket, as these
 are intended for local debugging. Only logs from a logger object are diverted.
 
-## Python Setup
-
-This repo uses `poetry` to manage dependencies.
-
-We use an "in-project" venv , which means a `.venv` folder will be created when
-you run `poetry install`.
-
-All python is invoked through `entry.py`, which loads the environment properly
-so that relative imports work.
-
-You can invoke entry.py directly (ie, without HTTP or any intermedia js) through
-bun from the root:
-
-```
-bun py echo tmp/payload.json
-```
-
 ## Docker
 
 To build the docker image:
@@ -206,6 +196,16 @@ To run it on port 3000
 docker run -p 3000:3000 openfn-apollo
 ```
 
+## Finetuning and poetry dependency groups
+
+`poetry install` will only install the main dependencies - the stuff used in the
+docker image.
+
+Dependencies for finetuning (which include huge models) are in a special
+optional `ft` group in the `pyproject.toml`.
+
+To install these, do `poetry install --with ft`
+
 ## Contributing
 
 See the Contribution Guide for more details about how and where to contribute to
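Reviewer note: the `main(data: dict) -> dict` service contract described in the CLAUDE.md changes above can be sketched as follows. This is a minimal illustration of the documented pattern, not a real module from the repo; the body is a placeholder and the "echo" name is borrowed from the `bun py echo tmp/payload.json` example.

```python
# Sketch of a service module at services/<service>/<service>.py, per the
# documented invocation pattern. Services receive the parsed JSON payload
# and must return a JSON-serializable dict, which is sent back to the client.

def main(data: dict) -> dict:
    # Placeholder body: a real service would do its work here (call a model,
    # query a vector store, etc.) before returning its result as a dict.
    return {"echo": data}
```

Such a module can then be exercised either through the HTTP bridge or directly via `bun py <service> <payload>` as described above.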
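Reviewer note: the Communication Protocols section names three WebSocket events (`start`, `log`, `complete`) but does not show their wire format. A hedged sketch of what those JSON frames might look like, assuming a simple `{"event": ..., "data": ...}` envelope (the envelope keys are an assumption; only the event names and the JSON-in-both-directions behaviour come from the document):

```python
import json

# Assumed frame shapes for the three documented events.
start_frame = json.dumps({"event": "start", "data": {"service": "echo"}})
log_frame = json.dumps({"event": "log", "data": "processing payload..."})
complete_frame = json.dumps({"event": "complete", "data": {"result": "done"}})

# Because both directions carry plain JSON, a client can round-trip frames:
event = json.loads(start_frame)["event"]
```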
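Reviewer note: the Python Environment section mentions a custom `ApolloError` class with Sentry integration but gives no shape for it. A hedged sketch of what such a structured service error might look like; the field names (`code`, `type`) and the `to_dict` helper are assumptions for illustration, not the repo's actual API:

```python
class ApolloError(Exception):
    """Sketch of a structured service error (fields are assumptions)."""

    def __init__(self, code: int, message: str, type: str = "GENERAL_ERROR"):
        super().__init__(message)
        self.code = code
        self.message = message
        self.type = type

    def to_dict(self) -> dict:
        # Errors, like results, would be reported back to clients as JSON.
        return {"code": self.code, "message": self.message, "type": self.type}
```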