# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Commands

### Development

- `bun dev` - Start hot-reloading development server (watches TypeScript files)
- `bun start` - Start production server
- `bun test` - Run TypeScript tests
- `bun py <service> <input.json>` - Run Python service directly via entry.py (sets up env)

### Python Setup

- `poetry install` - Install main Python dependencies
- `poetry install --with ft` - Install with fine-tuning dependencies (includes large ML models)
- `poetry install --with dev` - Install with development dependencies

### Testing Python Services

- `pytest services/<service>/tests/` - Run tests for specific service
- `pytest` - Run all Python tests

There are no Python tests in the repo at this time.
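When tests are added, a service test module might look like this sketch. The `echo` service name and stand-in function are assumptions; a real test would import the service's actual `main()` (e.g. `from echo.echo import main`):

```python
# Hypothetical pytest-style sketch, e.g. services/echo/tests/test_echo.py.
# `echo_main` stands in for a real service's main(); adjust the import to
# match an actual service module.

def echo_main(data: dict) -> dict:
    # A minimal echo service simply returns its input payload.
    return data

def test_echo_returns_payload():
    assert echo_main({"message": "hello"}) == {"message": "hello"}
```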

### Code Quality

- `black services/` - Format Python code (line length: 120)
- `ruff check services/` - Lint Python code with comprehensive rule set

## Architecture

This is a hybrid TypeScript/Python platform that provides AI and data services for the OpenFn platform and toolchain.

### Server Structure (TypeScript + Bun + Elysia)

- **Entry**: `platform/src/index.ts` → `platform/src/server.ts`
- **Framework**: Elysia web framework running on Bun runtime
- **Middleware**: Health checks, directory listing, Python service bridging
- **Services Bridge**: `/services/<name>` endpoints invoke Python modules via child processes

### Python Services Architecture

- **Entry Point**: `services/entry.py` - All Python services invoked through this module
- **Service Structure**: Each service is a Python module in `services/<name>/` with a `main()` function
- **Invocation Pattern**: `services/<name>/<name>.py` with `main(data: dict) -> dict`
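Under these conventions, a minimal service might look like the following sketch (the `word_count` service is hypothetical, not a real module in this repo):

```python
# Hypothetical sketch of services/word_count/word_count.py, following the
# conventions above: a module exposing main(data: dict) -> dict.
import logging

# Use a logger rather than print(): logger output is what gets streamed
# to WebSocket clients.
logger = logging.getLogger(__name__)

def main(data: dict) -> dict:
    text = data.get("text", "")
    logger.info("Counting words in %d characters of input", len(text))
    return {"words": len(text.split())}
```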

### Key Python Services

#### AI & Generation Services

- `job_chat/` - AI chatbot for OpenFn job assistance with RAG
- `workflow_chat/` - AI assistant for OpenFn workflow creation
- `adaptor_gen/` - AI-powered OpenFn adaptor generation
- `code_generator/` - General-purpose code generation service
- `gen_job/` - Generates OpenFn job code
- `describe_adaptor/` - Generates descriptions for OpenFn adaptors
- `signature_generator/` - Generates function signatures for adaptors
- `inference/` - ML model inference (supports multiple models)

#### Embeddings & Search Services

Pinecone is used in production to store embeddings. OpenAI is used to generate vectors.

The OpenFn docsite (docs.openfn.org) is embedded and used in `search_docsite`, which is used by `job_chat` to dynamically add context to user questions.

- `embeddings/` - Vector embeddings with Pinecone (production index: "apollo-mappings")
- `search_docsite/` - Searches OpenFn documentation using Pinecone vector store
- `vocab_mapper/` - Maps medical vocabularies (LOINC/SNOMED) using embeddings
- `embed_docsite/` - Indexes OpenFn documentation for search
- `embed_loinc_dataset/` - Preprocesses and embeds LOINC medical codes
- `embed_snomed_dataset/` - Preprocesses and embeds SNOMED medical terminology

#### Utility Services

- `latest_adaptors/` - Retrieves latest adaptor versions
- `status/` - System health and status checks

### Communication Protocols

- **HTTP**: POST requests with JSON payloads
- **WebSocket**: Same URLs as HTTP, provides live log streaming
- `start` event: Client sends JSON payload
- `log` event: Server streams Python logger output (not print statements)
- `complete` event: Server sends final JSON result
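To illustrate one round trip, the messages might look like the JSON below. Only the event names (`start`, `log`, `complete`) come from the protocol above; the envelope shape with `event`/`data` keys is an assumption for illustration:

```python
import json

# One illustrative service call over the WebSocket, as JSON strings.
start = json.dumps({"event": "start", "data": {"text": "hello world"}})  # client -> server
log = json.dumps({"event": "log", "data": "INFO: processing input"})     # server -> client
complete = json.dumps({"event": "complete", "data": {"words": 2}})       # server -> client

events = [json.loads(m)["event"] for m in (start, log, complete)]
print(events)  # ['start', 'log', 'complete']
```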

### Python Environment

- **Python Version**: 3.11 (exact requirement)
- **Dependency Management**: Poetry with in-project `.venv`
- **Environment Loading**: Uses `.env` file for API keys and configuration
- **Error Handling**: Custom `ApolloError` class with Sentry integration
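The `ApolloError` class lives in this repo and its exact signature isn't shown here; the general pattern is sketched below, with hypothetical field names:

```python
# Sketch of a custom service error in the spirit of ApolloError. The real
# class in this repo may have a different constructor and fields, and its
# Sentry integration is represented only by the comment below.
class ApolloErrorSketch(Exception):
    def __init__(self, code: int, message: str, error_type: str = "ERROR"):
        super().__init__(message)
        self.code = code
        self.error_type = error_type
        # The real implementation would also report the error to Sentry,
        # either here or in a surrounding handler.

def validate(data: dict) -> dict:
    # Example use: reject payloads missing a required key.
    if "text" not in data:
        raise ApolloErrorSketch(400, "Missing required key: text")
    return data
```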

### Development Patterns

- TypeScript services are minimal - primarily routing and Python process management
- Python services follow common patterns but have no strict interface beyond `main(data) -> dict`
- When running Python services, either call a running server through curl or use entry.py to exercise the environment
- All services expect and return JSON through `main`
- Logger output (not print statements) streams to WebSocket clients
- API keys loaded from `.env` rather than embedded in payloads

## Important Notes

### Environment Requirements

- **Required API Keys**: OpenAI API key, Pinecone API key (for embedding services)
- **Python Environment**: Must use Python 3.11 exactly
- **Vector Store**: Production uses Pinecone index "apollo-mappings" with namespace-based collections

### Service Dependencies

- Embedding services use `loinc_store.connect_loinc()` and `snomed_store.connect_snomed()` for production data
- Both connect to Pinecone with "apollo-mappings" index
- Medical vocab services depend on pre-embedded LOINC/SNOMED datasets

### Development Guidelines
- Use `logger` for output that should stream to WebSocket clients (not `print`)
- Service entry points must be named `<service_name>.py` with `main(data: dict) -> dict`
- Test services locally with `bun py <service> <input.json>`
- Vector store supports both Pinecone and Zilliz but production uses Pinecone only
# README.md

from a node_modules. None of this affects python.

See [bun auto-install]() for more details.

## Python Setup

This repo uses `poetry` to manage dependencies.

We use an "in-project" venv, which means a `.venv` folder will be created when you run `poetry install`.

All python is invoked through `entry.py`, which loads the environment properly so that relative imports work.

You can invoke entry.py directly (i.e. without HTTP or any intermediate js) through bun from the root:
```
bun py echo tmp/payload.json
```
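For example, a payload file for the `echo` service could be prepared like this (the payload keys are illustrative - each service defines its own expected input shape):

```python
import json
import pathlib
import tempfile

# Write a JSON payload to a file that can then be passed to a service,
# e.g. `bun py echo tmp/payload.json`. A temp directory is used here;
# in practice tmp/ in the repo root works fine.
payload = {"message": "hello"}
path = pathlib.Path(tempfile.mkdtemp()) / "payload.json"
path.write_text(json.dumps(payload))
print(json.loads(path.read_text()))  # {'message': 'hello'}
```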

## CLI

Note that `print()` statements do not get sent out to the web socket, as these are intended for local debugging. Only logs from a logger object are diverted.


## Docker

To build the docker image:
To run it on port 3000:

```
docker run -p 3000:3000 openfn-apollo
```

## Finetuning and poetry dependency groups

`poetry install` will only install the main dependencies - the stuff used in the
docker image.

Dependencies for finetuning (which include huge models) are in a special
optional `ft` group in the `pyproject.toml`.

To install these, do `poetry install --with ft`
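In `pyproject.toml`, the optional group is declared along these lines (a sketch - the real file lists the actual packages):

```toml
# Hypothetical pyproject.toml fragment; the real file defines the actual
# fine-tuning dependencies.
[tool.poetry.group.ft]
optional = true

[tool.poetry.group.ft.dependencies]
# large ML / fine-tuning packages are listed here
```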

## Contributing

See the Contribution Guide for more details about how and where to contribute to