# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Commands

### Development

- `bun dev` - Start hot-reloading development server (watches TypeScript files)
- `bun start` - Start production server
- `bun test` - Run TypeScript tests
- `bun py <service> <input.json>` - Run Python service directly via entry.py (sets up env)

### Python Setup

- `poetry install` - Install main Python dependencies
- `poetry install --with ft` - Install with fine-tuning dependencies (includes large ML models)
- `poetry install --with dev` - Install with development dependencies

### Testing Python Services

- `pytest services/<service>/tests/` - Run tests for specific service
- `pytest` - Run all Python tests

There are no Python tests in the repo at this time.
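When tests are added, a service test module might look like this sketch. The `echo` service name and stand-in function are assumptions; a real test would import the service's actual `main()` (e.g. `from echo.echo import main`):

```python
# Hypothetical pytest-style sketch, e.g. services/echo/tests/test_echo.py.
# `echo_main` stands in for a real service's main(); adjust the import to
# match an actual service module.

def echo_main(data: dict) -> dict:
    # A minimal echo service simply returns its input payload.
    return data

def test_echo_returns_payload():
    assert echo_main({"message": "hello"}) == {"message": "hello"}
```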

### Code Quality

- `black services/` - Format Python code (line length: 120)
- `ruff check services/` - Lint Python code with comprehensive rule set

## Architecture

This is a hybrid TypeScript/Python platform that provides AI and data services for the OpenFn platform and toolchain.

### Server Structure (TypeScript + Bun + Elysia)

- **Entry**: `platform/src/index.ts` → `platform/src/server.ts`
- **Framework**: Elysia web framework running on Bun runtime
- **Middleware**: Health checks, directory listing, Python service bridging
- **Services Bridge**: `/services/<name>` endpoints invoke Python modules via child processes

### Python Services Architecture

- **Entry Point**: `services/entry.py` - All Python services invoked through this module
- **Service Structure**: Each service is a Python module in `services/<name>/` with a `main()` function
- **Invocation Pattern**: `services/<name>/<name>.py` with `main(data: dict) -> dict`
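Under these conventions, a minimal service might look like the following sketch (the `word_count` service is hypothetical, not a real module in this repo):

```python
# Hypothetical sketch of services/word_count/word_count.py, following the
# conventions above: a module exposing main(data: dict) -> dict.
import logging

# Use a logger rather than print(): logger output is what gets streamed
# to WebSocket clients.
logger = logging.getLogger(__name__)

def main(data: dict) -> dict:
    text = data.get("text", "")
    logger.info("Counting words in %d characters of input", len(text))
    return {"words": len(text.split())}
```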

### Key Python Services

#### AI & Generation Services

- `job_chat/` - AI chatbot for OpenFn job assistance with RAG
- `workflow_chat/` - AI assistant for OpenFn workflow creation
- `adaptor_gen/` - AI-powered OpenFn adaptor generation
- `code_generator/` - General-purpose code generation service
- `gen_job/` - Generates OpenFn job code
- `describe_adaptor/` - Generates descriptions for OpenFn adaptors
- `signature_generator/` - Generates function signatures for adaptors
- `inference/` - ML model inference (supports multiple models)

#### Embeddings & Search Services

Pinecone is used in production to store embeddings. OpenAI is used to generate vectors.

The OpenFn docsite (docs.openfn.org) is embedded and used in `search_docsite`, which is used by `job_chat` to dynamically add context to user questions.

- `embeddings/` - Vector embeddings with Pinecone (production index: "apollo-mappings")
- `search_docsite/` - Searches OpenFn documentation using Pinecone vector store
- `vocab_mapper/` - Maps medical vocabularies (LOINC/SNOMED) using embeddings
- `embed_docsite/` - Indexes OpenFn documentation for search
- `embed_loinc_dataset/` - Preprocesses and embeds LOINC medical codes
- `embed_snomed_dataset/` - Preprocesses and embeds SNOMED medical terminology

#### Utility Services

- `latest_adaptors/` - Retrieves latest adaptor versions
- `status/` - System health and status checks

### Communication Protocols

- **HTTP**: POST requests with JSON payloads
- **WebSocket**: Same URLs as HTTP, provides live log streaming
- `start` event: Client sends JSON payload
- `log` event: Server streams Python logger output (not print statements)
- `complete` event: Server sends final JSON result
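To illustrate one round trip, the messages might look like the JSON below. Only the event names (`start`, `log`, `complete`) come from the protocol above; the envelope shape with `event`/`data` keys is an assumption for illustration:

```python
import json

# One illustrative service call over the WebSocket, as JSON strings.
start = json.dumps({"event": "start", "data": {"text": "hello world"}})  # client -> server
log = json.dumps({"event": "log", "data": "INFO: processing input"})     # server -> client
complete = json.dumps({"event": "complete", "data": {"words": 2}})       # server -> client

events = [json.loads(m)["event"] for m in (start, log, complete)]
print(events)  # ['start', 'log', 'complete']
```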

### Python Environment

- **Python Version**: 3.11 (exact requirement)
- **Dependency Management**: Poetry with in-project `.venv`
- **Environment Loading**: Uses `.env` file for API keys and configuration
- **Error Handling**: Custom `ApolloError` class with Sentry integration
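The `ApolloError` class lives in this repo and its exact signature isn't shown here; the general pattern is sketched below, with hypothetical field names:

```python
# Sketch of a custom service error in the spirit of ApolloError. The real
# class in this repo may have a different constructor and fields, and its
# Sentry integration is represented only by the comment below.
class ApolloErrorSketch(Exception):
    def __init__(self, code: int, message: str, error_type: str = "ERROR"):
        super().__init__(message)
        self.code = code
        self.error_type = error_type
        # The real implementation would also report the error to Sentry,
        # either here or in a surrounding handler.

def validate(data: dict) -> dict:
    # Example use: reject payloads missing a required key.
    if "text" not in data:
        raise ApolloErrorSketch(400, "Missing required key: text")
    return data
```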

### Development Patterns

- TypeScript services are minimal - primarily routing and Python process management
- Python services follow common patterns but have no strict interface beyond `main(data) -> dict`
- When running Python services, either call a running server through curl or use entry.py to exercise the environment
- All services expect and return JSON through `main`
- Logger output (not print statements) streams to WebSocket clients
- API keys loaded from `.env` rather than embedded in payloads

## Important Notes

### Environment Requirements

- **Required API Keys**: OpenAI API key, Pinecone API key (for embedding services)
- **Python Environment**: Must use Python 3.11 exactly
- **Vector Store**: Production uses Pinecone index "apollo-mappings" with namespace-based collections

### Service Dependencies

- Embedding services use `loinc_store.connect_loinc()` and `snomed_store.connect_snomed()` for production data
- Both connect to Pinecone with "apollo-mappings" index
- Medical vocab services depend on pre-embedded LOINC/SNOMED datasets

### Development Guidelines
- Use `logger` for output that should stream to WebSocket clients (not `print`)
- Service entry points must be named `<service_name>.py` with `main(data: dict) -> dict`
- Test services locally with `bun py <service> <input.json>`
- Vector store supports both Pinecone and Zilliz but production uses Pinecone only
# README.md

from a node_modules. None of this affects python.

See [bun auto-install]() for more details.

## Python Setup

This repo uses `poetry` to manage dependencies.

We use an "in-project" venv, which means a `.venv` folder will be created when you run `poetry install`.

All python is invoked through `entry.py`, which loads the environment properly so that relative imports work.

You can invoke entry.py directly (i.e. without HTTP or any intermediate js) through bun from the root:
```
bun py echo tmp/payload.json
```
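For example, a payload file for the `echo` service could be prepared like this (the payload keys are illustrative - each service defines its own expected input shape):

```python
import json
import pathlib
import tempfile

# Write a JSON payload to a file that can then be passed to a service,
# e.g. `bun py echo tmp/payload.json`. A temp directory is used here;
# in practice tmp/ in the repo root works fine.
payload = {"message": "hello"}
path = pathlib.Path(tempfile.mkdtemp()) / "payload.json"
path.write_text(json.dumps(payload))
print(json.loads(path.read_text()))  # {'message': 'hello'}
```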

## CLI

Note that `print()` statements do not get sent out to the web socket, as these are intended for local debugging. Only logs from a logger object are diverted.


## Docker

To build the docker image:
To run it on port 3000:

```
docker run -p 3000:3000 openfn-apollo
```

## Finetuning and poetry dependency groups

`poetry install` will only install the main dependencies - the stuff used in the
docker image.

Dependencies for finetuning (which include huge models) are in a special
optional `ft` group in the `pyproject.toml`.

To install these, do `poetry install --with ft`
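In `pyproject.toml`, the optional group is declared along these lines (a sketch - the real file lists the actual packages):

```toml
# Hypothetical pyproject.toml fragment; the real file defines the actual
# fine-tuning dependencies.
[tool.poetry.group.ft]
optional = true

[tool.poetry.group.ft.dependencies]
# large ML / fine-tuning packages are listed here
```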

## Contributing

See the Contribution Guide for more details about how and where to contribute to