feat: initial commit
cbrinson-rise8 committed Nov 19, 2024
1 parent 2e2af2d commit c36a188
Showing 14 changed files with 1,554 additions and 4 deletions.
22 changes: 22 additions & 0 deletions compose.yml
@@ -76,3 +76,25 @@ services:
     depends_on:
       api:
         condition: service_healthy
+
+  algo-test-runner:
+    build:
+      context: tests/algorithm
+      dockerfile: Dockerfile
+    env_file:
+      - tests/algorithm/algo_test.env
+    environment:
+      DB_URI: "postgresql+psycopg2://postgres:pw@db:5432/postgres"
+      API_URL: "http://api:8080"
+    volumes:
+      - ./tests/algorithm/scripts:/app/scripts
+      - ./tests/algorithm/data:/app/data
+      - ./tests/algorithm/results:/app/results
+      - ./tests/algorithm/configurations:/app/configurations
+    depends_on:
+      db:
+        condition: service_healthy
+      api:
+        condition: service_healthy
+    profiles:
+      - algo-test
3 changes: 2 additions & 1 deletion src/recordlinker/linking/mpi_service.py
@@ -7,6 +7,7 @@

 import typing
 import uuid
+import json

 from sqlalchemy import insert
 from sqlalchemy import orm
@@ -133,7 +134,7 @@ def bulk_insert_patients(
     pat_data = [
         {
             "person_id": person.id,
-            "_data": record.to_json(prune_empty=True),
+            "_data": json.loads(record.to_json(prune_empty=True)),
             "external_patient_id": record.external_id,
             "external_person_id": external_person_id,
             "external_person_source": "IRIS" if external_person_id else None,
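The change to `_data` in this file wraps `record.to_json(...)` in `json.loads`. The sketch below illustrates why that can matter. It assumes that `to_json` returns a JSON string and that `_data` is a JSON-typed column whose values are serialized again on insert; neither is shown in this diff, and the sample record is made up.

```python
import json

# Hypothetical stand-in for record.to_json(prune_empty=True),
# assumed here to return a JSON *string*.
record_json = json.dumps({"external_id": "abc-123", "sex": "F"})

# If the string is handed to a JSON column as-is, the serializer encodes it
# a second time, so the stored value is a JSON string, not a JSON object.
double_encoded = json.dumps(record_json)
print(type(json.loads(double_encoded)).__name__)  # str

# Decoding first hands the serializer a dict, which is stored as an object.
encoded_once = json.dumps(json.loads(record_json))
print(type(json.loads(encoded_once)).__name__)  # dict
```

Under those assumptions, the original line would have stored a quoted string in `_data`, which is the kind of problem the `json.loads` call avoids.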
12 changes: 12 additions & 0 deletions tests/algorithm/Dockerfile
@@ -0,0 +1,12 @@
# Use the official Python 3.12 slim image as the base
FROM python:3.12-slim

# Set the working directory
WORKDIR /app

# Copy the scripts and data directories into the image
COPY scripts /app/scripts
COPY data /app/data

# Install Python dependencies
RUN pip install --no-cache-dir pandas requests
77 changes: 77 additions & 0 deletions tests/algorithm/README.md
@@ -0,0 +1,77 @@
# Record Linkage Algorithm Testing

This directory contains a project for testing the effectiveness of the RecordLinker algorithm.

## Prerequisites

Before getting started, ensure you have the following installed:

- [Docker](https://docs.docker.com/engine/install/)
- [Docker Compose](https://docs.docker.com/compose/install/)

## Setup

1. Build the Docker images:

```bash
docker compose --profile algo-test build
```

2. Configure environment variables

```bash
edit tests/algorithm/algo_test.env
```
Set the environment variables in this file to match your test.

3. Edit the algorithm configuration file

```bash
edit tests/algorithm/configurations/algorithm_configuration.json
```
Edit this configuration file to tune the algorithm parameters.

## Running Algorithm Tests

1. Run the tests

```bash
docker compose --profile algo-test run --rm algo-test-runner python scripts/run_test.py
```

2. Analyze the results

The results of the algorithm tests will be available in the `results/` directory.

## Environment Variables

1. `env_file`: Attributes that should be tuned for a particular algorithm test are located in the `algo_test.env` file.

2. `environment`: Attributes that should likely remain static across all algorithm tests are located directly in the `compose.yml` file.

### Algorithm Test Parameters

The following environment variables can be tuned in the `algo_test.env` file:

- `SEED_FILE`: The file containing person data to seed the MPI with
- `TEST_FILE`: The file containing patient data to test the algorithm with
- `RESULTS_FILE`: The file to write the results of the algorithm test to
- `ALGORITHM_CONFIGURATION`: The file containing the algorithm configuration JSON
- `ALGORITHM_NAME`: The name of the algorithm to test
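
As an illustration only, a test script could read these variables as sketched below. The `AlgoTestSettings` class and the `load_settings` helper are hypothetical names and are not taken from `scripts/run_test.py`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# The variable names documented in the list above.
REQUIRED = (
    "SEED_FILE",
    "TEST_FILE",
    "RESULTS_FILE",
    "ALGORITHM_CONFIGURATION",
    "ALGORITHM_NAME",
)


@dataclass(frozen=True)
class AlgoTestSettings:
    """Hypothetical container for the algorithm test parameters."""

    seed_file: str
    test_file: str
    results_file: str
    algorithm_configuration: str
    algorithm_name: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> AlgoTestSettings:
    """Read the parameters from `env` (defaults to os.environ), failing early if any is unset."""
    source = os.environ if env is None else env
    missing = [name for name in REQUIRED if not source.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return AlgoTestSettings(**{name.lower(): source[name] for name in REQUIRED})


# Example with the values from algo_test.env passed in explicitly.
example_env = {
    "SEED_FILE": "data/1000patients.csv",
    "TEST_FILE": "data/1000patientsTestData.csv",
    "RESULTS_FILE": "output.csv",
    "ALGORITHM_CONFIGURATION": "configurations/algorithm_configuration.json",
    "ALGORITHM_NAME": "test-config",
}
settings = load_settings(example_env)
print(settings.algorithm_name)  # test-config
```

Failing early on a missing variable gives a clearer error than a `KeyError` raised partway through a test run.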


## Cleanup

After you've finished running algorithm tests and analyzing the results, you can stop and remove the Docker containers by running:
```bash
docker compose --profile algo-test down
```
5 changes: 5 additions & 0 deletions tests/algorithm/algo_test.env
@@ -0,0 +1,5 @@
SEED_FILE="data/1000patients.csv"
TEST_FILE="data/1000patientsTestData.csv"
RESULTS_FILE="output.csv"
ALGORITHM_CONFIGURATION="configurations/algorithm_configuration.json"
ALGORITHM_NAME="test-config"
51 changes: 51 additions & 0 deletions tests/algorithm/configurations/algorithm_configuration.json
@@ -0,0 +1,51 @@
{
  "label": "test-config",
  "description": "test algorithm configuration",
  "is_default": false,
  "passes": [
    {
      "blocking_keys": [
        "BIRTHDATE"
      ],
      "evaluators": {
        "FIRST_NAME": "func:recordlinker.linking.matchers.feature_match_fuzzy_string",
        "LAST_NAME": "func:recordlinker.linking.matchers.feature_match_exact"
      },
      "rule": "func:recordlinker.linking.matchers.eval_perfect_match",
      "cluster_ratio": 0.9,
      "kwargs": {
        "thresholds": {
          "FIRST_NAME": 0.9,
          "LAST_NAME": 0.9,
          "BIRTHDATE": 0.95,
          "ADDRESS": 0.9,
          "CITY": 0.92,
          "ZIP": 0.95
        }
      }
    },
    {
      "blocking_keys": [
        "ZIP",
        "FIRST_NAME",
        "LAST_NAME"
      ],
      "evaluators": {
        "ADDRESS": "func:recordlinker.linking.matchers.feature_match_fuzzy_string",
        "BIRTHDATE": "func:recordlinker.linking.matchers.feature_match_exact"
      },
      "rule": "func:recordlinker.linking.matchers.eval_perfect_match",
      "cluster_ratio": 0.9,
      "kwargs": {
        "thresholds": {
          "FIRST_NAME": 0.9,
          "LAST_NAME": 0.9,
          "BIRTHDATE": 0.95,
          "ADDRESS": 0.9,
          "CITY": 0.92,
          "ZIP": 0.95
        }
      }
    }
  ]
}
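
When tuning a configuration like the one above, a quick sanity check before running the tests can catch typos in the numeric values. The sketch below is illustrative only: `check_config` is a hypothetical helper, and the rules it applies (non-empty `blocking_keys`, and `cluster_ratio` and thresholds within [0, 1]) are assumptions, not constraints confirmed to be enforced by RecordLinker.

```python
def check_config(config: dict) -> list[str]:
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    passes = config.get("passes", [])
    if not passes:
        problems.append("no passes defined")
    for index, linkage_pass in enumerate(passes):
        if not linkage_pass.get("blocking_keys"):
            problems.append(f"pass {index}: blocking_keys is empty")
        ratio = linkage_pass.get("cluster_ratio")
        if not isinstance(ratio, (int, float)) or not 0 <= ratio <= 1:
            problems.append(f"pass {index}: cluster_ratio {ratio!r} is outside [0, 1]")
        thresholds = linkage_pass.get("kwargs", {}).get("thresholds", {})
        for field, value in thresholds.items():
            if not isinstance(value, (int, float)) or not 0 <= value <= 1:
                problems.append(f"pass {index}: threshold for {field} is {value!r}")
    return problems


# A trimmed-down configuration with the same shape as the file above.
sample = {
    "label": "test-config",
    "passes": [
        {
            "blocking_keys": ["BIRTHDATE"],
            "evaluators": {
                "FIRST_NAME": "func:recordlinker.linking.matchers.feature_match_fuzzy_string",
                "LAST_NAME": "func:recordlinker.linking.matchers.feature_match_exact",
            },
            "rule": "func:recordlinker.linking.matchers.eval_perfect_match",
            "cluster_ratio": 0.9,
            "kwargs": {"thresholds": {"FIRST_NAME": 0.9, "LAST_NAME": 0.9}},
        }
    ],
}
print(check_config(sample))  # []
```

To check the real file, load it with `json.load` and pass the resulting dict to the same function.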