generated from CDCgov/template
commit c36a188 (1 parent: 2e2af2d)
Showing 14 changed files with 1,554 additions and 4 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
# Use the official Python 3.12 slim image as the base | ||
FROM python:3.12-slim | ||
|
||
# Set the working directory | ||
WORKDIR /app | ||
|
||
# Copy the scripts and data directories into the image | ||
COPY scripts /app/scripts | ||
COPY data /app/data | ||
|
||
# Install Python dependencies | ||
RUN pip install --no-cache-dir pandas requests |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,77 @@ | ||
# Record Linkage Algorithm Testing | ||
|
||
This repository contains a project to test the effectiveness of the RecordLinker algorithm. | ||
|
||
## Prerequisites | ||
|
||
Before getting started, ensure you have the following installed: | ||
|
||
- [Docker](https://docs.docker.com/engine/install/) | ||
- [Docker Compose](https://docs.docker.com/compose/install/) | ||
|
||
## Setup | ||
|
||
1. Build the Docker images: | ||
|
||
```bash | ||
docker compose --profile algo-test build | ||
``` | ||
|
||
2. Configure environment variables | ||
|
||
```bash | ||
edit tests/algorithm/algo_test.env | ||
``` | ||
Edit the environment variables in the file | ||
|
||
3. Edit the algorithm configuration file | ||
|
||
```bash | ||
edit tests/algorithm/configurations/algorithm_configuration.json | ||
``` | ||
Edit the configuration file to tune the algorithm parameters | ||
|
||
## Running Algorithm Tests | ||
|
||
1. Run the tests | ||
|
||
```bash | ||
docker compose --profile algo-test run --rm algo-test-runner python scripts/run_test.py | ||
``` | ||
|
||
2. Analyze the results | ||
|
||
The results of the algorithm tests will be available in the `results/` directory. | ||
|
||
## Environment Variables | ||
|
||
1. `env_file`: The attributes that should be tuned for your particular algorithm test | ||
are located in the `algo_test.env` file. | ||
|
||
2. `environment`: The attributes that should likely remain static for all algorithm tests are located directly in the `compose.yml` file. | ||
|
||
### Algorithm Test Parameters | ||
|
||
The following environment variables can be tuned in the `algo_test.env` file: | ||
|
||
- `SEED_FILE`: The file containing person data to seed the MPI with | ||
- `TEST_FILE`: The file containing patient data to test the algorithm with | ||
- `RESULTS_FILE`: The file to write the results of the algorithm test to | ||
- `ALGORITHM_CONFIGURATION`: The file containing the algorithm configuration JSON | ||
- `ALGORITHM_NAME`: The name of the algorithm to test | ||
|
||
|
||
## Cleanup | ||
|
||
After you've finished running the algorithm tests and analyzing the results, you can stop and remove the Docker containers by running: | ||
```bash | ||
docker compose --profile algo-test down | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
SEED_FILE="data/1000patients.csv" | ||
TEST_FILE="data/1000patientsTestData.csv" | ||
RESULTS_FILE="output.csv" | ||
ALGORITHM_CONFIGURATION="configurations/algorithm_configuration.json" | ||
ALGORITHM_NAME="test-config" |
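
The five variables above are the ones the README lists under "Algorithm Test Parameters". As a rough sketch of how such a file could be checked before a run, the Python below parses simple `KEY="value"` lines and reports any of the five names that are missing or empty. The helper names `parse_env_text` and `missing_vars` are hypothetical and not part of this commit, and the parser handles only plain assignments, so it may not agree with Docker Compose on escapes or interpolation.

```python
# Variable names are taken from the README; everything else is illustrative.
REQUIRED_VARS = (
    "SEED_FILE",
    "TEST_FILE",
    "RESULTS_FILE",
    "ALGORITHM_CONFIGURATION",
    "ALGORITHM_NAME",
)


def parse_env_text(text):
    """Parse simple KEY="value" lines into a dict (no escapes or interpolation)."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=value line: {raw_line!r}")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def missing_vars(values):
    """Return the required names that are absent or empty in `values`."""
    return [name for name in REQUIRED_VARS if not values.get(name)]


# The contents of the env file shown above.
ENV_TEXT = """\
SEED_FILE="data/1000patients.csv"
TEST_FILE="data/1000patientsTestData.csv"
RESULTS_FILE="output.csv"
ALGORITHM_CONFIGURATION="configurations/algorithm_configuration.json"
ALGORITHM_NAME="test-config"
"""

settings = parse_env_text(ENV_TEXT)
missing = missing_vars(settings)
```

With the file as committed, `missing` is an empty list. Removing or blanking a line would put that variable's name in the list.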
tests/algorithm/configurations/algorithm_configuration.json (51 additions, 0 deletions)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
{ | ||
"label": "test-config", | ||
"description": "test algorithm configuration", | ||
"is_default": false, | ||
"passes": [ | ||
{ | ||
"blocking_keys": [ | ||
"BIRTHDATE" | ||
], | ||
"evaluators": { | ||
"FIRST_NAME": "func:recordlinker.linking.matchers.feature_match_fuzzy_string", | ||
"LAST_NAME": "func:recordlinker.linking.matchers.feature_match_exact" | ||
}, | ||
"rule": "func:recordlinker.linking.matchers.eval_perfect_match", | ||
"cluster_ratio": 0.9, | ||
"kwargs": { | ||
"thresholds": { | ||
"FIRST_NAME": 0.9, | ||
"LAST_NAME": 0.9, | ||
"BIRTHDATE": 0.95, | ||
"ADDRESS": 0.9, | ||
"CITY": 0.92, | ||
"ZIP": 0.95 | ||
} | ||
} | ||
}, | ||
{ | ||
"blocking_keys": [ | ||
"ZIP", | ||
"FIRST_NAME", | ||
"LAST_NAME" | ||
], | ||
"evaluators": { | ||
"ADDRESS": "func:recordlinker.linking.matchers.feature_match_fuzzy_string", | ||
"BIRTHDATE": "func:recordlinker.linking.matchers.feature_match_exact" | ||
}, | ||
"rule": "func:recordlinker.linking.matchers.eval_perfect_match", | ||
"cluster_ratio": 0.9, | ||
"kwargs": { | ||
"thresholds": { | ||
"FIRST_NAME": 0.9, | ||
"LAST_NAME": 0.9, | ||
"BIRTHDATE": 0.95, | ||
"ADDRESS": 0.9, | ||
"CITY": 0.92, | ||
"ZIP": 0.95 | ||
} | ||
} | ||
} | ||
] | ||
} |
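
Because the README asks users to edit this file by hand, a quick sanity check before running the tests may catch simple mistakes. The sketch below loads a trimmed copy of the first pass and applies a few illustrative checks. The function `check_configuration` is hypothetical, and the assumption that `cluster_ratio` and every threshold should fall in the range 0 to 1 is inferred from the values shown here; this is not RecordLinker's own validation.

```python
import json

# A trimmed copy of the first pass from the configuration above.
CONFIG_TEXT = """
{
  "label": "test-config",
  "passes": [
    {
      "blocking_keys": ["BIRTHDATE"],
      "evaluators": {
        "FIRST_NAME": "func:recordlinker.linking.matchers.feature_match_fuzzy_string",
        "LAST_NAME": "func:recordlinker.linking.matchers.feature_match_exact"
      },
      "rule": "func:recordlinker.linking.matchers.eval_perfect_match",
      "cluster_ratio": 0.9,
      "kwargs": {"thresholds": {"FIRST_NAME": 0.9, "LAST_NAME": 0.9}}
    }
  ]
}
"""


def check_configuration(config):
    """Return a list of problems found by a few illustrative sanity checks."""
    problems = []
    for index, step in enumerate(config.get("passes", [])):
        if not step.get("blocking_keys"):
            problems.append(f"pass {index}: no blocking keys")
        if not step.get("evaluators"):
            problems.append(f"pass {index}: no evaluators")
        # Assumption: cluster_ratio is a fraction between 0 and 1.
        ratio = step.get("cluster_ratio")
        if not isinstance(ratio, (int, float)) or not 0 <= ratio <= 1:
            problems.append(f"pass {index}: cluster_ratio outside [0, 1]")
        # Assumption: each threshold is a similarity score between 0 and 1.
        thresholds = step.get("kwargs", {}).get("thresholds", {})
        for field, value in thresholds.items():
            if not 0 <= value <= 1:
                problems.append(f"pass {index}: threshold for {field} outside [0, 1]")
    return problems


config = json.loads(CONFIG_TEXT)
problems = check_configuration(config)
```

An empty `problems` list means the file passed these checks; it says nothing about whether the chosen thresholds produce good matches.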