A complete solution for serving MIMIC-IV healthcare data through a HAPI FHIR server with PostgreSQL backend, patient filtering capabilities, and bulk import automation.
This repository provides a toolkit for deploying the MIMIC-IV dataset as a FHIR R4 server using HAPI FHIR, PostgreSQL, and Docker Compose. It is aimed at healthcare data scientists, ML engineers, and researchers who need rapid access to structured clinical data through standardized FHIR APIs.
- 🚀 One-command deployment with Docker Compose
- 🔍 Patient subset filtering for targeted analysis and resource optimization
- 📁 Bulk import automation with progress monitoring and error handling
- 🌐 Optional Cloudflare tunnel for external access (demo purposes only)
- 📝 Jupyter notebook for patient cohort exploration and filtering
You must complete the PhysioNet credentialing process and agree to the data use agreement:
- Full Dataset: MIMIC-IV v3.1
- Demo Dataset (for testing): MIMIC-IV FHIR Demo v2.0
- Docker and Docker Compose (Docker Desktop on macOS)
- 8GB+ RAM (4GB minimum for demo dataset)
- 50GB+ disk space for the full dataset (5GB for the demo)
- Python 3.8+ (optional, for filtering scripts)
- Download NDJSON files from PhysioNet
- Place all `.ndjson.gz` files in the `./input_data/fhir` directory
- Keep original filenames and compression (faster loading)
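Before starting the stack, it can help to sanity-check the downloaded files. Below is a minimal sketch (the `inventory` helper is ours, not part of the repo) that reports the resource type and line count of each `.ndjson.gz` file:

```python
import gzip
import json
import tempfile
from pathlib import Path

def inventory(fhir_dir):
    """Report the resourceType and resource count of each .ndjson.gz file."""
    summary = {}
    for path in sorted(Path(fhir_dir).glob("*.ndjson.gz")):
        resource_type, count = None, 0
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            for line in fh:
                if not line.strip():
                    continue
                if resource_type is None:
                    # NDJSON: one FHIR resource per line
                    resource_type = json.loads(line).get("resourceType")
                count += 1
        summary[path.name] = (resource_type, count)
    return summary

if __name__ == "__main__":
    # Demo on hypothetical data; point inventory() at ./input_data/fhir for real use.
    demo = Path(tempfile.mkdtemp())
    with gzip.open(demo / "MimicPatient.ndjson.gz", "wt", encoding="utf-8") as fh:
        fh.write(json.dumps({"resourceType": "Patient", "id": "p1"}) + "\n")
    print(inventory(demo))  # {'MimicPatient.ndjson.gz': ('Patient', 1)}
```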
```bash
git clone https://github.com/your-username/mimic-iv-on-fhir-21.git
cd mimic-iv-on-fhir-21
```
```bash
# Create input_data/fhir directory and place your NDJSON files here
mkdir -p input_data/fhir
# Copy your downloaded MIMIC-IV FHIR files to ./input_data/fhir/
```

```bash
# Start all services
docker compose up -d

# Monitor startup
docker compose logs -f fhir
```

```bash
# Check FHIR server capability
curl -f http://localhost:8080/fhir/metadata

# Check file server
curl -I http://localhost:8000/

# View service status
docker compose ps
```
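The `/metadata` endpoint returns a FHIR `CapabilityStatement`. As a minimal sketch (using an illustrative, trimmed-down response rather than real server output), you can check which resource types the server advertises:

```python
def supported_resource_types(capability: dict) -> list:
    """List the resource types advertised in a CapabilityStatement's REST section."""
    types = set()
    for rest in capability.get("rest", []):
        for resource in rest.get("resource", []):
            if "type" in resource:
                types.add(resource["type"])
    return sorted(types)

# Illustrative fragment of what GET /fhir/metadata returns
example = {
    "resourceType": "CapabilityStatement",
    "rest": [{"mode": "server",
              "resource": [{"type": "Patient"}, {"type": "Condition"}]}],
}
print(supported_resource_types(example))  # ['Condition', 'Patient']
```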
```bash
# Import all FHIR resources
python bulk_import.py \
  --fhir-url http://localhost:8080/fhir \
  --file-server-url http://fhir-files:8000

# Or import specific resources only
python bulk_import.py \
  --fhir-url http://localhost:8080/fhir \
  --file-server-url http://fhir-files:8000 \
  --files MimicPatient.ndjson.gz MimicCondition.ndjson.gz
```
```bash
# Get patient count
curl "http://localhost:8080/fhir/Patient?_summary=count"

# Search for specific conditions
curl "http://localhost:8080/fhir/Condition?code=I50.9&_count=10"

# Get observations for a patient
curl "http://localhost:8080/fhir/Observation?subject=Patient/12345&_count=5"
```

For large-scale analysis or resource-constrained environments, you can filter the data to include only specific patient cohorts:
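Each search returns a paged FHIR `Bundle`, so one way to assemble a cohort programmatically is to walk the Bundle's `next` links and collect patient IDs. A minimal sketch on an illustrative response (the helper names are ours, not part of the repo):

```python
def patient_ids_from_bundle(bundle: dict) -> list:
    """Collect Patient logical IDs from one page of search results."""
    ids = []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") == "Patient" and "id" in resource:
            ids.append(resource["id"])
    return ids

def next_page_url(bundle: dict):
    """Return the Bundle's 'next' paging link, or None on the last page."""
    for link in bundle.get("link", []):
        if link.get("relation") == "next":
            return link.get("url")
    return None

# Illustrative single page; a real client would GET each next_page_url in turn.
page = {
    "resourceType": "Bundle",
    "link": [{"relation": "self", "url": "http://localhost:8080/fhir/Patient"}],
    "entry": [{"resource": {"resourceType": "Patient", "id": "12345"}}],
}
print(patient_ids_from_bundle(page))  # ['12345']
print(next_page_url(page))            # None
```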
Use the included Jupyter notebook to explore and identify patients:
```bash
jupyter notebook explore.ipynb
```

**Example: Find patients with specific conditions**
```python
# Get all patients with diabetes (ICD-10: E11)
patients = get_patients_with_condition("E11")

# Save patient IDs for filtering
with open("patient_ids_to_filter.txt", "w") as f:
    for patient_id in patients:
        f.write(f"{patient_id}\n")
```

```bash
python filter_fhir_by_patients.py \
  --patient-list patient_ids_to_filter.txt \
  --fhir-dir ./input_data/fhir \
  --output-dir ./filtered_fhir_cohort
```
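The core of the filtering step can be sketched as below. This is a simplification, not the actual `filter_fhir_by_patients.py` implementation; it assumes each resource either is a `Patient` or references its patient via `subject.reference` or `patient.reference`:

```python
import gzip
import json
from pathlib import Path

def filter_ndjson(src: Path, dest: Path, patient_ids: set) -> int:
    """Copy only the resources that belong to the given patients.

    A resource is kept if it is one of the patients themselves, or if its
    subject/patient reference points at one of them. Returns the kept count.
    """
    wanted_refs = {f"Patient/{pid}" for pid in patient_ids}
    kept = 0
    with gzip.open(src, "rt", encoding="utf-8") as fin, \
         gzip.open(dest, "wt", encoding="utf-8") as fout:
        for line in fin:
            if not line.strip():
                continue
            resource = json.loads(line)
            ref = (resource.get("subject") or resource.get("patient") or {}).get("reference")
            is_cohort_patient = (resource.get("resourceType") == "Patient"
                                 and resource.get("id") in patient_ids)
            if is_cohort_patient or ref in wanted_refs:
                fout.write(line)
                kept += 1
    return kept
```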
```bash
# Update the script to point to your filtered directory
# Then run the bulk import
python bulk_import.py \
  --fhir-url http://localhost:8080/fhir \
  --file-server-url http://fhir-files:8000
```
- Swagger UI: `http://localhost:8080/fhir/` (built-in HAPI interface)
- Vanya Labs: vanyalabs.com (point to `http://localhost:8080/fhir`)
```bash
python bulk_import.py [OPTIONS]
```

| Parameter | Description | Default |
|---|---|---|
| `--fhir-url` | HAPI FHIR server base URL | `http://localhost:8080/fhir` |
| `--file-server-url` | Internal file server URL | `http://fhir-files:8000` |
| `--files` | Specific files to import | Auto-discover all |
| `--dry-run` | Preview import plan without execution | `False` |
| `--timeout` | Max wait time for completion (seconds) | `3600` |
Built with ❤️ for the healthcare AI community