Add load testing results to performance and cost documentation.
PiperOrigin-RevId: 552546273
Kai-Bailey authored and copybara-github committed Jul 31, 2023
1 parent 816853d commit 0518335
Showing 1 changed file with 27 additions and 6 deletions: docs/performance_and_cost.md
@@ -8,17 +8,38 @@ If the Bulk FHIR Server is not meeting your performance needs you can try limiting

2. **Bulk Data Output File Request** \
The Bulk FHIR Server returns a list of URLs pointing to FHIR ndjson files. The `bulk_fhir_fetch` tool downloads the data from each URL one at a time; if needed in the future, this could be improved to download from the URLs concurrently. The `bulk_fhir_fetch` tool does minimal processing, and all outputs (to FHIR Store, GCS, etc.) are implemented as non-blocking and concurrent. There are several metrics to help determine whether `bulk_fhir_fetch` is the bottleneck at this stage; see the [logs and monitoring documentation](/docs/logs_and_monitoring.md) for more details. The total time for all the ndjson URLs is logged as "It took %s to download, process and output the FHIR from all the ndjson URLs.". <br> <br>
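The pattern described above (sequential URL downloads feeding concurrent, non-blocking output sinks) can be sketched in Go. This is a minimal, hypothetical illustration, not the tool's actual implementation; `downloadURL` and `writeOutput` are stand-in names.

```go
package main

import (
	"fmt"
	"sync"
)

// downloadURL is a stand-in for the real HTTP fetch of one ndjson URL.
func downloadURL(u string) []string {
	return []string{u + ":resource-a", u + ":resource-b"}
}

// writeOutput is a stand-in for the FHIR Store / GCS output sinks.
func writeOutput(line string) { fmt.Println(line) }

func main() {
	urls := []string{"url-1", "url-2", "url-3"}

	lines := make(chan string, 100)
	var wg sync.WaitGroup

	// Concurrent output workers drain the channel so outputs never
	// block the download loop.
	for w := 0; w < 4; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for line := range lines {
				writeOutput(line)
			}
		}()
	}

	// Downloads remain sequential, one URL at a time, as described above.
	for _, u := range urls {
		for _, line := range downloadURL(u) {
			lines <- line
		}
	}
	close(lines)
	wg.Wait()
}
```

The buffered channel decouples the single download loop from the output workers, which is the non-blocking behavior the metrics above are meant to confirm.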

If the `bulk_fhir_fetch` ingestion tool is not meeting your performance needs, please file a bug or complete our [survey](https://docs.google.com/forms/d/e/1FAIpQLSdmWHaGc41gWiobMT6kNd0PGPPeWGeS-LyG6CrGZ79moaUIEQ/viewform)! There are a couple of low-hanging performance improvements we can make if there is a need.

## FHIR Store Upload Options

`bulk_fhir_fetch` supports three different ways to upload data to FHIR Store, each with different performance characteristics and costs. The cost will depend mainly on the number of requests and the amount of data stored. Full details are at [Cloud Healthcare API pricing](https://cloud.google.com/healthcare-api/pricing).

1. **GCS Based Upload [Recommended]** \
Writes NDJSONs from the Bulk FHIR Server to [GCS](https://cloud.google.com/storage/docs), and then triggers a batch FHIR Store import job from the GCS location using the [fhirStores.Import](https://cloud.google.com/healthcare-api/docs/reference/rest/v1/projects.locations.datasets.fhirStores/import) method. The `bulk_fhir_fetch` job does NOT delete the downloaded FHIR from the GCS bucket after each run. With GCS based upload you will need to pay for GCS storage and requests; however, since GCS storage is inexpensive and [fhirStores.Import](https://cloud.google.com/healthcare-api/docs/reference/rest/v1/projects.locations.datasets.fhirStores/import) is cheap to call, this upload method will likely be the cheapest of the three. To enable GCS based upload, use the `-fhir_store_enable_gcs_based_upload` and `-fhir_store_gcs_based_upload_bucket` flags. GCS Based Upload is recommended for production.

2. **Individual Upload** \
Each FHIR Resource is uploaded in an individual API call to FHIR Store using the [fhir.update](https://cloud.google.com/healthcare-api/docs/reference/rest/v1/projects.locations.datasets.fhirStores.fhir/update) method. This will likely be the most expensive option for uploading to FHIR Store. You may also receive error code 429, "Quota exceeded for Number of FHIR operations per minute per region". Individual upload is only recommended for small tests.

3. **Batch Upload** \
Uploads batches of FHIR resources to FHIR Store using the [fhir.executeBundle](https://cloud.google.com/healthcare-api/docs/reference/rest/v1/projects.locations.datasets.fhirStores.fhir/executeBundle) method. The default bundle size is 5 FHIR resources, but it can be overridden using the `-fhir_store_batch_upload_size` flag. To enable batch upload, use the `-fhir_store_enable_batch_upload` flag. It can be tricky to find a batch size that is performant but doesn't exceed the 50 MB [fhir.executeBundle size limit](https://cloud.google.com/healthcare-api/quotas#resource_limits). For that reason, GCS Based Upload is recommended for production.
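The batching step can be illustrated with a small Go sketch. This is a hypothetical helper (the function name `batches` is ours, not the tool's), showing how a list of resources would be split into `fhir.executeBundle` requests of at most `-fhir_store_batch_upload_size` resources (default 5).

```go
package main

import "fmt"

// batches splits resources into groups of at most size, mirroring how a
// batch uploader might fill successive fhir.executeBundle requests.
func batches(resources []string, size int) [][]string {
	var out [][]string
	for len(resources) > size {
		out = append(out, resources[:size])
		resources = resources[size:]
	}
	if len(resources) > 0 {
		out = append(out, resources)
	}
	return out
}

func main() {
	res := []string{"r1", "r2", "r3", "r4", "r5", "r6", "r7"}
	for i, b := range batches(res, 5) {
		fmt.Printf("bundle %d: %d resources\n", i, len(b))
	}
	// Output:
	// bundle 0: 5 resources
	// bundle 1: 2 resources
}
```

Raising the batch size reduces the number of `executeBundle` calls but increases per-request payload, which is why the 50 MB limit above constrains the tuning.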

## Load Tests

We ran load tests of `bulk_fhir_fetch` against the [`test_server`](/cmd/test_server/README.md) with
Synthea data and against the BCDA sandbox. The Synthea data was 23 GB and
included 20 FHIR ResourceTypes. The BCDA Sandbox data (extra large credentials)
was 20.3 GB with Patient, Coverage and ExplanationOfBenefit resource types. The
GCS based upload method was used, and rectify was set to true.

**Test Server with Synthea** \
The Bulk FHIR server took 10s to return URLs after the initial Bulk Data Kick-off Request. \
It took 1h47m to download, process and output the FHIR from all the ndjson URLs. \
Of the 1h47m, it took 1h23m to download the data to GCS and 24m for the [fhirStores.Import](https://cloud.google.com/healthcare-api/docs/reference/rest/v1/projects.locations.datasets.fhirStores/import).

**BCDA Sandbox (Extra Large Dataset)** \
The Bulk FHIR server took 17m to return URLs after the initial Bulk Data Kick-off Request. \
It took 5h10m to download, process and output the FHIR from all the ndjson URLs. \
Of the 5h10m, it took 4h54m to download the data to GCS and 16m for the [fhirStores.Import](https://cloud.google.com/healthcare-api/docs/reference/rest/v1/projects.locations.datasets.fhirStores/import).
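The reported breakdowns can be cross-checked with simple duration arithmetic; the following Go snippet verifies that the download and import phases sum to the logged totals from both runs.

```go
package main

import (
	"fmt"
	"time"
)

// mustDur parses a duration string like "1h23m", panicking on bad input.
func mustDur(s string) time.Duration {
	d, err := time.ParseDuration(s)
	if err != nil {
		panic(err)
	}
	return d
}

func main() {
	// Test Server with Synthea: 1h23m (GCS download) + 24m (import).
	synthea := mustDur("1h23m") + mustDur("24m")
	// BCDA Sandbox: 4h54m (GCS download) + 16m (import).
	bcda := mustDur("4h54m") + mustDur("16m")

	fmt.Println(synthea) // 1h47m0s
	fmt.Println(bcda)    // 5h10m0s
}
```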

