Performance issue #2063

liquid36 · 2025-01-30T17:32:31Z

Hi!
I'm working with a synthetic dataset to test the tool. It has 1,000 patients and 50,000 conditions.

I made this basic request, and it never completes:

POST http://localhost:9090/fhir/Patient/$aggregate

{
    "resourceType": "Parameters",
    "parameter": [
        {
            "name": "aggregation",
            "valueString": "count()"
        },
        {
            "name": "grouping",
            "valueString": "reverseResolve(Condition.subject).code.coding.where(subsumedBy(http://snomed.info/sct|73211009))"
        }
    ]
}

The problem is that Pathling makes one request per Condition to check whether it belongs to <<73211009, and that is not viable. How do you deal with this?


johngrimes · 2025-01-31T04:24:58Z

Hi @liquid36,

Thanks for trying it out!

Which terminology server are you using? Pathling uses https://tx.ontoserver.csiro.au/fhir by default; are you using something different?

There is a configuration option that might help diagnose the problem: pathling.terminology.verboseLogging (https://pathling.csiro.au/docs/server/configuration#terminology-service). Logs captured with this option turned on would be helpful.

We have tried many different strategies for making terminology requests, and we found the individual request model to actually work fastest. This is because we can effectively parallelize the requests, cache the results and only make unique requests that we have not made before. Pathling has a client-side cache to facilitate this, and most terminology servers will also have a server-side cache in addition to this.
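To make the strategy above concrete, here is a minimal Python sketch of "deduplicate, cache, then parallelize" for subsumption checks. It is an illustration under stated assumptions, not Pathling's implementation: the `SubsumptionChecker` class is hypothetical, and `lookup` is a hypothetical callable standing in for whatever actually asks the terminology server (for example a FHIR `CodeSystem/$subsumes` call).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Tuple

Coding = Tuple[str, str]  # (system, code)


class SubsumptionChecker:
    """Hypothetical client-side cache in front of a terminology server.

    `lookup(coding, ancestor)` is an assumed stand-in for a remote
    subsumption request; it is not Pathling's API.
    """

    def __init__(self, lookup: Callable[[Coding, Coding], bool], max_workers: int = 8):
        self._lookup = lookup
        self._max_workers = max_workers
        self._cache: Dict[Tuple[Coding, Coding], bool] = {}

    def check_all(self, codings: Iterable[Coding], ancestor: Coding) -> Dict[Coding, bool]:
        # Deduplicate: many Conditions usually share the same coding.
        unique = set(codings)
        # Only send requests for pairs that are not already cached.
        missing = [c for c in unique if (c, ancestor) not in self._cache]
        # Issue the remaining requests in parallel; map() preserves input order.
        with ThreadPoolExecutor(max_workers=self._max_workers) as pool:
            results = pool.map(lambda c: self._lookup(c, ancestor), missing)
            for coding, result in zip(missing, results):
                self._cache[(coding, ancestor)] = result
        return {c: self._cache[(c, ancestor)] for c in unique}
```

Because duplicates are collapsed and results are cached, the number of server round trips in this sketch grows with the number of distinct, previously unseen codings rather than with the number of Condition resources.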

We have demonstrated that this works effectively on large datasets with tens of thousands of unique SNOMED CT codings. Ontoserver is the terminology server that we prefer to use, and it can service a subsumes request in less than 5 ms.
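As a rough back-of-envelope check of why per-coding requests can still be fast, let $U$ be the number of distinct, uncached codings, $t$ the server time per request, and $P$ the number of requests in flight at once. Ignoring network overhead and cache hits:

$$
T \approx \frac{U \cdot t}{P}
$$

With purely hypothetical values $U = 2000$ and $P = 8$, and $t = 5\,\text{ms}$ as quoted above, this gives $T \approx 2000 \times 5 / 8 = 1250\,\text{ms}$, or about 1.25 s. A slower server (larger $t$) scales this estimate linearly.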

liquid36 · 2025-01-31T14:48:06Z

I'm using Snowstorm, but I just tried the default Ontoserver instance and it performed better.

How could I parallelize the requests? By deploying a Spark cluster?

For the amount of data I mentioned before, I'm getting a response in 3 to 4 seconds with the cache enabled. Is that reasonable?
