Replies: 5 comments 5 replies
-
So we're not quite at that point, mostly on the live FHIR querying side of things. What could be done today:
Part of our philosophy is to try to get as much data out via the ETL and worry about defining study cohorts after the fact, so that we don't have to constantly go back to the FHIR servers and so that we have more control over what kinds of requests are being made of them. But if you get a large set of data out via ETL, then you should be able to iterate on hypothetical clinical trials. Our ML/LLM infrastructure at the moment is aimed more at feature detection in flight during extraction, so we don't have an API/tuning prompts for this kind of interaction yet, though it is something we've talked about for the future - no timeline on this yet. If you try something like this, I'm sure we'd be very interested in hearing how it goes and whether you run into roadblocks.
-
There certainly should be! We'll add that to our docs. In the interim, this issue containing Epic permissions to request lists all the resources we're supporting. We're currently set up to run in AWS and manage queries in Athena, and we're using DuckDB locally for testing; it could be used for real use cases depending on the data size. Azure is on our roadmap but isn't scheduled currently; that's probably at least a year out.
In our project nomenclature, […]
Our first pass transformation takes the root elements of a given resource and treats them as columns; if there's more underneath, it creates array/row objects that may take some unnesting. For example, on the Condition resource, the […]. Our second pass transformation, in the […]
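As a rough illustration of that first pass (a sketch only - `flatten_resource` is a hypothetical helper, not Cumulus's actual code, and the handling of nested fields is simplified):

```python
import json


def flatten_resource(resource: dict) -> dict:
    """Treat each root element of a FHIR resource as a column.

    Scalar root elements become plain columns; nested structures
    (like Condition.category) are carried along as serialized
    array/row objects that a later pass can unnest.
    """
    row = {}
    for key, value in resource.items():
        if isinstance(value, (dict, list)):
            row[key] = json.dumps(value)  # keep the nested shape for unnesting later
        else:
            row[key] = value
    return row


condition = {
    "resourceType": "Condition",
    "id": "example",
    "category": [
        {"coding": [{"system": "http://terminology.hl7.org/CodeSystem/condition-category",
                     "code": "encounter-diagnosis"}]}
    ],
    "subject": {"reference": "Patient/123"},
}

row = flatten_resource(condition)
print(sorted(row))  # ['category', 'id', 'resourceType', 'subject']
```

Each root element becomes one column; `category` and `subject` stay nested, which is why queries later need some unnesting.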
I think this is a question of choosing the problem you want to solve. Cumulus tries to create data warehouses for analyst users who may not know or care about FHIR. It does this by doing a lot of heavy churning up front, specifically leveraging the bulk FHIR protocol, and it produces outputs designed for ingestion by specific cloud tooling. Once you've got it set up, you have a repeatable process for users, and it's easy to share study definitions with other institutions running the framework, but there is an upfront engineering cost to pay to get to this point. Running FHIR queries against documents is less prescriptive. If you've got FHIR tooling in place, or a path from an LLM agent to a FHIR query, you should be able to run against different vintages of FHIR data from different sources. But it will be a bit more ad hoc, harder to share, and require smaller/more frequent trips to the source (or creating a shadow server to mirror your production infrastructure). My intuition, I think similar to yours, is that this approach will be a little harder to get an LLM agent to run queries against.
-
Embarrassed that we didn't have an obvious list of supported resources, I made one.
-
Yep, that's correct.
Going back to that example above of Condition.category: if there were two coding systems in that field, we create two corresponding rows in that table for the same resource ID - basically, we're trying to limit the need for an analyst to do cross joins where possible. So we're assuming that a user will be searching with one coding system, or providing a list to walk down, or otherwise doing some kind of filtering operation - or, if they are just trying to get a count of all conditions occurring after a certain date, they should make sure that the condition.id field is part of a DISTINCT clause. This kind of Coding/CodeableConcept behavior is pretty common across the resource tables.
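To make that concrete, here's a sketch of the one-row-per-coding pattern and why DISTINCT matters for counts (using Python's built-in sqlite3 in place of Athena/DuckDB; the table and column names are illustrative, not the real Cumulus schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE condition_codes (id TEXT, system TEXT, code TEXT, recorded_date TEXT)"
)
# One Condition resource with two coding systems -> two rows sharing the same id
con.executemany(
    "INSERT INTO condition_codes VALUES (?, ?, ?, ?)",
    [
        ("cond-1", "http://snomed.info/sct", "44054006", "2023-02-01"),
        ("cond-1", "http://hl7.org/fhir/sid/icd-10-cm", "E11.9", "2023-02-01"),
    ],
)

# A naive count after a date double-counts the resource:
naive = con.execute(
    "SELECT COUNT(*) FROM condition_codes WHERE recorded_date > '2023-01-01'"
).fetchone()[0]
print(naive)  # 2

# Counting DISTINCT ids gives one condition, as described above:
distinct = con.execute(
    "SELECT COUNT(DISTINCT id) FROM condition_codes WHERE recorded_date > '2023-01-01'"
).fetchone()[0]
print(distinct)  # 1
```

Filtering on one coding system (`WHERE system = ...`) avoids the duplication entirely, which is the usage pattern being assumed here.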
My intuition is that the helper class for an LLM agent is probably a non-starter today - we'd need a lot more very low-level documentation for that to be accurate. Prompt tuning with the table schemas and some guidance about linking the resources together might be easier to MVP, though I suspect this will be the trickiest part to wrangle. You might be able to build up a corpus with that MVP for a few dozen cases, and that might get you to something that could more easily integrate with something like our helper class - but I think that's also a question of whether this is for exploration or for productizing.
-
Mostly we haven't gotten to it. We are most interested in resources in US Core v4, and I don't think Questionnaire is among them (though it looks like it's in v6). So without a US Core profile and without needing Questionnaire ourselves, we just haven't prioritized it.
Sorry, I'm not 100% sure I understand - can you expand on this?
-
I would love to set up a conversational agent able to carry out analytics on clinical trials data (e.g., ODM). The clinical trials might be hypothetical, meaning they have not been carried out yet and only the study definition and protocol are provided. In that case, the study should be linked to one or more FHIR servers, from which real-world data (RWD) relevant to the study would be fetched and extracted.
Is the library mature enough for such a use case? I was wondering whether converting the RWD to SQL might help extract data for clinical trials, but not being familiar with the library, I am not sure whether that's even feasible in its current state.