Code specific to the CCDH Terminology Work
Working with the CRDC Data Model Dictionaries.
Given:
- A caDSR Data Element Public Id (e.g. 5432594)
- The current RDF version of the caDSR (at the moment, the RDF version is not public -- I contact Gilberto Fragosio to get an image)
- The current RDF version of the NCI Thesaurus (the OWL version is public; the modified version used here is not)
- A reasonably current RDF version of the UMLS -- this can be found on the BioPortal SPARQL Endpoint (location TBD)
In circumstances where the permissible values are available, the following query:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX cmdr: <http://cbiit.nci.nih.gov/caDSR#>
PREFIX isomdr: <http://www.iso.org/11179/MDR#>
PREFIX ncit: <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#>
SELECT DISTINCT ?value ?ncit_concept ?concept_name ?role ?order WHERE {
  ?s cmdr:publicId "3111302" .
  ?s isomdr:permitted_value ?pv .
  ?pv rdfs:label ?value .
  ?pv cmdr:has_concept ?cd .
  ?cd ?role ?ncit_concept .
  ?cd cmdr:display_order ?order .
  ?ncit_concept rdfs:label ?concept_name .
}
produces
value | ncit_concept | concept_name | role | order |
---|---|---|---|---|
Bone Marrow | UNK:C12431 | Bone Marrow | cmdr:main_concept | 0 |
Saliva | UNK:C13275 | Saliva | cmdr:main_concept | 0 |
... | | | | |
Mononuclear Cells from Bone Marrow | UNK:C12431 | Bone Marrow | cmdr:main_concept | 0 |
Mononuclear Cells from Bone Marrow | UNK:C42885 | Derivation | cmdr:minor_concept | 1 |
Mononuclear Cells from Bone Marrow | UNK:C73123 | Mononucleated Blood Cell | cmdr:minor_concept | 2 |
Notes:
- The 'UNK' URI prefix is actually http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl# -- the "." in the path causes a lot of the RDF tools heartburn.
- The "Mononuclear Cells from Bone Marrow" caDSR entry uses "coordination by juxtaposition" -- this is going to present an interesting challenge when it comes to mapping.
The above output can be used to produce a text value set. (We need to discuss how tables of permissible values can be mapped to codes.)
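As a rough illustration of both points, the sketch below takes a few rows copied from the result table above, derives a plain text value set, and groups the NCIt codes per permissible value so that the juxtaposed entries stand out. The function names (`group_by_value`, `text_value_set`) are hypothetical, and the rows are typed in by hand rather than fetched from a SPARQL endpoint.

```python
from collections import defaultdict

# Rows copied from the result table above:
# (value, ncit_concept, concept_name, role, order)
ROWS = [
    ("Bone Marrow", "UNK:C12431", "Bone Marrow", "cmdr:main_concept", 0),
    ("Saliva", "UNK:C13275", "Saliva", "cmdr:main_concept", 0),
    ("Mononuclear Cells from Bone Marrow", "UNK:C73123",
     "Mononucleated Blood Cell", "cmdr:minor_concept", 2),
    ("Mononuclear Cells from Bone Marrow", "UNK:C12431",
     "Bone Marrow", "cmdr:main_concept", 0),
    ("Mononuclear Cells from Bone Marrow", "UNK:C42885",
     "Derivation", "cmdr:minor_concept", 1),
]


def group_by_value(rows):
    """Map each permissible value to its NCIt codes, sorted by display order."""
    grouped = defaultdict(list)
    for value, concept, _name, _role, order in rows:
        # Drop the 'UNK:' prefix and keep only the concept code.
        grouped[value].append((order, concept.split(":", 1)[1]))
    return {value: [code for _, code in sorted(pairs)]
            for value, pairs in grouped.items()}


def text_value_set(rows):
    """Return the distinct permissible values in first-seen order."""
    return list(dict.fromkeys(row[0] for row in rows))


value_codes = group_by_value(ROWS)
# Values carrying more than one concept are the juxtaposed entries.
coordinated = {v: c for v, c in value_codes.items() if len(c) > 1}

print(text_value_set(ROWS))
print(coordinated)
```

Entries that end up in `coordinated` are the ones that cannot be mapped to a single code without first deciding how the coordination should be handled.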
Assuming that the coordination issues can be addressed, the NCIt Concepts provide a jumping-off point to:
- The NCI Thesaurus. The query:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?l ?o WHERE {
  <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C12431> ?p ?o .
  ?p rdfs:label ?l .
}
gives us everything that we know about the particular NCIt concept, including the value sets (NCI Subsets) it is a member of and its mappings to other code systems -- a key one of which is the UMLS CUI (e.g. C0005953).
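To run the same lookup for every concept returned by the permissible value query, the query text can be generated from the concept code. The following is a minimal sketch, assuming the full NCIt namespace noted earlier; `ncit_iri` and `concept_query` are hypothetical helper names, and the code only builds the query string without sending it to any endpoint.

```python
import re

# Namespace that the 'UNK' prefix stands for (see the notes above).
NCIT_NS = "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#"


def ncit_iri(code):
    """Expand 'C12431' or 'UNK:C12431' into a full NCIt IRI."""
    local = code.split(":", 1)[-1]
    # Accept only codes shaped like 'C' followed by digits, so nothing
    # unexpected is spliced into the query text.
    if not re.fullmatch(r"C\d+", local):
        raise ValueError(f"not an NCIt concept code: {code!r}")
    return NCIT_NS + local


def concept_query(code):
    """Build the 'everything about this concept' query for one code."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?l ?o WHERE {\n"
        f"  <{ncit_iri(code)}> ?p ?o .\n"
        "  ?p rdfs:label ?l .\n"
        "}"
    )


print(concept_query("UNK:C12431"))
```

Writing the subject as a full IRI in angle brackets avoids relying on a prefix declaration for the NCIt namespace.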
The UMLS CUI, in turn, gives us an entry point into the UMLS, an RDF representation of which is available in BioPortal -- this gives us a bridge into everything that "maps to" the UMLS concept.
We can use the combination of the inputs and UMLS-based mappings to build both FHIR and TCCM Mapping resources.
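On the FHIR side, one possible shape for such a mapping is a ConceptMap resource. The sketch below builds a FHIR R4 style ConceptMap as a plain dictionary for the single NCIt to UMLS pair mentioned above (C12431 to C0005953). The code system URIs, the `draft` status, the `equivalent` relationship, and the function name `concept_map` are all assumptions chosen for illustration; no TCCM equivalent is attempted here.

```python
import json

# Assumed code-system identifiers; confirm against the target FHIR server.
NCIT_SYSTEM = "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl"
UMLS_SYSTEM = "http://www.nlm.nih.gov/research/umls"


def concept_map(pairs):
    """Build a minimal ConceptMap from (ncit_code, display, cui) tuples."""
    elements = [
        {
            "code": ncit_code,
            "display": display,
            # 'equivalent' is an assumption; the real relationship
            # should come from the source mapping.
            "target": [{"code": cui, "equivalence": "equivalent"}],
        }
        for ncit_code, display, cui in pairs
    ]
    return {
        "resourceType": "ConceptMap",
        "status": "draft",
        "group": [
            {"source": NCIT_SYSTEM, "target": UMLS_SYSTEM, "element": elements}
        ],
    }


cm = concept_map([("C12431", "Bone Marrow", "C0005953")])
print(json.dumps(cm, indent=2))
```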
The Shape Expressions language could be used both to express more complex queries such as those shown above and to define what is required to enter a new data element into the caDSR. This may have limited value, however, unless there is a read/write caDSR image.
We anticipate using a combination of ShExMap and JSON-LD contexts to transform value sets and maps to both FHIR and TCCM value sets and mappings.