timsbiomed/CCDH

CCDH

Code specific to the CCDH Terminology Work

Extracting value sets from caDSR

Working with the CRDC Data Model Dictionaries.

Given:

  1. A caDSR Data Element Public Id (e.g. 5432594)
  2. The current RDF version of the caDSR (at the moment, the RDF version is not public -- I contact Gilberto Fragosio to get an image)
  3. The current RDF version of the NCI Thesaurus (OWL version is public, modified version used here is not)
  4. A reasonably current RDF version of the UMLS -- this can be found on the BioPortal SPARQL Endpoint (location TBD)

In circumstances where the permissible values are available, the following query:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX cmdr: <http://cbiit.nci.nih.gov/caDSR#>
PREFIX isomdr: <http://www.iso.org/11179/MDR#>
PREFIX ncit: <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#>
select DISTINCT ?value ?ncit_concept ?concept_name ?role ?order where {
    ?s cmdr:publicId "3111302" .
    ?s isomdr:permitted_value ?pv .
    ?pv rdfs:label ?value .
    ?pv cmdr:has_concept ?cd .
    ?cd ?role ?ncit_concept .
    ?cd cmdr:display_order ?order .
    ?ncit_concept rdfs:label ?concept_name .
}

produces

| value | ncit_concept | concept_name | role | order |
| --- | --- | --- | --- | --- |
| Bone Marrow | UNK:C12431 | Bone Marrow | cmdr:main_concept | 0 |
| Saliva | UNK:C13275 | Saliva | cmdr:main_concept | 0 |
| ... | | | | |
| Mononuclear Cells from Bone Marrow | UNK:C12431 | Bone Marrow | cmdr:main_concept | 0 |
| Mononuclear Cells from Bone Marrow | UNK:C42885 | Derivation | cmdr:minor_concept | 1 |
| Mononuclear Cells from Bone Marrow | UNK:C73123 | Mononucleated Blood Cell | cmdr:minor_concept | 2 |
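
The query above hard-codes the public ID. As a minimal sketch, a small helper could fill in the ID for any data element. The names `PERMISSIBLE_VALUES_QUERY` and `build_permissible_values_query` are hypothetical and not part of this repository; the query text is the one shown above.

```python
import re
from string import Template

# The query from this README, with the public ID left as a placeholder.
PERMISSIBLE_VALUES_QUERY = Template("""\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX cmdr: <http://cbiit.nci.nih.gov/caDSR#>
PREFIX isomdr: <http://www.iso.org/11179/MDR#>
PREFIX ncit: <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#>
SELECT DISTINCT ?value ?ncit_concept ?concept_name ?role ?order WHERE {
    ?s cmdr:publicId "$public_id" .
    ?s isomdr:permitted_value ?pv .
    ?pv rdfs:label ?value .
    ?pv cmdr:has_concept ?cd .
    ?cd ?role ?ncit_concept .
    ?cd cmdr:display_order ?order .
    ?ncit_concept rdfs:label ?concept_name .
}
""")


def build_permissible_values_query(public_id):
    """Return the SPARQL text for one caDSR data element public ID."""
    public_id = str(public_id)
    # Accept digits only, so nothing can escape the quoted literal.
    if not re.fullmatch(r"[0-9]+", public_id):
        raise ValueError(f"not a numeric public ID: {public_id!r}")
    return PERMISSIBLE_VALUES_QUERY.substitute(public_id=public_id)


if __name__ == "__main__":
    print(build_permissible_values_query(3111302))
```

The resulting string can be sent to whatever SPARQL endpoint hosts the caDSR RDF image; the endpoint itself is deliberately left out because the image is not public.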

Notes:

  1. The 'UNK' URI prefix is actually http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl# -- the "." in the path gives a lot of the RDF tools heartburn.
  2. The "Mononuclear Cells from Bone Marrow" caDSR entry uses "coordination by juxtaposition" -- this is going to present an interesting challenge when it comes to mapping.
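
Both notes can be illustrated with a short post-processing sketch over the sample rows: it swaps the 'UNK:' prefix for the NCIt namespace and groups rows by permissible value, so that values described by several juxtaposed concepts stand out. The function names are hypothetical, and the rows are copied from the sample output rather than read from a live query.

```python
from collections import defaultdict

NCIT_NS = "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#"

# Rows from the sample output: (value, ncit_concept, concept_name, role, order)
ROWS = [
    ("Bone Marrow", "UNK:C12431", "Bone Marrow", "cmdr:main_concept", 0),
    ("Saliva", "UNK:C13275", "Saliva", "cmdr:main_concept", 0),
    ("Mononuclear Cells from Bone Marrow", "UNK:C12431",
     "Bone Marrow", "cmdr:main_concept", 0),
    ("Mononuclear Cells from Bone Marrow", "UNK:C42885",
     "Derivation", "cmdr:minor_concept", 1),
    ("Mononuclear Cells from Bone Marrow", "UNK:C73123",
     "Mononucleated Blood Cell", "cmdr:minor_concept", 2),
]


def expand_unk(curie):
    """Replace the tool-generated 'UNK:' prefix with the NCIt namespace."""
    prefix, sep, local = curie.partition(":")
    return NCIT_NS + local if sep and prefix == "UNK" else curie


def group_by_value(rows):
    """Map each permissible value to its concepts, sorted by display order."""
    grouped = defaultdict(list)
    for value, concept, name, role, order in rows:
        grouped[value].append((order, expand_unk(concept), name, role))
    return {value: sorted(concepts) for value, concepts in grouped.items()}


def coordinated_values(grouped):
    """Return the values that are described by more than one concept."""
    return [value for value, concepts in grouped.items() if len(concepts) > 1]


if __name__ == "__main__":
    grouped = group_by_value(ROWS)
    for value in coordinated_values(grouped):
        print(value, "->", [name for _, _, name, _ in grouped[value]])
```

Detecting the coordinated entries does not solve the mapping problem; it only separates the values that map to a single concept from those that need further discussion.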

The above output can be used to produce a text value set. (We need to discuss how tables of permissible values can be mapped to codes.)

Assuming that the coordination issues can be addressed, the NCIt Concepts provide a jumping-off point to:

  1. The NCI Thesaurus. The query:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select distinct ?l ?o WHERE {
    <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C12431> ?p ?o .
    ?p rdfs:label ?l .
}

gives us everything that we know about the particular NCIt concept, including the value sets (NCI Subsets) it is a member of and its mappings to other code systems -- a key one of which is the UMLS CUI (e.g. C0005953).
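
As a rough sketch, the concept query can be generated per NCIt code and the CUI picked out of its (label, value) result rows. The property label `UMLS_CUI` is an assumption about how the NCIt RDF names that property and should be checked against the actual image; the function names are hypothetical.

```python
import re

NCIT_NS = "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#"

# ASSUMPTION: label of the NCIt property that carries the UMLS CUI.
CUI_PROPERTY_LABEL = "UMLS_CUI"


def build_concept_query(code):
    """Return the SPARQL text that lists every labelled property of a concept."""
    # NCIt codes look like C12431; reject anything else before interpolating.
    if not re.fullmatch(r"C[0-9]+", code):
        raise ValueError(f"not an NCIt concept code: {code!r}")
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?l ?o WHERE {\n"
        f"    <{NCIT_NS}{code}> ?p ?o .\n"
        "    ?p rdfs:label ?l .\n"
        "}\n"
    )


def umls_cuis(rows, property_label=CUI_PROPERTY_LABEL):
    """Pick the CUI values out of (label, value) result rows."""
    return sorted({value for label, value in rows if label == property_label})


if __name__ == "__main__":
    print(build_concept_query("C12431"))
    # Illustrative row only; real rows come from running the query.
    print(umls_cuis([("UMLS_CUI", "C0005953")]))
```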

The UMLS CUI, in turn, gives us an entry point into the UMLS, an RDF representation of which is available in BioPortal -- this gives us a bridge into everything that "maps to" the UMLS concept.

We can use the combination of the inputs and UMLS-based mappings to build both FHIR and TCCM Mapping resources.
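
To make the FHIR side concrete, here is a hedged sketch that assembles a minimal FHIR R4 ConceptMap from NCIt code to UMLS CUI pairs. The two system URIs, the `draft` status, and the `equivalent` equivalence are assumptions chosen for illustration, not decisions made by this project, and `concept_map` is a hypothetical helper. TCCM output is not covered here.

```python
import json

# ASSUMPTION: code system URIs used to identify NCIt and UMLS in FHIR.
NCIT_SYSTEM = "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl"
UMLS_SYSTEM = "http://www.nlm.nih.gov/research/umls"


def concept_map(pairs):
    """Build a ConceptMap dict from (ncit_code, ncit_display, umls_cui) tuples."""
    elements = [
        {
            "code": code,
            "display": display,
            # ASSUMPTION: every NCIt-to-CUI link is treated as equivalent.
            "target": [{"code": cui, "equivalence": "equivalent"}],
        }
        for code, display, cui in pairs
    ]
    return {
        "resourceType": "ConceptMap",
        "status": "draft",
        "group": [
            {"source": NCIT_SYSTEM, "target": UMLS_SYSTEM, "element": elements}
        ],
    }


if __name__ == "__main__":
    # The Bone Marrow concept and the CUI given as an example in this README.
    print(json.dumps(concept_map([("C12431", "Bone Marrow", "C0005953")]), indent=2))
```

Coordinated values such as "Mononuclear Cells from Bone Marrow" do not fit this one-to-one shape and would need a separate decision before they can be mapped.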

Possible approaches and paths

ShEx and caDSR

The Shape Expressions language could be used both to express more complex queries, such as the ones shown above, and to define what is required to enter a new data element into the caDSR. This may have limited value, however, unless a read/write caDSR image is available.

We anticipate using a combination of ShExMap and JSON-LD contexts to transform value sets and maps to both FHIR and TCCM value sets and mappings.
