Loading MeSH datasets

Before any data loading

The Jena assembler config file <MTW_HOME_DIR>/instance/conf/mesh.ttl MUST be set up properly!

Jena 4

https://github.com/filak/MTW-MeSH/blob/master/flask-app/instance/conf/mesh_Jena4.ttl

Jena 5

https://github.com/filak/MTW-MeSH/blob/master/flask-app/instance/conf/mesh_Jena5.ttl

Copy the file for your Jena version to <MTW_HOME_DIR>/instance/conf/ and rename it to mesh.ttl

Adjust the paths in mesh.ttl to your <FUSEKI_DATA_DIR>

Use forward slashes

    tdb2:location  "c:/<FUSEKI_DATA_DIR>/databases/mesh" ;

    text:directory "c:/<FUSEKI_DATA_DIR>/indexes/mesh" ;
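
If your data directory is written as a native Windows path, the conversion to forward slashes can be scripted. A minimal Python sketch, assuming a hypothetical data directory c:\fuseki-data; the helper name to_forward_slashes is illustrative and not part of MTW:

```python
from pathlib import PureWindowsPath


def to_forward_slashes(path: str) -> str:
    """Return a Windows path rewritten with forward slashes."""
    return PureWindowsPath(path).as_posix()


# Hypothetical <FUSEKI_DATA_DIR>; replace it with your own directory.
data_dir = to_forward_slashes(r"c:\fuseki-data")

# Print the two lines in the form expected in mesh.ttl.
print(f'tdb2:location  "{data_dir}/databases/mesh" ;')
print(f'text:directory "{data_dir}/indexes/mesh" ;')
```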

Validate mesh.ttl

No output = file is OK

  riot --validate mesh.ttl

Copy the mesh.ttl file to:

<FUSEKI_DATA_DIR>/configuration/

Get the official MeSH RDF dataset

Download the official MeSH RDF dataset mesh.nt.gz from https://nlmpubs.nlm.nih.gov/projects/mesh/rdf/

You can use the curl tool to download it:

curl https://nlmpubs.nlm.nih.gov/projects/mesh/rdf/mesh.nt.gz --ssl-no-revoke -O

! IMPORTANT NOTICE !

As of this writing - Jan 2025 - the above is no longer true.

The mesh.nt.gz currently available is still the MeSH 2024 version - hash c9ef004de88b9201b84f90aad2966bfd067af799
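
Whether a downloaded mesh.nt.gz is still that 2024 release can be checked by comparing digests. A minimal Python sketch, assuming the hash quoted above is a SHA-1 digest of the file (its 40 hex digits are consistent with SHA-1, but the algorithm is not stated here); the function names are illustrative:

```python
import hashlib

# Hash quoted above for the MeSH 2024 file.
# Treating it as SHA-1 is an assumption based on its length.
MESH_2024_HASH = "c9ef004de88b9201b84f90aad2966bfd067af799"


def file_sha1(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-1 hex digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_mesh_2024(path: str) -> bool:
    """True when the file's digest equals the hash quoted above."""
    return file_sha1(path) == MESH_2024_HASH
```

Calling is_mesh_2024("mesh.nt.gz") in the download directory would then report whether the file matches the quoted hash, provided the SHA-1 assumption holds.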

Despite several requests (https://github.com/HHS/meshrdf/issues/212#issuecomment-2539919254) for information on when, or whether, the full RDF dataset for MeSH 2025 will be made available, NLM has not responded. The release notes are also outdated.

The only official MeSH 2025 RDF datasets available are here https://nlmpubs.nlm.nih.gov/projects/mesh/rdf/2025/ - BUT:

they are not the complete datasets: obsolete/inactive items are missing and no meshv:active triples are present
they are the year "name-spaced" version, using the prefix http://id.nlm.nih.gov/mesh/2025/
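
You can confirm the first point on any downloaded file by counting its meshv:active triples. A minimal Python sketch, assuming the meshv: prefix expands to http://id.nlm.nih.gov/mesh/vocab# and that the file is gzipped N-Triples; the function name is illustrative:

```python
import gzip

# Assumed full IRI behind the meshv:active prefixed name.
ACTIVE_PREDICATE = "<http://id.nlm.nih.gov/mesh/vocab#active>"


def count_active_triples(path: str) -> int:
    """Count N-Triples lines whose predicate is meshv:active."""
    count = 0
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            # Split into subject, predicate and the rest of the line.
            parts = line.split(None, 2)
            if len(parts) == 3 and parts[1] == ACTIVE_PREDICATE:
                count += 1
    return count
```

A result of 0 for a year-specific file would match the limitation described above.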

The information about MeSH item status is vital, both for the translation process and for functional MTW outputs/exports. Existing data workflows, e.g. for updating obsolete MeSH items, rely on the active/inactive status being available.

So what can be done in this situation? Let's try to create the most complete MeSH 2025 RDF version.

You can follow this guide or skip it and just download the final files - mesh.nt.gz and mesh2024_inactive.nt

Step 1: Get the MeSH 2025 RDF without the year name-spaced prefix - mesh.nt.gz

Download all the official MeSH 2025 XML files and produce the RDF dataset mesh.nt.gz with the https://github.com/HHS/meshrdf scripts, with no year in the namespace (!)

OR

Download the https://nlmpubs.nlm.nih.gov/projects/mesh/rdf/2025/mesh2025.nt.gz and update the namespace using MTW script tools/update-ns.py

  py update-ns.py mesh2025.nt.gz http://id.nlm.nih.gov/mesh/2025/ http://id.nlm.nih.gov/mesh/ mesh.nt.gz
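
To illustrate what such a namespace update amounts to, here is a minimal Python sketch. It is not the actual update-ns.py implementation; it assumes gzipped N-Triples input and simply rewrites every IRI that starts with the old prefix. The function name replace_namespace is illustrative:

```python
import gzip


def replace_namespace(src: str, old_ns: str, new_ns: str, dst: str) -> int:
    """Copy a gzipped N-Triples file, rewriting IRIs that start with old_ns.

    Returns the number of lines that were changed.
    """
    # IRIs in N-Triples are wrapped in angle brackets, so match "<prefix".
    old, new = "<" + old_ns, "<" + new_ns
    changed = 0
    with gzip.open(src, "rt", encoding="utf-8", newline="") as fin, \
         gzip.open(dst, "wt", encoding="utf-8", newline="") as fout:
        for line in fin:
            updated = line.replace(old, new)
            changed += int(updated != line)
            fout.write(updated)
    return changed
```

Run the result through riot --validate before using it, as with any other input file.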

Step 2: Create the inactive items dataset - mesh2024_inactive.nt

Fortunately there were no deleted main headings according to the UMLS MeSH 2025 reports, so we can use last year's complete dataset.

Download the complete MeSH 2024 dataset mesh.nt.gz, save it as mesh2024_full.nt.gz and extract the inactive items using the Jena arq tool with the query file mesh-inactive.sparql:

  arq --data=mesh2024_full.nt.gz --query=mesh-inactive.sparql > mesh2024_inactive.ttl

  riot --output=N-TRIPLES mesh2024_inactive.ttl > mesh2024_inactive.nt

Step 3: Copy the two created files to your <IMPORT> directory

mesh.nt.gz
mesh2024_inactive.nt

Get the translation RDF dataset

If you have not translated MeSH before - you can proceed to Import.

Convert the official UMLS TSV file

Use the trans_only_YYYY_extended.txt and convert it with the mesh-trx2nt tool.

The file MUST have the following columns/items:

DescriptorUI | ConceptUI | Language | TermType | String | TermUI | ScopeNote | Tree | Created | Relation | ParentCUI

the header row is optional
the TermUI column is always empty
the Relation and ParentCUI values are needed only on rows with Custom Concepts (ConceptUI starts with F...) and TermType PEP
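
The column rules above can be checked before running the conversion. A minimal Python sketch, assuming a tab-delimited file (pass another delimiter if yours differs) and reading the last rule as "Relation and ParentCUI must be filled in when ConceptUI starts with F and TermType is PEP"; check_rows is an illustrative helper, not part of MTW:

```python
import csv

COLUMNS = [
    "DescriptorUI", "ConceptUI", "Language", "TermType", "String", "TermUI",
    "ScopeNote", "Tree", "Created", "Relation", "ParentCUI",
]


def check_rows(lines, delimiter="\t"):
    """Return a list of (row_number, message) problems found in the rows."""
    problems = []
    reader = csv.reader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for number, row in enumerate(reader, start=1):
        if not row or (number == 1 and row == COLUMNS):
            continue  # skip blank lines and the optional header row
        if len(row) != len(COLUMNS):
            problems.append(
                (number, f"expected {len(COLUMNS)} columns, got {len(row)}"))
            continue
        record = dict(zip(COLUMNS, row))
        if record["TermUI"]:
            problems.append((number, "TermUI should be empty"))
        # Assumed reading of the Custom Concept rule described above.
        custom_pep = (record["ConceptUI"].startswith("F")
                      and record["TermType"] == "PEP")
        if custom_pep and not (record["Relation"] and record["ParentCUI"]):
            problems.append((number, "Relation and ParentCUI are required"))
    return problems
```

For a real file, pass an open text file handle as lines; an empty result means no problems were found under these assumptions.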

Display help - open CMD and run:

 mesh-trx2nt -h

usage: mesh-trx2nt inputFile langcode meshxPrefix [options]

Extracting translation dataset from NLM UMLS text file [trans_only_2023_expanded.txt]

positional arguments:
  inputFile    NLM UMLS text file name (plain or gzipped)
  langcode     Language code
  meshxPrefix  MeSH Translation namespace prefix ie. http://my.mesh.com/id/

options:
  -h, --help   show this help message and exit
  --out OUT    Output file name prefix

IMPORTANT

The langcode parameter MUST be the same as the TARGET_LANG value in your mtw.ini config file !

The meshxPrefix parameter MUST be the same as the TARGET_NS value in your mtw.ini config file !
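
A quick consistency check can catch a mismatch before the conversion is run. A minimal Python sketch, assuming mtw.ini stores these settings as plain KEY = value lines (the exact file layout is an assumption, so the reader below ignores section headers and optional quotes); both helper names are illustrative:

```python
import re


def read_setting(ini_path: str, key: str):
    """Return the first value assigned to key in the file, or None."""
    pattern = re.compile(rf"^\s*{re.escape(key)}\s*[=:]\s*(.*?)\s*$")
    with open(ini_path, encoding="utf-8") as handle:
        for line in handle:
            match = pattern.match(line)
            if match:
                # Drop surrounding quotes, if the value is quoted.
                return match.group(1).strip("'\"")
    return None


def check_arguments(ini_path: str, langcode: str, meshx_prefix: str) -> list:
    """Return the names of settings that differ from the given arguments."""
    expected = {"TARGET_LANG": langcode, "TARGET_NS": meshx_prefix}
    return [key for key, value in expected.items()
            if read_setting(ini_path, key) != value]
```

An empty list from check_arguments("mtw.ini", "fr", "http://id.mesh.fr/") would mean both arguments agree with the config file.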

Run the conversion - open CMD and run e.g.:

 mesh-trx2nt trans_only_2023_extended.txt fr http://id.mesh.fr/

Convert the official MTMS XML file - OBSOLETE

Download your *.xml translation file at

https://nlmpubs.nlm.nih.gov/projects/mesh/MESH_FILES/.mtms/

Extract the translation data from the MeSH XML as an N-Triples dataset using the mesh-xml2trx tool

Run the extraction script:
```
  mesh-xml2trx *.xml <TARGET_NS>
```
IMPORTANT: TARGET_NS, the target namespace parameter, is the custom URI prefix for your translation. It MUST be the same as the TARGET_NS used in your mtw.ini config file!

https://github.com/filak/MTW-MeSH/blob/master/flask-app/instance/conf/mtw.ini

e.g.
```
  mesh-xml2trx czedesc2018.xml.gz http://mesh.medvik.cz/link/
```

Import the RDF datasets

ALWAYS validate ALL the input files

Run the validation:

No output = dataset is OK
```
 riot --validate *.gz
```
Move the input files into a versioned <IMPORT> directory, e.g. .../MeSH-data/2023/import/

Load the MeSH dataset(s) into Apache Jena

Stop Fuseki server instance (if running)

Go to your <IMPORT> directory

Run the import:
```
 tdb2_tdbloader --loc %FUSEKI_BASE%/databases/mesh mesh.nt.gz mesh-trx_ ...
```
or if you do not have a translation then just:
```
 tdb2_tdbloader --loc %FUSEKI_BASE%/databases/mesh mesh.nt.gz
```

Create Fuseki search index

Go to your <FUSEKI_DATA_DIR>

 cd %FUSEKI_BASE%

Run the indexation - Jena v4:

 java -cp %FUSEKI_HOME%/fuseki-server.jar jena.textindexer --desc=configuration/mesh.ttl

Run the indexation - Jena v5+:

 java --add-modules jdk.incubator.vector -cp %FUSEKI_HOME%/fuseki-server.jar jena.textindexer --desc=configuration/mesh.ttl

Start Fuseki server instance

Loading data from a backup

Stop MTW services
Stop your Fuseki instance

Go to your <FUSEKI_DATA_DIR> and make sure the <mesh> directories under the databases and indexes dirs are empty!
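
The emptiness check can be automated before the loader is started. A minimal Python sketch, assuming the databases/mesh and indexes/mesh layout used in the commands on this page; non_empty_dirs is an illustrative helper, not part of MTW or Jena:

```python
from pathlib import Path


def non_empty_dirs(fuseki_base: str, dataset: str = "mesh") -> list:
    """Return the dataset directories that still contain files or folders."""
    base = Path(fuseki_base)
    targets = [base / "databases" / dataset, base / "indexes" / dataset]
    # A missing directory counts as empty; only existing ones are inspected.
    return [str(d) for d in targets if d.is_dir() and any(d.iterdir())]
```

Call it with the value of your FUSEKI_BASE environment variable; an empty list means it is safe to proceed with the import.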

Run the import:

 tdb2_tdbloader --loc %FUSEKI_BASE%/databases/mesh %FUSEKI_BASE%/backups/mesh_YYYY-MM-DD_....nq.gz

Create the search index - Jena v4 - run:

 java -cp %FUSEKI_HOME%/fuseki-server.jar jena.textindexer --desc=configuration/mesh.ttl

Create the search index - Jena v5+ - run:

 java --add-modules jdk.incubator.vector -cp %FUSEKI_HOME%/fuseki-server.jar jena.textindexer --desc=configuration/mesh.ttl

Start your Fuseki instance
Start MTW services

Continue to MeSH Annual Updates
