Ontario: A Federated SPARQL Query Processor over Semantic Data Lakes
Check the demo folder for dockerized examples.
chebi-tsv-mapping.ttl
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix chebi: <http://www.ebi.ac.uk/chebi/> .
@prefix : <http://tib.de/ontario/mapping#> .
:chebi_compound
    rml:logicalSource [
        rml:source "compounds.tsv";
        rml:referenceFormulation ql:TSV;
        rml:iterator "*"
    ];
    rr:subjectMap [
        rr:template "http://www.ebi.ac.uk/chebi/{ID}";
        rr:class chebi:Compound
    ];
    rr:predicateObjectMap [
        rr:predicate chebi:accession;
        rr:objectMap [
            rml:reference "CHEBI_ACCESSION"
        ]
    ];
    rr:predicateObjectMap [
        rr:predicate rdfs:label;
        rr:objectMap [
            rml:reference "NAME"
        ]
    ] .
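For reference, a minimal compounds.tsv matching the column references used in this mapping (ID, CHEBI_ACCESSION, NAME) could look like the tab-separated sample below; the rows are purely illustrative:

ID	CHEBI_ACCESSION	NAME
15377	CHEBI:15377	water
17234	CHEBI:17234	glucose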
To generate the RDF Molecule Templates (RDF-MTs), prepare a list of data sources with their mapping files (if any) as follows:
datasources.json
[
  {
    "name": "ChEBI-TSV",
    "ID": "http://iasis.eu/datasource/chebi-tsv",
    "url": "/home/user/data/ChEBI-TSV",
    "params": {
      "spark.driver.cores": "4",
      "spark.executor.cores": "4",
      "spark.cores.max": "6",
      "spark.default.parallelism": "4",
      "spark.executor.memory": "6g",
      "spark.driver.memory": "12g",
      "spark.driver.maxResultSize": "8g",
      "spark.python.worker.memory": "10g",
      "spark.local.dir": "/tmp"
    },
    "type": "LOCAL_TSV",
    "mappings": ["/home/user/git/Ontario/mappings/ChEBI/chebi-tsv-mapping.ttl"]
  }
]
The data source type value can be one of the following (a sketch of an entry for a non-file source type follows the list):
SPARQL_Endpoint
MySQL
LOCAL_CSV
LOCAL_TSV
LOCAL_JSON
LOCAL_XML
HADOOP_CSV
HADOOP_TSV
HADOOP_JSON
HADOOP_XML
MongoDB
Neo4j
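As an illustration, a datasources.json entry for a source of type SPARQL_Endpoint might look like the sketch below; the name, ID, and endpoint URL are hypothetical, and such sources typically need neither Spark parameters nor mapping files:

[
  {
    "name": "KEGG-Endpoint",
    "ID": "http://example.org/datasource/kegg-sparql",
    "url": "http://localhost:8890/sparql",
    "params": {},
    "type": "SPARQL_Endpoint",
    "mappings": []
  }
]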
Then run the following:
python3 scripts/create_rdfmts.py -s datasources.json -o config.json
The RDF-MTs will then be generated either by contacting the data sources directly or by analyzing the RML mappings.
The generated config.json file contains the following information:
{
  "templates": [
    {
      "rootType": "http://tib.eu/ontology/chebi/Compound",
      "datasources": [
        {
          "datasource": "http://iasis.eu/datasource/chebi-tsv",
          "predicates": [
            "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
            "http://www.w3.org/2000/01/rdf-schema#label",
            "http://tib.eu/ontology/chebi/accession"
          ]
        }
      ],
      "predicates": [
        {
          "predicate": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
          "range": []
        },
        {
          "predicate": "http://tib.eu/ontology/chebi/accession",
          "range": []
        },
        {
          "predicate": "http://www.w3.org/2000/01/rdf-schema#label",
          "range": []
        }
      ]
    }
  ],
  "datasources": [
    {
      "name": "ChEBI-TSV",
      "ID": "http://iasis.eu/datasource/chebi-tsv",
      "url": "/home/user/data/ChEBI-TSV",
      "params": {
        "spark.driver.cores": "4",
        "spark.executor.cores": "4",
        "spark.cores.max": "6",
        "spark.default.parallelism": "4",
        "spark.executor.memory": "6g",
        "spark.driver.memory": "12g",
        "spark.driver.maxResultSize": "8g",
        "spark.python.worker.memory": "10g",
        "spark.local.dir": "/tmp"
      },
      "type": "LOCAL_TSV",
      "mappings": ["/home/user/git/Ontario/mappings/ChEBI/chebi-tsv-mapping.ttl"]
    }
  ]
}
Ontario is developed in Python (3.x) and depends on several Python packages to communicate with the different databases and services. To install the required packages, run:
pip3 install -r requirements.txt
Install Ontario:
python3 setup.py install
To run queries:
./runExperiment.py -q path/to/sparqlquery.txt -c path/to/config.json -p False
If you only want to see the query plans, set -p True.
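As an illustration of what path/to/sparqlquery.txt could contain, the query below targets the ChEBI-TSV example above, using the class and predicate IRIs listed in the generated config.json (adjust them to match your own mappings):

SELECT DISTINCT ?compound ?accession ?label
WHERE {
    ?compound a <http://tib.eu/ontology/chebi/Compound> ;
              <http://tib.eu/ontology/chebi/accession> ?accession ;
              <http://www.w3.org/2000/01/rdf-schema#label> ?label .
}
LIMIT 10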
To run multiple queries in a folder:
./runOntarioExp.sh /path/to/queryfolder/ path/to/config.json outputname.tsv errorlog.txt False
If you only want to see the query plans, set the last argument to True.
To build the Docker image:
docker build -t ontario:0.5 .
Alternatively, you can use the pre-built image kemele/ontario:0.5. Currently, running Ontario as a SPARQL endpoint is supported only in the Docker version, kemele/ontario:0.5.
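A minimal sketch of starting the endpoint container, assuming the image exposes the SPARQL endpoint on port 5001 (the port used by the client example below) and reads the federation configuration from a mounted file; the in-container config path is an assumption, so check the image documentation for the actual location:

docker run -d --name ontario \
    -p 5001:5001 \
    -v /path/to/config.json:/config.json \
    kemele/ontario:0.5    # /config.json inside the container is hypothetical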
Example: sending a SPARQL query to the endpoint from Python:

import urllib.parse as urlparse
import requests
import json

# Encode the SPARQL query and send it to the Ontario endpoint
params = urlparse.urlencode({'query': 'SELECT DISTINCT ?Concept WHERE{?s a ?Concept} LIMIT 5'})
resp = requests.get('http://localhost:5001/sparql', params=params)
if resp.status_code == 200:
    # The endpoint returns a JSON document with the bindings and some statistics
    result = json.loads(resp.text)
    print(result)
Output:
{'execTime': 0.15656578540802002,
'firstResult': 0.15205996036529541,
'totalRows': 5,
'vars': ['Concept'],
'result': [
{'Concept': {'type': 'uri', 'value': 'http://bio2rdf.org/ns/kegg#Drug'}},
{'Concept': {'type': 'uri', 'value': 'http://bio2rdf.org/ns/kegg#Enzyme'}},
{'Concept': {'type': 'uri', 'value': 'http://bio2rdf.org/ns/kegg#Compound'}},
{'Concept': {'type': 'uri', 'value': 'http://bio2rdf.org/ns/kegg#Reaction'}},
{'Concept': {'type': 'uri', 'value': 'http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/drug_interactions'}}
]
}
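Based on the response shape above, individual binding values can be extracted as follows (a sketch assuming result holds the parsed response from the previous snippet):

# Print only the IRI values bound to ?Concept
for row in result['result']:
    print(row['Concept']['value'])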
To cite Ontario, please use:
Kemele M. Endris, Philipp D. Rohde, Maria-Esther Vidal, and Sören Auer. "Ontario: Federated Query Processing against a Semantic Data Lake." DEXA 2019: Database and Expert Systems Applications. Lecture Notes in Computer Science. Springer, Cham (2019).
This work is licensed under GNU/GPL v2.