-
Notifications
You must be signed in to change notification settings - Fork 1
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Minor fixes in the model. Jupyter notebook for semantics (WIP)
- Loading branch information
Showing
3 changed files
with
232 additions
and
3 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,226 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"id": "8853f768-c12c-43a8-aee5-77b1a332ef9d", | ||
"metadata": {}, | ||
"source": [ | ||
"# Prerequisites\n", | ||
"\n", | ||
"We will start by importing some modules, and defining the RDF prefixes and some other common variables that will be used throughout the notebook." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 1, | ||
"id": "966210fd-ded1-4c4b-b887-97a4cd631719", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from rdflib import Graph\n", | ||
"\n", | ||
"# SPARQL-compatible format so that we can use it both for data and queries\n", | ||
"prefixes = '''\n", | ||
"# Common vocabularies\n", | ||
"prefix qb: <http://purl.org/linked-data/cube#>\n", | ||
"prefix qudt: <http://qudt.org/schema/qudt/>\n", | ||
"prefix unit: <http://qudt.org/vocab/unit/>\n", | ||
"prefix dcat: <http://www.w3.org/ns/dcat#>\n", | ||
"prefix dct: <http://purl.org/dc/terms/>\n", | ||
"prefix owl: <http://www.w3.org/2002/07/owl#>\n", | ||
"\n", | ||
"# Some observable property vocabularies\n", | ||
"prefix cf: <http://purl.oclc.org/NET/ssnx/cf/cf-property#>\n", | ||
"prefix eea: <https://www.eea.europa.eu/help/glossary/eea-glossary/>\n", | ||
"prefix wmo: <https://space.oscar.wmo.int/variables/view/>\n", | ||
"\n", | ||
"# AD4GD prefixes\n", | ||
"prefix ad4gd-prop: <https://w3id.org/ad4gd/properties/>\n", | ||
"\n", | ||
"# Shorthand for defining resources in this document\n", | ||
"prefix : <http://example.com/c/>\n", | ||
"'''\n", | ||
"\n", | ||
"# OGC Hosted SPARQL endpoint\n", | ||
"sparql_endpoint = 'https://defs-dev.opengis.net/fuseki-hosted/query'" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "c2a24aa6-ee41-4fa4-9568-422f1a63532f", | ||
"metadata": {}, | ||
"source": [ | ||
"# Querying a DCAT catalog\n", | ||
"\n", | ||
"The following example shows how a DCAT catalog can be queried to find all datasets that observe a given property.\n", | ||
"\n", | ||
"Let us start by building the catalog itself in Turtle format with 3 datasets:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 2, | ||
"id": "c7d48cff-92b7-4e45-8c50-b5bc1312fabf", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"Loaded 31 triples\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"catalog_ttl = prefixes + '''\n", | ||
":catalog a dcat:Catalog ;\n", | ||
" dct:title \"My catalog\" ;\n", | ||
" dcat:dataset :dataset1, :dataset2, :dataset3 ;\n", | ||
".\n", | ||
"\n", | ||
":dataset1 a dcat:Dataset ;\n", | ||
" dct:title \"Dataset 1 about PM10\" ;\n", | ||
" dcat:distribution [\n", | ||
" dcat:downloadURL <http://example.com/downloads/d11.csv> ;\n", | ||
" dct:format <https://www.iana.org/assignments/media-types/text/csv> ;\n", | ||
" ] ;\n", | ||
" qb:structure [\n", | ||
" qb:component [\n", | ||
" qb:measure eea:pm10 ;\n", | ||
" qudt:hasUnit unit:MicroGM-PER-M3 ;\n", | ||
" ]\n", | ||
" ] ;\n", | ||
".\n", | ||
"\n", | ||
":dataset2 a dcat:Dataset ;\n", | ||
" dct:title \"Dataset 2 about NO2\" ;\n", | ||
" dcat:distribution [\n", | ||
" dcat:downloadURL <http://example.com/export/d2.xlsx> ;\n", | ||
" dct:format <https://www.iana.org/assignments/media-types/application/vnd.ms-excel> ;\n", | ||
" ] ;\n", | ||
" qb:structure [\n", | ||
" qb:component [\n", | ||
" qb:measure wmo:no2 ;\n", | ||
" qudt:hasUnit unit:MicroGM-PER-M3 ;\n", | ||
" ]\n", | ||
" ] ;\n", | ||
".\n", | ||
"\n", | ||
":dataset3 a dcat:Dataset ;\n", | ||
" dct:title \"Dataset 3 about PM10, but different\" ;\n", | ||
" dcat:distribution [\n", | ||
" dcat:downloadURL <http://example.net/files/d33.csv> ;\n", | ||
" dct:format <https://www.iana.org/assignments/media-types/text/csv> ;\n", | ||
" ] ;\n", | ||
" qb:structure [\n", | ||
" qb:component [\n", | ||
" qb:measure cf:mass_fraction_of_pm10_ambient_aerosol_in_air ;\n", | ||
" ]\n", | ||
" ] ;\n", | ||
".\n", | ||
"'''\n", | ||
"catalog = Graph().parse(data=catalog_ttl, format='ttl')\n", | ||
"print('Loaded', len(catalog), 'triples')" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "7e6e9ece-26f4-4e40-8cdc-bd0a77eb5194", | ||
"metadata": {}, | ||
"source": [ | ||
"Each dataset describes its structural (`qb:structure`) components (`qb:component`), in this case their measured properties; additionally, two of the datasets also include metadata about the units employed.\n", | ||
"\n", | ||
"Next, we will try to find which datasets have data about PM10. We will use `https://w3id.org/ad4gd/properties/pm10` (or `ad4gd-prop:pm10` if using the prefixes declared above), the AD4GD observable property defined to that effect:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 3, | ||
"id": "832ece44-8d1a-486f-a604-9285899d8454", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
" - Dataset 3 about PM10, but different (http://example.com/c/dataset3), measuring http://purl.oclc.org/NET/ssnx/cf/cf-property#mass_fraction_of_pm10_ambient_aerosol_in_air\n", | ||
" - Dataset 1 about PM10 (http://example.com/c/dataset1), measuring https://www.eea.europa.eu/help/glossary/eea-glossary/pm10 in http://qudt.org/vocab/unit/MicroGM-PER-M3\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"query = prefixes + '''\n", | ||
"SELECT DISTINCT ?dataset ?title ?property ?unit WHERE {\n", | ||
" {\n", | ||
" # This subquery is resolved first, and it retrieves the aliases for PM10, both direct and inverse\n", | ||
" SELECT DISTINCT ?propertyAlias WHERE {\n", | ||
" SERVICE <@SPARQL_ENDPOINT@> {\n", | ||
" { ad4gd-prop:pm10 owl:sameAs ?propertyAlias }\n", | ||
" UNION\n", | ||
" { ?propertyAlias owl:sameAs ad4gd-prop:pm10 }\n", | ||
" }\n", | ||
" }\n", | ||
" }\n", | ||
" \n", | ||
" ?dataset a dcat:Dataset; # Find a Dataset\n", | ||
" dct:title ?title ; # with a title\n", | ||
" qb:structure/qb:component ?structureComponent ; # and with a structure component\n", | ||
" . \n", | ||
" ?structureComponent qb:measure ?property . # that has a measure with a property\n", | ||
" FILTER (?property IN (ad4gd-prop:pm10, ?propertyAlias)) # and the property is PM10 or one of its aliases\n", | ||
" OPTIONAL { ?structureComponent qudt:hasUnit ?unit } # we also retrieve the unit, if any\n", | ||
"}\n", | ||
"'''.replace('@SPARQL_ENDPOINT@', sparql_endpoint)\n", | ||
"\n", | ||
"# Uncomment this to see the query with line numbers\n", | ||
"# print('\\n'.join(f\"{i: 3}: {l}\" for i, l in enumerate(query.split('\\n'))))\n", | ||
"\n", | ||
"pm10_bindings = catalog.query(query).bindings\n", | ||
"print(' -', '\\n - '.join(f\"{b['title']} ({b['dataset']}), measuring {b['property']}\"\n", | ||
" f\"{' in ' + str(b['unit']) if b.get('unit') else ''}\"\n", | ||
" for b in pm10_bindings))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "48d419c4-82bb-46e5-b2f9-0260fd231014", | ||
"metadata": {}, | ||
"source": [ | ||
"This is just one of the many ways to query our catalog; for example, a subquery is used here, but two separate queries (one for the property aliases and another for the datasets) could have been run instead.\n", | ||
"\n", | ||
"Additionally, the bottom part of the query could be re-run against SPARQL endpoints for different catalog sources, merging the results together in one single graph." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "914ef88f-ffcf-407f-ab18-fcfe7c525eab", | ||
"metadata": {}, | ||
"source": [ | ||
"# Adding sensor metadata to observations\n", | ||
"\n", | ||
"TBD" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.11.6" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters