Skip to content

Commit

Permalink
Minor fixes in the model. Jupyter notebook for semantics (WIP)
Browse files Browse the repository at this point in the history
  • Loading branch information
avillar committed Mar 12, 2024
1 parent 5382c2a commit bd10202
Show file tree
Hide file tree
Showing 3 changed files with 232 additions and 3 deletions.
4 changes: 2 additions & 2 deletions assets/ad4gd-model.uml
Original file line number Diff line number Diff line change
Expand Up @@ -28,12 +28,12 @@ abstract "ad4gd:ProcedureStep" {
}

"sosa:Sensor" <|-- "ad4gd:Sensor" : "rdfs:subClassOf"
"ad4gd:Sensor" -> "ad4gd:SensorManufacturer" : "skos:broader"
"ad4gd:Sensor" -> "ad4gd:SensorManufacturer" : "dct:creator"
"ad4gd:Sensor" --> "sosa:ObservedProperty" : "sosa:observes"
"ad4gd:Sensor" --> "ad4gd:PropertySpecialization" : "sosa:observes"
"ad4gd:PropertySpecialization" --> "ad4gd:ProcedureStep" : "sosa:implements"
"ad4gd:PropertySpecialization" --> "sosa:ObservedProperty" : "skos:broader"
"ad4gd:PropertySpecialization" --> "qudt:Unit" : "qudt:unit"
"ad4gd:PropertySpecialization" --> "qudt:Unit" : "qudt:hasUnit"
"ad4gd:ProcedureStep" --> "sosa:Procedure" : "skos:broader"
"sosa:ObservedProperty" -> "sosa:ObservedProperty" : owl:sameAs
@enduml
226 changes: 226 additions & 0 deletions jupyter/Semantic examples.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,226 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "8853f768-c12c-43a8-aee5-77b1a332ef9d",
"metadata": {},
"source": [
"# Prerequisites\n",
"\n",
"We will start by importing some modules, and defining the RDF prefixes and some other common variables that will be used throughout the notebook."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "966210fd-ded1-4c4b-b887-97a4cd631719",
"metadata": {},
"outputs": [],
"source": [
"from rdflib import Graph\n",
"\n",
"# SPARQL-compatible format so that we can use it both for data and queries\n",
"prefixes = '''\n",
"# Common vocabularies\n",
"prefix qb: <http://purl.org/linked-data/cube#>\n",
"prefix qudt: <http://qudt.org/schema/qudt/>\n",
"prefix unit: <http://qudt.org/vocab/unit/>\n",
"prefix dcat: <http://www.w3.org/ns/dcat#>\n",
"prefix dct: <http://purl.org/dc/terms/>\n",
"prefix owl: <http://www.w3.org/2002/07/owl#>\n",
"\n",
"# Some observable property vocabularies\n",
"prefix cf: <http://purl.oclc.org/NET/ssnx/cf/cf-property#>\n",
"prefix eea: <https://www.eea.europa.eu/help/glossary/eea-glossary/>\n",
"prefix wmo: <https://space.oscar.wmo.int/variables/view/>\n",
"\n",
"# AD4GD prefixes\n",
"prefix ad4gd-prop: <https://w3id.org/ad4gd/properties/>\n",
"\n",
"# Shorthand for defining resources in this document\n",
"prefix : <http://example.com/c/>\n",
"'''\n",
"\n",
"# OGC Hosted SPARQL endpoint\n",
"sparql_endpoint = 'https://defs-dev.opengis.net/fuseki-hosted/query'"
]
},
{
"cell_type": "markdown",
"id": "c2a24aa6-ee41-4fa4-9568-422f1a63532f",
"metadata": {},
"source": [
"# Querying a DCAT catalog\n",
"\n",
"The following example shows how a DCAT catalog can be queried to find all datasets that observe a given property.\n",
"\n",
"Let us start by building the catalog itself in Turtle format with 3 datasets:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "c7d48cff-92b7-4e45-8c50-b5bc1312fabf",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Loaded 31 triples\n"
]
}
],
"source": [
"catalog_ttl = prefixes + '''\n",
":catalog a dcat:Catalog ;\n",
" dct:title \"My catalog\" ;\n",
" dcat:dataset :dataset1, :dataset2, :dataset3 ;\n",
".\n",
"\n",
":dataset1 a dcat:Dataset ;\n",
" dct:title \"Dataset 1 about PM10\" ;\n",
" dcat:distribution [\n",
" dcat:downloadURL <http://example.com/downloads/d11.csv> ;\n",
" dct:format <https://www.iana.org/assignments/media-types/text/csv> ;\n",
" ] ;\n",
" qb:structure [\n",
" qb:component [\n",
" qb:measure eea:pm10 ;\n",
" qudt:hasUnit unit:MicroGM-PER-M3 ;\n",
" ]\n",
" ] ;\n",
".\n",
"\n",
":dataset2 a dcat:Dataset ;\n",
" dct:title \"Dataset 2 about NO2\" ;\n",
" dcat:distribution [\n",
" dcat:downloadURL <http://example.com/export/d2.xlsx> ;\n",
" dct:format <https://www.iana.org/assignments/media-types/application/vnd.ms-excel> ;\n",
" ] ;\n",
" qb:structure [\n",
" qb:component [\n",
" qb:measure wmo:no2 ;\n",
" qudt:hasUnit unit:MicroGM-PER-M3 ;\n",
" ]\n",
" ] ;\n",
".\n",
"\n",
":dataset3 a dcat:Dataset ;\n",
" dct:title \"Dataset 3 about PM10, but different\" ;\n",
" dcat:distribution [\n",
" dcat:downloadURL <http://example.net/files/d33.csv> ;\n",
" dct:format <https://www.iana.org/assignments/media-types/text/csv> ;\n",
" ] ;\n",
" qb:structure [\n",
" qb:component [\n",
" qb:measure cf:mass_fraction_of_pm10_ambient_aerosol_in_air ;\n",
" ]\n",
" ] ;\n",
".\n",
"'''\n",
"catalog = Graph().parse(data=catalog_ttl, format='ttl')\n",
"print('Loaded', len(catalog), 'triples')"
]
},
{
"cell_type": "markdown",
"id": "7e6e9ece-26f4-4e40-8cdc-bd0a77eb5194",
"metadata": {},
"source": [
"Each dataset describes its structural (`qb:structure`) components (`qb:component`), in this case their measured properties; additionally, two of the datasets also include metadata about the units employed.\n",
"\n",
"Next, we will try to find which datasets have data about PM10. We will use `https://w3id.org/ad4gd/properties/pm10` (or `ad4gd-prop:pm10` if using the prefixes declared above), the AD4GD observable property defined to that effect:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "832ece44-8d1a-486f-a604-9285899d8454",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" - Dataset 3 about PM10, but different (http://example.com/c/dataset3), measuring http://purl.oclc.org/NET/ssnx/cf/cf-property#mass_fraction_of_pm10_ambient_aerosol_in_air\n",
" - Dataset 1 about PM10 (http://example.com/c/dataset1), measuring https://www.eea.europa.eu/help/glossary/eea-glossary/pm10 in http://qudt.org/vocab/unit/MicroGM-PER-M3\n"
]
}
],
"source": [
"query = prefixes + '''\n",
"SELECT DISTINCT ?dataset ?title ?property ?unit WHERE {\n",
" {\n",
" # This subquery is resolved first, and it retrieves the aliases for PM10, both direct and inverse\n",
" SELECT DISTINCT ?propertyAlias WHERE {\n",
" SERVICE <@SPARQL_ENDPOINT@> {\n",
" { ad4gd-prop:pm10 owl:sameAs ?propertyAlias }\n",
" UNION\n",
" { ?propertyAlias owl:sameAs ad4gd-prop:pm10 }\n",
" }\n",
" }\n",
" }\n",
" \n",
" ?dataset a dcat:Dataset; # Find a Dataset\n",
" dct:title ?title ; # with a title\n",
" qb:structure/qb:component ?structureComponent ; # and with a structure component\n",
" . \n",
" ?structureComponent qb:measure ?property . # that has a measure with a property\n",
" FILTER (?property IN (ad4gd-prop:pm10, ?propertyAlias)) # and the property is PM10 or one of its aliases\n",
" OPTIONAL { ?structureComponent qudt:hasUnit ?unit } # we also retrieve the unit, if any\n",
"}\n",
"'''.replace('@SPARQL_ENDPOINT@', sparql_endpoint)\n",
"\n",
"# Uncomment this to see the query with line numbers\n",
"# print('\\n'.join(f\"{i: 3}: {l}\" for i, l in enumerate(query.split('\\n'))))\n",
"\n",
"pm10_bindings = catalog.query(query).bindings\n",
"print(' -', '\\n - '.join(f\"{b['title']} ({b['dataset']}), measuring {b['property']}\"\n",
" f\"{' in ' + str(b['unit']) if b.get('unit') else ''}\"\n",
" for b in pm10_bindings))"
]
},
{
"cell_type": "markdown",
"id": "48d419c4-82bb-46e5-b2f9-0260fd231014",
"metadata": {},
"source": [
"This is just one of the many ways to query our catalog; for example, a subquery is used here, but two separate queries (one for the property aliases and another for the datasets) could have been run instead.\n",
"\n",
"Additionally, the bottom part of the query could be re-run against SPARQL endpoints for different catalog sources, merging the results together in one single graph."
]
},
{
"cell_type": "markdown",
"id": "914ef88f-ffcf-407f-ab18-fcfe7c525eab",
"metadata": {},
"source": [
"# Adding sensor metadata to observations\n",
"\n",
"TBD"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
5 changes: 4 additions & 1 deletion sensor.community/sensors-uplift.yml
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,9 @@ transform:
"name": .manufacturer,
"inScheme": "https://w3id.org/ad4gd/sensors"
},
"skos:broader": {
"@id": ("manuf:" + (.manufacturer | to_iri))
},
"observed_properties": (.observed_properties | map(
if type == "object" then {
"broader": (.name | if contains(":") then . else $PROPERTIES.[.] // empty end),
Expand Down Expand Up @@ -79,7 +82,7 @@ context:
'@context':
'@base': https://w3id.org/ad4gd/properties/
manufacturer:
'@id': skos:broader
'@id': dct:creator
'@type': '@id'
name: skos:prefLabel
description: dct:description
Expand Down

0 comments on commit bd10202

Please sign in to comment.