Minor fixes in the model. Jupyter notebook for semantics (WIP)

AD4GD · Mar 12, 2024 · bd10202 · bd10202
1 parent 5382c2a
commit bd10202
Show file tree

Hide file tree

Showing 3 changed files with 232 additions and 3 deletions.
diff --git a/assets/ad4gd-model.uml b/assets/ad4gd-model.uml
@@ -28,12 +28,12 @@ abstract "ad4gd:ProcedureStep" {
 }
 
 "sosa:Sensor" <|-- "ad4gd:Sensor" : "rdfs:subClassOf"
-"ad4gd:Sensor" -> "ad4gd:SensorManufacturer" : "skos:broader"
+"ad4gd:Sensor" -> "ad4gd:SensorManufacturer" : "dct:creator"
 "ad4gd:Sensor" --> "sosa:ObservedProperty" : "sosa:observes"
 "ad4gd:Sensor" --> "ad4gd:PropertySpecialization" : "sosa:observes"
 "ad4gd:PropertySpecialization" --> "ad4gd:ProcedureStep" : "sosa:implements"
 "ad4gd:PropertySpecialization" --> "sosa:ObservedProperty" : "skos:broader"
-"ad4gd:PropertySpecialization" --> "qudt:Unit" : "qudt:unit"
+"ad4gd:PropertySpecialization" --> "qudt:Unit" : "qudt:hasUnit"
 "ad4gd:ProcedureStep" --> "sosa:Procedure" : "skos:broader"
 "sosa:ObservedProperty" -> "sosa:ObservedProperty" : owl:sameAs
 @enduml
diff --git a/jupyter/Semantic examples.ipynb b/jupyter/Semantic examples.ipynb
@@ -0,0 +1,226 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "8853f768-c12c-43a8-aee5-77b1a332ef9d",
+   "metadata": {},
+   "source": [
+    "# Prerequisites\n",
+    "\n",
+    "We will start by importing some modules, and defining the RDF prefixes and some other common variables that will be used throughout the notebook."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "966210fd-ded1-4c4b-b887-97a4cd631719",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from rdflib import Graph\n",
+    "\n",
+    "# SPARQL-compatible format so that we can use it both for data and queries\n",
+    "prefixes = '''\n",
+    "# Common vocabularies\n",
+    "prefix qb:   <http://purl.org/linked-data/cube#>\n",
+    "prefix qudt: <http://qudt.org/schema/qudt/>\n",
+    "prefix unit: <http://qudt.org/vocab/unit/>\n",
+    "prefix dcat: <http://www.w3.org/ns/dcat#>\n",
+    "prefix dct:  <http://purl.org/dc/terms/>\n",
+    "prefix owl:  <http://www.w3.org/2002/07/owl#>\n",
+    "\n",
+    "# Some observable property vocabularies\n",
+    "prefix cf: <http://purl.oclc.org/NET/ssnx/cf/cf-property#>\n",
+    "prefix eea: <https://www.eea.europa.eu/help/glossary/eea-glossary/>\n",
+    "prefix wmo: <https://space.oscar.wmo.int/variables/view/>\n",
+    "\n",
+    "# AD4GD prefixes\n",
+    "prefix ad4gd-prop: <https://w3id.org/ad4gd/properties/>\n",
+    "\n",
+    "# Shorthand for defining resources in this document\n",
+    "prefix : <http://example.com/c/>\n",
+    "'''\n",
+    "\n",
+    "# OGC Hosted SPARQL endpoint\n",
+    "sparql_endpoint = 'https://defs-dev.opengis.net/fuseki-hosted/query'"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c2a24aa6-ee41-4fa4-9568-422f1a63532f",
+   "metadata": {},
+   "source": [
+    "# Querying a DCAT catalog\n",
+    "\n",
+    "The following example shows how a DCAT catalog can be queried to find all datasets that observe a given property.\n",
+    "\n",
+    "Let us start by building the catalog itself in Turtle format with 3 datasets:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "c7d48cff-92b7-4e45-8c50-b5bc1312fabf",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Loaded 31 triples\n"
+     ]
+    }
+   ],
+   "source": [
+    "catalog_ttl = prefixes + '''\n",
+    ":catalog a dcat:Catalog ;\n",
+    "  dct:title \"My catalog\" ;\n",
+    "  dcat:dataset :dataset1, :dataset2, :dataset3 ;\n",
+    ".\n",
+    "\n",
+    ":dataset1 a dcat:Dataset ;\n",
+    "  dct:title \"Dataset 1 about PM10\" ;\n",
+    "  dcat:distribution [\n",
+    "    dcat:downloadURL <http://example.com/downloads/d11.csv> ;\n",
+    "    dct:format <https://www.iana.org/assignments/media-types/text/csv> ;\n",
+    "  ] ;\n",
+    "  qb:structure [\n",
+    "    qb:component [\n",
+    "      qb:measure eea:pm10 ;\n",
+    "      qudt:hasUnit unit:MicroGM-PER-M3 ;\n",
+    "    ]\n",
+    "  ] ;\n",
+    ".\n",
+    "\n",
+    ":dataset2 a dcat:Dataset ;\n",
+    "  dct:title \"Dataset 2 about NO2\" ;\n",
+    "  dcat:distribution [\n",
+    "    dcat:downloadURL <http://example.com/export/d2.xlsx> ;\n",
+    "    dct:format <https://www.iana.org/assignments/media-types/application/vnd.ms-excel> ;\n",
+    "  ] ;\n",
+    "  qb:structure [\n",
+    "    qb:component [\n",
+    "      qb:measure wmo:no2 ;\n",
+    "      qudt:hasUnit unit:MicroGM-PER-M3 ;\n",
+    "    ]\n",
+    "  ] ;\n",
+    ".\n",
+    "\n",
+    ":dataset3 a dcat:Dataset ;\n",
+    "  dct:title \"Dataset 3 about PM10, but different\" ;\n",
+    "  dcat:distribution [\n",
+    "    dcat:downloadURL <http://example.net/files/d33.csv> ;\n",
+    "    dct:format <https://www.iana.org/assignments/media-types/text/csv> ;\n",
+    "  ] ;\n",
+    "  qb:structure [\n",
+    "    qb:component [\n",
+    "      qb:measure cf:mass_fraction_of_pm10_ambient_aerosol_in_air ;\n",
+    "    ]\n",
+    "  ] ;\n",
+    ".\n",
+    "'''\n",
+    "catalog = Graph().parse(data=catalog_ttl, format='ttl')\n",
+    "print('Loaded', len(catalog), 'triples')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7e6e9ece-26f4-4e40-8cdc-bd0a77eb5194",
+   "metadata": {},
+   "source": [
+    "Each dataset describes its structural (`qb:structure`) components (`qb:component`), in this case their measured properties; additionally, two of the datasets also include metadata about the units employed.\n",
+    "\n",
+    "Next, we will try to find which datasets have data about PM10. We will use `https://w3id.org/ad4gd/properties/pm10` (or `ad4gd-prop:pm10` if using the prefixes declared above), the AD4GD observable property defined to that effect:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "832ece44-8d1a-486f-a604-9285899d8454",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      " - Dataset 3 about PM10, but different (http://example.com/c/dataset3), measuring http://purl.oclc.org/NET/ssnx/cf/cf-property#mass_fraction_of_pm10_ambient_aerosol_in_air\n",
+      " - Dataset 1 about PM10 (http://example.com/c/dataset1), measuring https://www.eea.europa.eu/help/glossary/eea-glossary/pm10 in http://qudt.org/vocab/unit/MicroGM-PER-M3\n"
+     ]
+    }
+   ],
+   "source": [
+    "query = prefixes + '''\n",
+    "SELECT DISTINCT ?dataset ?title ?property ?unit WHERE {\n",
+    "  {\n",
+    "    # This subquery is resolved first, and it retrieves the aliases for PM10, both direct and inverse\n",
+    "    SELECT DISTINCT ?propertyAlias WHERE {\n",
+    "      SERVICE <@SPARQL_ENDPOINT@> {\n",
+    "        { ad4gd-prop:pm10 owl:sameAs ?propertyAlias }\n",
+    "        UNION\n",
+    "        { ?propertyAlias owl:sameAs ad4gd-prop:pm10 }\n",
+    "      }\n",
+    "    }\n",
+    "  }\n",
+    "  \n",
+    "  ?dataset a dcat:Dataset;                                # Find a Dataset\n",
+    "    dct:title ?title ;                                    # with a title\n",
+    "    qb:structure/qb:component ?structureComponent ;       # and with a structure component\n",
+    "  . \n",
+    "  ?structureComponent qb:measure ?property .              # that has a measure with a property\n",
+    "  FILTER (?property IN (ad4gd-prop:pm10, ?propertyAlias)) # and the property is PM10 or one of its aliases\n",
+    "  OPTIONAL { ?structureComponent qudt:hasUnit ?unit }     # we also retrieve the unit, if any\n",
+    "}\n",
+    "'''.replace('@SPARQL_ENDPOINT@', sparql_endpoint)\n",
+    "\n",
+    "# Uncomment this to see the query with line numbers\n",
+    "# print('\\n'.join(f\"{i: 3}: {l}\" for i, l in enumerate(query.split('\\n'))))\n",
+    "\n",
+    "pm10_bindings = catalog.query(query).bindings\n",
+    "print(' -', '\\n - '.join(f\"{b['title']} ({b['dataset']}), measuring {b['property']}\"\n",
+    "                         f\"{' in ' + str(b['unit']) if b.get('unit') else ''}\"\n",
+    "                         for b in pm10_bindings))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "48d419c4-82bb-46e5-b2f9-0260fd231014",
+   "metadata": {},
+   "source": [
+    "This is just one of the many ways to query our catalog; for example, a subquery is used here, but two separate queries (one for the property aliases and another for the datasets) could have been run instead.\n",
+    "\n",
+    "Additionally, the bottom part of the query could be re-run against SPARQL endpoints for different catalog sources, merging the results together in one single graph."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "914ef88f-ffcf-407f-ab18-fcfe7c525eab",
+   "metadata": {},
+   "source": [
+    "# Adding sensor metadata to observations\n",
+    "\n",
+    "TBD"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/sensor.community/sensors-uplift.yml b/sensor.community/sensors-uplift.yml
@@ -17,6 +17,9 @@ transform:
           "name": .manufacturer,
           "inScheme": "https://w3id.org/ad4gd/sensors" 
         },
+        "skos:broader": {
+          "@id": ("manuf:" + (.manufacturer | to_iri))
+        },
         "observed_properties": (.observed_properties | map(
           if type == "object" then {
             "broader": (.name | if contains(":") then . else $PROPERTIES.[.] // empty end),
@@ -79,7 +82,7 @@ context:
       '@context':
         '@base': https://w3id.org/ad4gd/properties/
     manufacturer:
-      '@id': skos:broader
+      '@id': dct:creator
       '@type': '@id'
     name: skos:prefLabel
     description: dct:description