Better GeoSPARQL conformity #84

Open
patrickbr opened this issue Feb 14, 2024 · 7 comments

Comments

@patrickbr
Member

patrickbr commented Feb 14, 2024

We should fully conform to the GeoSPARQL standard for the types geo:SpatialObject and geo:Feature.

In particular, a geo:Feature must have the following properties: geo:hasGeometry, geo:hasDefaultGeometry, geo:hasCentroid and geo:hasBoundingBox.

That is, osm:Nodes, osm:Ways, osm:Relations and osm:Areas should be of type geo:Feature and offer these properties.

As far as I understand it, all of the properties geo:hasGeometry, geo:hasDefaultGeometry, geo:hasCentroid and geo:hasBoundingBox must then point to an object of type geo:SpatialObject. These must implement geo:hasSize, geo:hasMetricSize, geo:hasLength, geo:hasMetricLength, geo:hasPerimeterLength, geo:hasMetricPerimeterLength, geo:hasArea, geo:hasMetricArea, geo:hasVolume and geo:hasMetricVolume.

So far, I don't see any problem with implementing this.

However, AFAIK (@lehmann-4178656ch, @Danysan1, please correct me), sfIntersects and sfContains should be properties between geo:SpatialObjects. This would mean that we cannot write queries like

SELECT ?osm_id ?hasgeometry WHERE {
  osmrel:1960198 ogc:sfContains ?osm_id .
  ?osm_id geo:hasGeometry/geo:asWKT ?hasgeometry 
}

anymore. They would then look like this:

SELECT ?osm_id ?hasgeometry WHERE {
  osmrel:1960198 geo:hasGeometry ?geoma .
  ?osm_id geo:hasGeometry ?geomb .
  ?geoma ogc:sfContains ?geomb .
  ?geomb geo:asWKT ?hasgeometry
}

@hannahbast, @joka921, is that a problem?

See also ad-freiburg/qlever#678 (comment)

@lehmann-4178656ch
Member

Can/shall we replace the current geo:hasGeometry with geo:hasDefaultGeometry, as we only provide a single geometry? If both are needed, we would have to provide the same information with two predicates:

osmObject geo:hasGeometry ourGeoObject .
osmObject geo:hasDefaultGeometry ourGeoObject .

Regarding the properties of geo:SpatialObject: the spec allows these to be implemented, but we are not required to add them all, and some could never hold any meaningful value, e.g. the area of a way without width, or the volume of a point.

@patrickbr
Member Author

Regarding the properties of geo:SpatialObject: the spec allows these to be implemented, but we are not required to add them all, and some could never hold any meaningful value, e.g. the area of a way without width, or the volume of a point.

I am not so sure - according to RFC 2119, SHALL means an absolute requirement. So afaik we must provide both.

@patrickbr
Member Author

@lehmann-4178656ch and I discussed this further.

I think a sane approach would be to omit the properties we cannot fill with any meaningful value to keep the dataset size manageable. For example, it seems extremely redundant to add geo:hasArea properties with a value of 0 to each node and way in the dataset.

In this spirit, I would also not add the geo:hasDefaultGeometry triple. It's just overly redundant.
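A minimal sketch of this "only emit what is meaningful" idea, in Python. The mapping from geometry kind to applicable measure predicates is an assumption made for illustration, not something fixed in this thread or taken from osm2rdf, and `measure_triples` is a hypothetical helper.

```python
# Hypothetical mapping from geometry kind to the measure predicates that can
# hold a meaningful value. The exact choice is an assumption.
APPLICABLE_MEASURES = {
    "Point": [],
    "LineString": ["geo:hasMetricLength"],
    "Polygon": ["geo:hasMetricPerimeterLength", "geo:hasMetricArea"],
}


def measure_triples(subject, geometry_kind, values):
    """Return (s, p, o) triples only for measures that apply to this kind.

    `values` maps predicate names to precomputed numbers. Predicates that do
    not apply to the geometry kind are dropped instead of being written as 0.
    """
    triples = []
    for predicate in APPLICABLE_MEASURES.get(geometry_kind, []):
        if predicate in values:
            triples.append((subject, predicate, values[predicate]))
    return triples
```

With such a filter, a node would produce no measure triples at all, and a way would produce only a length.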

@Danysan1

Danysan1 commented Feb 14, 2024

Looking at the GeoSPARQL specification...

  • in 6.4 I read that geo:hasGeometry, geo:hasDefaultGeometry, geo:hasCentroid and geo:hasBoundingBox have geo:Feature as domain and geo:Geometry as range
  • in 6.2.2 I read that geo:Feature rdfs:subClassOf geo:SpatialObject
  • in 6.8.1 I read that geo:Geometry rdfs:subClassOf geo:SpatialObject
  • in 6.3 I read that geo:hasLength, geo:hasArea and all other properties "for associating Spatial Objects with scalar spatial measurements" have geo:SpatialObject as domain, NOT geo:Geometry
  • in 7.2 I read that geo:sfContains, geo:sfIntersects and all other properties in the "Simple Features relation family" have geo:SpatialObject as domain and range, NOT geo:Geometry

This is visualized in this diagram at the beginning of section 6 and this other diagram from this paper

So:

a geo:Feature must have the following properties: geo:hasGeometry, geo:hasDefaultGeometry, geo:hasCentroid and geo:hasBoundingBox

osm:Nodes, osm:Ways, osm:Relations and osm:Areas should be of type geo:Feature and offer these properties.

all of the properties geo:hasGeometry, geo:hasDefaultGeometry, geo:hasCentroid and geo:hasBoundingBox must then point to an object of type geo:SpatialObject. These must implement geo:hasSize, geo:hasMetricSize, geo:hasLength, geo:hasMetricLength, geo:hasPerimeterLength, geo:hasMetricPerimeterLength, geo:hasArea, geo:hasMetricArea, geo:hasVolume and geo:hasMetricVolume.

sfIntersects and sfContains should be properties between geo:SpatialObjects

I agree with all of the above

we cannot write queries like

SELECT ?osm_id ?hasgeometry WHERE {
  osmrel:1960198 ogc:sfContains ?osm_id .
  ?osm_id geo:hasGeometry/geo:asWKT ?hasgeometry 
}

anymore. They would then look like this:

SELECT ?osm_id ?hasgeometry WHERE {
  osmrel:1960198 geo:hasGeometry ?geoma .
  ?osm_id geo:hasGeometry ?geomb .
  ?geoma ogc:sfContains ?geomb .
  ?geomb geo:asWKT ?hasgeometry
}

I believe this is not the case: given that

  1. geo:Feature is rdfs:subClassOf geo:SpatialObject
  2. these relations have geo:SpatialObject as domain and range

then these relations can also link geo:Features to other geo:Features, so the old syntax is still correct.

If I understand correctly this also means that if these triples hold...

:x geo:hasGeometry :xGeom.
:y geo:hasGeometry :yGeom.
:x geo:sfContains :y

...then also these hold...

:x geo:sfContains :yGeom.
:xGeom geo:sfContains :y.
:xGeom geo:sfContains :yGeom.

This would require doing some inference (combining geo:hasGeometry with the base relation), either materialized in the triples or done dynamically at query-time.
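A rough sketch of the materialized variant of that inference, in Python. It assumes every feature has exactly one geometry that represents it exactly (it would not be valid for bounding boxes or convex hulls), and `lift_to_geometries` is a hypothetical helper name.

```python
def lift_to_geometries(relation_triples, has_geometry):
    """Derive geometry-level triples from feature-level ones.

    relation_triples: iterable of (subject, predicate, object) between features.
    has_geometry: dict mapping a feature to a list of its geometries.
    Returns the set of derived triples (the input triples are not repeated).
    """
    derived = set()
    for s, p, o in relation_triples:
        s_geoms = has_geometry.get(s, [])
        o_geoms = has_geometry.get(o, [])
        # Feature -> geometry of the object
        for og in o_geoms:
            derived.add((s, p, og))
        for sg in s_geoms:
            # Geometry of the subject -> feature
            derived.add((sg, p, o))
            # Geometry of the subject -> geometry of the object
            for og in o_geoms:
                derived.add((sg, p, og))
    return derived
```

Each feature-level triple yields three extra triples in this scheme, which is the size cost that doing the rewrite at query time would avoid.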

I think a sane approach would be to omit the properties we cannot fill with any meaningful value to keep the dataset size manageable. For example, it seems extremely redundant to add geo:hasArea properties with a value of 0 to each node and way in the dataset.

In this spirit, I would also not add the geo:hasDefaultGeometry triple. It's just overly redundant.

Given what you pointed out about SHALL, and that 6.3 reads "Implementations shall allow the properties ... to be used in SPARQL graph patterns", this would probably break formal full conformity with GeoSPARQL. Still, in my opinion, it is an acceptable tradeoff.

@hannahbast
Member

Thank you all for this discussion. One way to realize redundant predicates is to just let the SPARQL engine know about 100% equivalent predicates, have the triples in the index for exactly one and then map each equivalent predicate to this one at query time.

The situation is not new, just the scale. For example, each of the 90 M Wikidata items has exactly one rdfs:label triple and a 100% equivalent (and therefore redundant) schema:name triple. We didn't care about this so far, since it's just 90 M additional triples compared to 19 B triples overall. But if these redundant triples blow up the total size of the dataset considerably, we should care.
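To make the idea concrete, here is a toy sketch in Python. It is not how QLever works internally; `AliasIndex` and the alias table are hypothetical and only illustrate "store once, rewrite at query time".

```python
# Assumed alias table: each redundant predicate maps to the one predicate
# that is actually stored in the index.
CANONICAL = {"schema:name": "rdfs:label"}


class AliasIndex:
    def __init__(self, canonical):
        self.canonical = canonical
        self.triples = set()

    def add(self, s, p, o):
        # Store every triple under its canonical predicate only.
        self.triples.add((s, self.canonical.get(p, p), o))

    def match(self, s=None, p=None, o=None):
        # Rewrite the queried predicate before matching; None is a wildcard.
        if p is not None:
            p = self.canonical.get(p, p)
        return sorted(
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )
```

Note that in this sketch the results carry the canonical predicate, so a query with a variable in predicate position would only ever see rdfs:label. A real engine would have to decide how to handle that case.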

Similarly, for predicate paths <x>/<y>, where you never need the intermediate node (typically, a blank node), the index builder could just discard the blank node, internally create a simple predicate <x/y>, and then map the path <x>/<y> to <x/y> at query time. If a query asks for the blank node in between, we could either create it on the fly or issue an error message.
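A small Python sketch of the index-building half of this idea. `collapse_path` is a hypothetical helper, and it assumes the second predicate only ever occurs on the intermediate nodes, so dropping those triples loses nothing.

```python
def collapse_path(triples, first, second):
    """Replace first/second chains by a single synthetic predicate.

    Triples using `first` or `second` are dropped (along with the
    intermediate node); everything else is kept unchanged.
    """
    fused = f"{first}/{second}"
    # Index the values reachable from each intermediate node via `second`.
    second_by_subject = {}
    for s, p, o in triples:
        if p == second:
            second_by_subject.setdefault(s, []).append(o)
    out = []
    for s, p, o in triples:
        if p == first:
            for value in second_by_subject.get(o, []):
                out.append((s, fused, value))
        elif p != second:
            out.append((s, p, o))
    return out, fused
```

For example, a geo:hasGeometry triple pointing at a blank node with a geo:asWKT literal would become a single triple with the fused predicate, which is what the path geo:hasGeometry/geo:asWKT would be mapped to at query time.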

@VladimirAlexiev

Hi @patrickbr @lehmann-4178656ch @Danysan1 @hannahbast @joka921,
The desire to make your OSM representation GeoSPARQL compliant is highly appreciated!

GeoSPARQL is a voluminous and complex spec.
I am copying in two leading GeoSPARQL experts, @nicholascar and @situx, so they can correct anything I get wrong below.

Alternative Geometries

Currently you have

osmnode:679109323
  geo:hasGeometry osm2rdfgeom:osm_node_679109323 ;
  osm2rdfgeom:convex_hull "..."^^geo:wktLiteral ;
  osm2rdfgeom:envelope "..."^^geo:wktLiteral ;
  osm2rdfgeom:obb "..."^^geo:wktLiteral .
osm2rdfgeom:osm_node_679109323 geo:asWKT "..."^^geo:wktLiteral .

But all of these are alternative geometries, so I suggest changing it to:

osmnode:679109323
  geo:hasGeometry
    osmnode:679109323/geom, osmnode:679109323/convexHull, osmnode:679109323/boundingBox, osmnode:679109323/orientedBoundingBox;
  geo:hasDefaultGeometry osmnode:679109323/geom;
  geo:hasBoundingBox osmnode:679109323/boundingBox;
.

osmnode:679109323/geom a geo:Geometry; osm2rdf:role "geometry"; geo:asWKT "..."^^geo:wktLiteral.
osmnode:679109323/convexHull a geo:Geometry; osm2rdf:role "convexHull"; geo:asWKT "..."^^geo:wktLiteral.
osmnode:679109323/boundingBox  a geo:Geometry; osm2rdf:role "boundingBox"; geo:asWKT "..."^^geo:wktLiteral.
osmnode:679109323/orientedBoundingBox  a geo:Geometry; osm2rdf:role "orientedBoundingBox"; geo:asWKT "..."^^geo:wktLiteral.

Notes:

  • I suggest using hierarchical URLs, where the geometry URLs of a feature use the feature URL as a prefix. (The names shown above are not valid Turtle prefixed names because of the unescaped "/", but you get the idea.)
  • You should use hasGeometry for all, hasDefaultGeometry for the main (detailed) geometry, hasBoundingBox for the envelope (I assume by "envelope" you mean the bounding box, right?)
  • You can add geo:hasCentroid if you can compute it, but it's optional.
  • For all, I've added osm2rdf:role to allow the user to distinguish between them.
  • I suggest simplifying your namespace osm2rdfmember to just osm2rdf, so the same predicate can be used here and in "members"
  • OGC discussed the introduction of roles and "qualified geometries" (see "Cater for and define basic Geometry roles", opengeospatial/ogc-geosparql#241, and "Consider describing Qualified Geometries", opengeospatial/ogc-geosparql#430), but that is not yet standardized, so you can use your own roles.
  • I think it's enough for roles to be strings, not "things"
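As a rough illustration of the hierarchical URLs and string roles suggested above, here is a Python sketch that writes the triples as N-Triples with full IRIs, which sidesteps the prefixed-name problem. The `OSM2RDF` namespace and the `geometry_triples` helper are assumptions for illustration, and the WKT strings are assumed to contain no quotes or backslashes.

```python
GEO = "http://www.opengis.net/ont/geosparql#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Assumed namespace for the illustrative role predicate.
OSM2RDF = "https://osm2rdf.cs.uni-freiburg.de/rdf#"

# URL segment -> (role string, extra predicates besides geo:hasGeometry)
SEGMENTS = {
    "geom": ("geometry", ["hasDefaultGeometry"]),
    "convexHull": ("convexHull", []),
    "boundingBox": ("boundingBox", ["hasBoundingBox"]),
    "orientedBoundingBox": ("orientedBoundingBox", []),
}


def geometry_triples(feature_iri, wkt_by_segment):
    """Return N-Triples lines linking a feature to its alternative geometries."""
    lines = []
    for segment, wkt in wkt_by_segment.items():
        role, extra = SEGMENTS[segment]
        # The geometry IRI uses the feature IRI as its prefix.
        geom_iri = f"{feature_iri}/{segment}"
        for local in ["hasGeometry"] + extra:
            lines.append(f"<{feature_iri}> <{GEO}{local}> <{geom_iri}> .")
        lines.append(f"<{geom_iri}> <{RDF_TYPE}> <{GEO}Geometry> .")
        lines.append(f'<{geom_iri}> <{OSM2RDF}role> "{role}" .')
        lines.append(f'<{geom_iri}> <{GEO}asWKT> "{wkt}"^^<{GEO}wktLiteral> .')
    return lines
```

Calling it with a feature IRI and a dict such as `{"geom": "POINT(1 2)", "boundingBox": "POLYGON(...)"}` would produce one group of lines per geometry.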

Feature class and Relations

Currently you have, e.g.:

osmnode:679109323 rdf:type osm:node

Please also add geo:Feature as type.

It's ok to keep the topological relations at the level of Features, eg:

osmrel:3766584 ogc:sfContains osmway:264339544

As you can see in C.2.3.1, "All features or geometries overlapping with another feature", the relations apply at both levels of Feature and Geometry, and by keeping them at the level of Feature, you implement only the first (most efficient) branch of the UNION.

Magic Predicates

You have materialized topological relations using an unofficial namespace like this:

@prefix ogc: <http://www.opengis.net/rdf#> .
osmrel:3766584 ogc:sfContains osmway:264339544

Please consider using geo:sfContains (the official namespace). This has pros and cons:

  • For QLever, which AFAIK doesn't support GeoSPARQL indexing, this will be ideal since it will allow a user to make standard queries, and have them execute quickly
  • But for repositories that support GeoSPARQL indexing, it would conflict with the standard "magic predicate" geo:sfContains. Eg in GraphDB, that predicate is not consulted in the database, but is passed to the geospatial index to process.

I think you should use the standard predicate geo:sfContains, but put those triples into separate dump files.
That way Semantic Web developers can choose whether to load them into their repo, or let the repo compute the topological relations automatically.
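A minimal Python sketch of such a split, assuming the dump is available as N-Triples lines with full IRIs. `split_dump` is a hypothetical helper, not part of osm2rdf, and writing the two lists to separate files is left to the caller.

```python
GEO = "http://www.opengis.net/ont/geosparql#"
# The eight Simple Features relations in the official namespace.
TOPOLOGICAL = {f"<{GEO}{name}>" for name in (
    "sfEquals", "sfDisjoint", "sfIntersects", "sfTouches",
    "sfCrosses", "sfWithin", "sfContains", "sfOverlaps",
)}


def split_dump(lines):
    """Partition N-Triples lines into (core, topological) lists."""
    core, topo = [], []
    for line in lines:
        # Subject and predicate contain no whitespace in N-Triples, so the
        # second whitespace-separated token is the predicate.
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[1] in TOPOLOGICAL:
            topo.append(line)
        else:
            core.append(line)
    return core, topo
```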

BTW, have you implemented transitivity of sfContains?

(This section applies to all topological relations that you support, not just sfContains)
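For reference, materializing transitivity would amount to something like the following Python sketch over (container, contained) pairs. `transitive_closure` is a hypothetical helper, and whether the full closure is worth storing at OSM scale is a separate question.

```python
def transitive_closure(pairs):
    """Return all (a, c) such that a contains c directly or indirectly."""
    contains = {}
    for a, b in pairs:
        contains.setdefault(a, set()).add(b)
    closure = set()
    for start in contains:
        # Depth-first walk over everything reachable from `start`.
        stack = list(contains[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(contains.get(node, ()))
        closure.update((start, node) for node in seen)
    return closure
```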

Measures

It's a good idea to provide measures if you can.

  • I think you should provide these: geo:hasMetricLength, geo:hasMetricPerimeterLength, geo:hasMetricArea
  • Non-metric measures are not very useful since they don't fix the UoM system
  • The other measures are a bit abstract (hasSize) or don't apply (hasVolume)

Measures should be attached to Features, not Geometries. E.g. the area of a boundingBox is typically bigger than the area of the detailed geometry, and only the latter is interesting.
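As an illustration of what a value for geo:hasMetricLength could look like, here is a Python sketch that sums great-circle segment lengths along a way. It treats the Earth as a sphere (an approximation), assumes coordinates are (lon, lat) pairs in degrees, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; spherical approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def metric_length(coords):
    """Sum great-circle segment lengths over a list of (lon, lat) pairs."""
    return sum(haversine_m(*p, *q) for p, q in zip(coords, coords[1:]))
```

The result would be attached to the feature itself, e.g. the way, rather than to one of its geometry nodes.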

@situx

situx commented Jun 6, 2024

No additions from my side. I think @VladimirAlexiev explained it very well.
I would also be happy to see the dataset published using the GeoSPARQL vocabulary.
If you find anything you would like to express but cannot express in GeoSPARQL, we are always happy to receive a pull request or an issue in the ogc-geosparql repository.
