Add inner and outer members of boundary and multipolygon relations #72

Open
1ec5 opened this issue Jan 23, 2024 · 8 comments

@1ec5

1ec5 commented Jan 23, 2024

This query shows that no boundary or multipolygon relation in the OSM Planet dataset has osmrel:members that are ways with the role inner or outer. The only members in the dataset are label and admin_centre nodes, subarea relations, and plenty of tagging mistakes. This makes it difficult to perform tasks such as:

  • Comparing the perimeter of a building that has a courtyard to the perimeter (P2547) property on Wikidata
  • Computing the perimeter of a boundary, for example to apply the Polsby–Popper compactness test to the boundary
  • Associating a disputed boundary claim line with a boundary relation
  • Finding murals on walls of buildings that have courtyards

Also, in this OSM discussion, I needed to access the ways that make up a boundary relation in order to determine the total set of ways that would be part of a proposed time zone relation. I had to drop down to Overpass, which has various recursing operators as well as a length() operator.

@1ec5
Author

1ec5 commented Jan 23, 2024

Computing the perimeter of a boundary, for example to apply the Polsby–Popper compactness test to the boundary

Another possible way to satisfy this use case would be to add osm2rdf:length perimeter triples to areas.
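
A rough sketch of what such a perimeter value could contain, assuming each inner or outer member way is available as a list of (lon, lat) pairs in WGS 84 and that a spherical haversine approximation is accurate enough. The function names here are illustrative and are not part of osm2rdf.

```python
import math

# Mean Earth radius in metres; a spherical approximation, not an ellipsoidal one.
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def way_length_m(coords):
    """Length of one way, given as a list of (lon, lat) pairs."""
    return sum(haversine_m(*a, *b) for a, b in zip(coords, coords[1:]))


def perimeter_m(member_ways):
    """Total length of all inner and outer member ways of one relation."""
    return sum(way_length_m(coords) for coords in member_ways)
```

Summing inner and outer ways together matches the courtyard use case above, where the inner ring counts toward the perimeter. Whether inner rings should count depends on the property being compared against.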

@hannahbast
Member

osm2rdf has an option --add-relation-border-members. It seems that the dumps available on https://osm2rdf.cs.uni-freiburg.de are currently built without that option. I think there was a time when we were concerned about very large numbers of triples. But since we now have over 40 billion triples already for OSM Planet, I don't think adding a few more is a problem.

@lehmann-4178656ch @patrickbr Do you agree?

@hannahbast
Member

hannahbast commented Jan 23, 2024

@1ec5 I have set up a SPARQL endpoint for the data from Germany (that was quick to do), where relations now have the mentioned members. Can you please check whether that has all the triples you need: https://qlever.cs.uni-freiburg.de/osm-germany . Note that for that endpoint the geometries are obtained again with geo:hasGeometry (without the geo:asWKT). That was accidental rather than intentional, but just so you know.

Here is a query which gives (and shows) all the geometries of all the members of Berlin: https://qlever.cs.uni-freiburg.de/osm-germany/TWlwsr

@1ec5
Author

1ec5 commented Jan 23, 2024

Thank you, yes, this query returns the least compact admin_level=8 boundaries in the extract according to the Polsby–Popper test.
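
For readers unfamiliar with the test: the Polsby–Popper score of a shape is 4πA/P², where A is its area and P its perimeter, so a circle scores 1 and less compact shapes score closer to 0. The query itself is not reproduced here. Below is a minimal sketch of the same ranking, assuming area and perimeter are already known in consistent units; the function names and the input tuple layout are illustrative only.

```python
import math


def polsby_popper(area, perimeter):
    """Polsby-Popper compactness: 4 * pi * A / P**2, in the range (0, 1]."""
    if perimeter <= 0:
        raise ValueError("perimeter must be positive")
    return 4 * math.pi * area / perimeter ** 2


def least_compact(boundaries):
    """Sort (name, area, perimeter) tuples from least to most compact."""
    return sorted(boundaries, key=lambda b: polsby_popper(b[1], b[2]))
```

The perimeter here is the quantity that requires the inner and outer member ways, which is why the score could not be computed from the Planet dataset as originally built.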

@hannahbast
Member

hannahbast commented Jan 23, 2024

@1ec5 Thanks for the feedback!

@lehmann-4178656ch @patrickbr The increase in the number of triples due to --add-relation-border-members is below 1%. So I would just add that option when building the datasets on https://osm2rdf.cs.uni-freiburg.de . Are there more such options that we could meaningfully add, which would make the datasets more complete?

@patrickbr
Member

patrickbr commented Jan 23, 2024

@lehmann-4178656ch is already working on a PR to make --add-relation-border-members the default. This also greatly simplifies the code. Another PR will add the object timestamps.

Regarding additional data completeness options: we are currently not outputting the "members" (node IDs) of ways. The reason is that most of these nodes are empty (without any attributes). We could do this, but it would significantly increase the dataset size (essentially, we would add 3 triples for each anchor point of a way geometry: (1) a triple connecting the way to the empty OSM node, (2) a hasGeometry triple connecting the OSM node to a geometry object, and (3) an asWKT triple connecting the geometry to its WKT representation).
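
To make the cost concrete, here is a small sketch of the three triples per anchor point described above. The way-to-node predicate `ex:hasNode`, the geometry IRI pattern, and the `osmway:`/`osmnode:` prefixes are assumptions chosen for illustration and not necessarily what osm2rdf would emit; only geo:hasGeometry and geo:asWKT are taken from this thread.

```python
def way_node_triples(way_id, nodes):
    """Build the three triples per anchor point of a way.

    nodes: list of (node_id, lon, lat) tuples in way order.
    All IRIs below are illustrative placeholders.
    """
    triples = []
    for node_id, lon, lat in nodes:
        node = f"osmnode:{node_id}"
        geom = f"ex:geom_node_{node_id}"  # hypothetical geometry object IRI
        # (1) connect the way to the (possibly untagged) OSM node
        triples.append((f"osmway:{way_id}", "ex:hasNode", node))
        # (2) connect the OSM node to a geometry object
        triples.append((node, "geo:hasGeometry", geom))
        # (3) connect the geometry object to its WKT representation
        triples.append((geom, "geo:asWKT", f'"POINT({lon} {lat})"^^geo:wktLiteral'))
    return triples
```

A way with n anchor points therefore contributes 3n triples in this scheme, before any deduplication of nodes shared between ways.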

Another thing I just thought of: we are also not outputting author information, which could be present in the input .pbf file (it is present in the input files we use for https://osm2rdf.cs.uni-freiburg.de/). It might be interesting to get all objects last authored by user X.

Also, the changeset id (basically an OSM "commit") is currently not dumped.

@1ec5
Author

1ec5 commented Jan 23, 2024

These way members and changeset metadata are often used in Overpass queries, but I refrained from asking for them upfront because I assumed they’d be of more interest internally to OSM and OHM than externally. Off the top of my head, one external use case would be finding a given building’s entrances, something geocoders might do to better serve navigation applications. Another would be finding street intersections.
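
As an illustration of the intersection use case, here is a minimal sketch assuming way members are available as a mapping from way ID to an ordered list of node IDs. The function name and the input shape are hypothetical; a real query would also need to filter on highway tags.

```python
from collections import defaultdict


def find_intersections(way_nodes):
    """Return {node_id: set of way_ids} for nodes shared by two or more ways.

    way_nodes: dict mapping a way ID to its ordered list of node IDs.
    """
    ways_by_node = defaultdict(set)
    for way_id, node_ids in way_nodes.items():
        for node_id in node_ids:
            # A set ignores repeats, e.g. the closing node of a closed way.
            ways_by_node[node_id].add(way_id)
    return {n: ways for n, ways in ways_by_node.items() if len(ways) >= 2}
```

Without way members in the dataset, the shared node is not visible, so this kind of join cannot be expressed at all.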

For reference, Sophox includes relation members but omits way members. Sophox also includes the element’s version and last changeset, timestamp, and user. Some of the example queries make use of this functionality.

@hannahbast
Member

@1ec5 I am already convinced that these should be in our dataset. Just waiting for feedback from @lehmann-4178656ch and @patrickbr . They already agreed that we should have the information about changeset, timestamp, and user in our dataset. It's just a few billion more triples :-)
