Substitute OSM keys which are concepts with a proper URI #50

Open
l00mi opened this issue Feb 18, 2023 · 18 comments

@l00mi

l00mi commented Feb 18, 2023

It would be very useful, both for indexing and for attaching further meaning, to have a URI instead of the OSM value for keys, e.g. <https://www.openstreetmap.org/wiki/Key:place> "country" .

Either proper concepts in the osm2rdf namespace are created on the fly, or, potentially more useful, the values of this key are substituted directly with the fitting Wikidata concept. Wikidata lists the OSM keys as a property. The following query extracts the mapping: https://w.wiki/6MJT
(They are not always distinct, e.g. https://www.wikidata.org/wiki/Q1007870 and https://www.wikidata.org/wiki/Q207694)

This has the advantage that these tags automatically come with all translations and pictures, or pictograms.

The downside is that it is not clear how to keep the mapping up to date, since Wikidata and OSM are both ever-changing targets. It might be resolved dynamically at conversion time.

@hannahbast
Member

That's a great suggestion, thanks!

Can you be more specific about "This has the advantage that these tags automatically come with all translations and pictures, or pictograms."? How does one obtain the translations, pictures, or pictograms of tags?

@l00mi
Author

l00mi commented Feb 18, 2023

Simply because the Wikidata counterparts of all keys are better maintained. E.g. https://www.wikidata.org/wiki/Q207694 for (Tag:tourism=gallery).

(And the number of languages a tag's Wikidata counterpart has might be a good heuristic for choosing which one to point to.)

@lehmann-4178656ch
Member

It would be very useful, both for indexing and for attaching further meaning, to have a URI instead of the OSM value for keys, e.g. <https://www.openstreetmap.org/wiki/Key:place> "country" .

Just to make sure I understand this part correctly:

We store each property as a triple of the form object osmkey:KeyName value . where osmkey is declared as a prefix (@prefix osmkey: <https://www.openstreetmap.org/wiki/Key:> .) for the namespace https://www.openstreetmap.org/wiki/Key:. Applying the rules from RDF 1.1 Turtle - 2.4 IRIs:

To write http://www.perceive.net/schemas/relationship/enemyOf using a prefixed name:

  1. Define a prefix label for the vocabulary IRI http://www.perceive.net/schemas/relationship/ as somePrefix
  2. Then write somePrefix:enemyOf which is equivalent to writing <http://www.perceive.net/schemas/relationship/enemyOf>

This makes osmkey:KeyName equivalent to <https://www.openstreetmap.org/wiki/Key:KeyName> with respect to the RDF 1.1 spec.
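
As a minimal sketch of that expansion rule (the helper `expand` and the `PREFIXES` table are hypothetical names used for illustration, not part of osm2rdf), a prefixed name is just the namespace IRI concatenated with the local part:

```python
# Hypothetical prefix table; only the osmkey prefix from this thread is declared.
PREFIXES = {
    "osmkey": "https://www.openstreetmap.org/wiki/Key:",
}


def expand(prefixed_name: str, prefixes: dict = PREFIXES) -> str:
    """Expand a Turtle prefixed name such as osmkey:place into a full IRI."""
    # Split at the first colon only, so local parts like "addr:street" survive.
    prefix, _, local = prefixed_name.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    return f"<{prefixes[prefix]}{local}>"


print(expand("osmkey:place"))
```

So the predicate is already a URI today; only the object ("country") is a plain literal.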

Either proper concepts in the osm2rdf namespace are created on the fly[...].

What change exactly do you propose? Maybe an example could help me understand this better.

I'm not against optionally adding Wikidata counterparts to the data. I'm currently against replacing the OSM representation, as the initial goal of osm2rdf is to provide as much of the OSM data as possible without the need for additional knowledge bases. Substituting the values would make the dataset unusable without the corresponding Wikidata information.

@l00mi
Author

l00mi commented Feb 20, 2023

Yes, sorry, an example speaks a thousand words. Instead of (or in addition to) what is emitted today:
<https://www.openstreetmap.org/node/1504546320> <https://www.openstreetmap.org/wiki/Key:place> "country".
the converter would use concepts, e.g. from Wikidata:

<https://www.openstreetmap.org/node/1504546320> <https://www.openstreetmap.org/wiki/Key:place> <https://www.wikidata.org/wiki/Q6256>

This goes in the same direction as what I mentioned in #49: representing the OSM data model in LD vs. adapting the model to the new medium, with the goal of being highly queryable in the LD world. (It is definitely also possible to provide both, but you can surely guess which camp I am in overall.)
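
The substitution in the example above can be sketched as follows. This is only an illustration under stated assumptions: the `CONCEPTS` table and the function `to_ntriples` are hypothetical, the single mapping entry is the one from the example, and the `substitute` flag stands in for the "instead or additionally" choice.

```python
# Hypothetical (predicate IRI, literal value) -> concept IRI table.
# The only entry is the place=country example from this thread.
CONCEPTS = {
    ("https://www.openstreetmap.org/wiki/Key:place", "country"):
        "https://www.wikidata.org/wiki/Q6256",
}


def to_ntriples(subject: str, predicate: str, value: str, substitute: bool = True) -> str:
    """Serialize one OSM tag as an N-Triples line, using a concept IRI if one is mapped."""
    concept = CONCEPTS.get((predicate, value)) if substitute else None
    if concept is not None:
        obj = f"<{concept}>"
    else:
        # Minimal literal escaping; a real serializer has to handle more cases.
        escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                        .replace("\n", "\\n").replace("\r", "\\r"))
        obj = f'"{escaped}"'
    return f"<{subject}> <{predicate}> {obj} ."


node = "https://www.openstreetmap.org/node/1504546320"
key = "https://www.openstreetmap.org/wiki/Key:place"
print(to_ntriples(node, key, "country", substitute=False))  # literal, as today
print(to_ntriples(node, key, "country"))                    # concept IRI
```

Values without a mapping fall back to the literal, so nothing from the OSM data is lost.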

@LorenzBuehmann
Contributor

@lehmann-4178656ch If I understand correctly, the idea is to make use of semantic information available in Wikidata, as there are some concepts mapped to OSM keys or OSM tags. You could then make use of those mappings and "merge" them into the OSM2RDF converted dataset. For example, for each OSM entity with tag power=plant you could add an rdf:type triple using the mapped Wikidata concept http://www.wikidata.org/entity/Q159719 - clearly, the crucial point here is to decide on the semantic property you would use to "link" the OSM entity to the Wikidata concept. rdf:type might not be the appropriate one for all OSM - Wikidata mappings.

In a research project, I quite recently did that for parts of the domain of interest. For example, I extracted a subset of OSM planet for "government" buildings, then used OSM2RDF to convert it to RDF - in a next step, I wanted to align it with Wikidata concepts/entities directly - this can either be done by
a) getting OSM tags from Wikidata

SELECT ?item ?itemLabel ?tag
WHERE 
{
  ?item wdt:P1282 ?tag. 
  filter(contains(?tag, "government"))
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } 
}

b) from the OSM Wiki which nowadays has so called "data items" (basically the same Mediawiki backend as Wikidata) via some API:

  • Key:government would be the data item and it has a property P12 aka "Wikidata concept" which somehow assigns it to a Wikidata concept. The semantics is still a bit vague from my point of view, and I would say "it depends" ...

Long story short, simply adding a triple osmnode:123 rdf:type wd:123 . doesn't really make sense for my data, given that I basically extracted buildings and saying "building X is a social services" is clearly not correct, instead I have to say "building X is a building in which a social services government organization resides" (if that makes sense at all). Indeed for some Wikidata concepts we can make use of the hierarchy and maybe check for superclasses if those map to buildings or at least places/locations. But for others we need some "is related to Wikidata" property.

I don't think the whole thing would be useful to put into OSM2RDF; it's more something on top of it. Besides, the simple baseline, which is to just fetch the ~4000 OSM tag - WD concept mappings and add an RDF triple to each converted OSM entity with one of those tags, is nothing more than a SPARQL Update statement to run in a post-processing step.
But there is way more to do here if we want to make it nice and add more structured semantics.

Sorry for the long post.

@l00mi
Author

l00mi commented Feb 20, 2023

I'm not sure I would really try to pin down the more intricate semantics of the predicates that could be used. Simply moving from a literal (e.g. 'country') to a URI, regardless of whether it is a Wikidata URI or some internal one, makes it possible to attach meaning to it.

As with the example above: if there is a Wikidata entry for Key:place, the tag suddenly has multilingual labels, which can be used to search for it and also help with showing the "properties" of an entry.

But there are definitely many open questions here.

@LorenzBuehmann
Contributor

LorenzBuehmann commented Feb 20, 2023

Ok, but then we can just run this SPARQL Update statement, no?

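PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX osmkey: <https://www.openstreetmap.org/wiki/Key:>

# Replace the literal value of a tag with the mapped Wikidata item
DELETE {
  ?entity ?osmkey ?val .
}
INSERT {
  ?entity ?osmkey ?item .
}
WHERE {
  {
    SELECT ?item ?val ?osmkey {
      # Fetch the item -> "Tag:key=value" mappings (P1282) from Wikidata
      SERVICE <https://query.wikidata.org/sparql> {
        ?item wdt:P1282 ?tag .
      }
      # Split "Tag:key=value" into its key and its value
      BIND(REPLACE(?tag, "^Tag:(.*)=.*$", "$1") AS ?key)
      BIND(REPLACE(?tag, "^Tag:.*=(.*)$", "$1") AS ?val)
      # Build the predicate IRI in the osmkey: namespace
      BIND(URI(CONCAT(STR(osmkey:), ?key)) AS ?osmkey)
    }
  }
  # Match every converted OSM entity that carries the tag as a literal
  ?entity ?osmkey ?val .
}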

It works for me, at least. It might indeed be expensive for the OSM planet if the triple store wraps it in a single transaction.
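
To see what the two REPLACE calls in the statement above do to a P1282 value, here is a small Python mirror of them. It is a sketch only: `split_tag` is a hypothetical helper, and Python's `re` is assumed to behave like the SPARQL regex engine for these simple patterns.

```python
import re

OSMKEY = "https://www.openstreetmap.org/wiki/Key:"


def split_tag(tag: str) -> tuple:
    """Mirror the two REPLACE() calls: return (predicate IRI, literal value)."""
    key = re.sub(r"^Tag:(.*)=.*$", r"\1", tag)
    val = re.sub(r"^Tag:.*=(.*)$", r"\1", tag)
    return OSMKEY + key, val


print(split_tag("Tag:power=plant"))
# A value that does not match the pattern passes through both calls unchanged,
# so it yields a predicate that should not occur in the converted data.
print(split_tag("Key:place"))
```

Non-matching values such as plain keys therefore produce no match in the final triple pattern; adding a FILTER on the "Tag:" prefix would make that explicit.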

@patrickbr
Member

I fully agree with @LorenzBuehmann on this issue, especially this part:

I don't think the whole thing would be useful to put into OSM2RDF; it's more something on top of it. Besides, the simple baseline, which is to just fetch the ~4000 OSM tag - WD concept mappings and add an RDF triple to each converted OSM entity with one of those tags

For me, it boils down to the fact that the OSM/Wikidata concept mappings are not present anywhere in the OSM data itself. Since we specifically provide a converter from OSM to RDF data, any feature that would require additional input (e.g. the external OSM/WD mappings) would be out of scope of this project.

@l00mi
Author

l00mi commented Feb 20, 2023

Hm, I should probably not have mentioned Wikidata here. The more important part is to go from strings to URIs for concepts, to be able to extend and link it.

@pukkamustard

I like the idea of using more URIs for objects.

Since we specifically provide a converter from OSM to RDF data, any feature that would require additional input (e.g. the external OSM/WD mappings) would be out of scope of this project.

This also makes a lot of sense to me.

I'd be very interested in exploring what such an external mapping would look like and how it could be maintained. Some previous work includes the mapping used by the LinkedGeoData project. However, most OSM keys there seem to be mapped to custom-defined URIs (e.g. http://linkedgeodata.org/ontology/Country) rather than to existing ones from sources like Wikidata.

@sfkeller

sfkeller commented Apr 14, 2023

Dear all - also referring to #49

I'd like to point you to the fact that the OSM wiki already has its own "OSM wikidata" instance, which contains OSM wikidata items! I'd suggest using that.

Almost every tag description page on the OSM wiki has an OSM wikidata item; sometimes even for keys, like "addr:" for address. Just look for "Data Items data object" at the bottom of the toolbar of an OSM wiki page, e.g. here for a tree: https://wiki.openstreetmap.org/wiki/Tag:natural=tree .

And be aware that OSM has an "open world assumption" that allows many concepts to be associated with a single OSM object. So a given "building" can have multiple tags representing multiple different views.

Also, be careful about thinking that an OSM (wikidata) concept like a "tree" can be mapped 1:1 to a wikidata concept. For example, consider a "monument" (tag historic=monument), where the OSM wikidata item text (https://wiki.openstreetmap.org/wiki/Item:Q4839 ) says: "A memorial object, which is especially large (one can go inside, walk on or through it) or very tall (...), built to remember, show respect to a person or group of people or to commemorate an event.". Whereas wikidata.org says in item https://www.wikidata.org/wiki/Q4989906 ... it's an "imposing structure created to commemorate a person or event, or used for that purpose". These definitions are not identical and will rarely be.

So on the one hand, the fact that OSM concepts have their own OSM wikidata item is a direct solution to having a proper URI. On the other hand, this shows that you have a classic semantic integration problem here, where you have inter-schema relationships between OSM wikidata items and wikidata.org items where two concepts are either "equal, disjoint, intersect, or include". This could be solved in fact with an external schema mapping service.
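
One possible shape for a record in such an external mapping, as a sketch only: the class and field names are hypothetical, and the relation assigned to the monument pair below is an assumption made purely for illustration, not an assessment taken from either wiki.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """The four inter-schema relationships named above."""
    EQUAL = "equal"
    DISJOINT = "disjoint"
    INTERSECT = "intersect"
    INCLUDE = "include"


@dataclass(frozen=True)
class ConceptMapping:
    osm_item: str        # data item in the OSM wiki
    wikidata_item: str   # item on wikidata.org
    relation: Relation


MAPPINGS = [
    # Assumption for illustration: the two "monument" definitions overlap
    # without being identical.
    ConceptMapping(
        "https://wiki.openstreetmap.org/wiki/Item:Q4839",
        "https://www.wikidata.org/wiki/Q4989906",
        Relation.INTERSECT,
    ),
]


def substitutable(mapping: ConceptMapping) -> bool:
    """Only an 'equal' mapping justifies replacing one concept with the other."""
    return mapping.relation is Relation.EQUAL


print([m.osm_item for m in MAPPINGS if not substitutable(m)])
```

Keeping the relation explicit lets a consumer decide per mapping whether to substitute, link, or ignore.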

@LorenzBuehmann
Contributor

That is correct, and also what I'm using currently. Unfortunately, the data isn't available as a dump, and the Sophox SPARQL endpoint contains only a part of those data items; see the GitHub issue. If anybody here is willing to use that endpoint, please keep that in mind.
In the end, I used plenty of API requests and converted the JSON to RDF.

@1ec5

1ec5 commented Jan 17, 2024

That is correct, and also what I'm using currently. Unfortunately, the data isn't available as a dump, and the Sophox SPARQL endpoint contains only a part of those data items; see the GitHub issue. If anybody here is willing to use that endpoint, please keep that in mind.

Data items from OSM Wikibase are available as a TTL dump at https://wiki.openstreetmap.org/dump/ (wikibase-rdf.ttl.gz).

Sophox/sophox#31 appears to still be an issue, at least in that particular case, but I don’t know if the root cause is an incomplete dump or something else downstream.

@hannahbast
Member

@1ec5 Thanks for this update. I am not sure I understand the dataset though. For example, what is the significance of a prefix like

@prefix wd: <//wiki.openstreetmap.org/entity/>

and then what is the purpose of a triple like

<https://wiki.openstreetmap.org/wiki/Node> schema:about wd:Q3

I find it particularly confusing that prefix names from Wikidata are reused here (in the Wikidata dump, wd: stands for <http://www.wikidata.org/entity/>), as are IDs starting with Q (in the Wikidata dump, Q3 stands for "life").

What do the others think?

@1ec5

1ec5 commented Jan 18, 2024

what is the purpose of a triple like

<https://wiki.openstreetmap.org/wiki/Node> schema:about wd:Q3

The OSM Wiki has data items about more than just keys, tags, and relation types. In principle, it could have an item about any page in the wiki’s main namespace. Many of these pages describe OSM concepts, software packages, or geographic regions that have local mapping communities. These pages are vastly outnumbered by tagging pages, which have titles beginning with pseudonamespaces such as “Key:”, “Tag:”, “Relation:”, and “Role:”. The corresponding data items are instances of subclasses of OpenStreetMap concepts or OpenHistoricalMap concepts. (OHM shares the OSM Wiki. The tagging pages are all subpages of “OpenHistoricalMap”, but the data items are differentiated only by their classes.)

For example, what is the significance of a prefix like

@prefix wd: <//wiki.openstreetmap.org/entity/>

This resolves a QID to a data item in OSM Wikibase.

I find it particularly confusing that prefix names from Wikidata are reused here (in the Wikidata dump, wd: stands for <http://www.wikidata.org/entity/>), as are IDs starting with Q (in the Wikidata dump, Q3 stands for "life").

Yes, unfortunately the Wikibase developers declined to allow installations to customize any of the alphabetic prefixes like Q and P. They suggest relying on prefixes for this distinction.
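
A short sketch of why the namespace, not the QID, is what distinguishes the two. The `resolve` helper is hypothetical, and the base IRI used to resolve the scheme-relative OSM prefix is an assumption (a Turtle consumer resolves relative IRIs against whatever base applies to the document it reads).

```python
from urllib.parse import urljoin

WIKIDATA = "http://www.wikidata.org/entity/"
OSM_WIKIBASE = "//wiki.openstreetmap.org/entity/"  # scheme-relative, as in the dump


def resolve(namespace: str, qid: str,
            base: str = "https://wiki.openstreetmap.org/dump/") -> str:
    """Expand a QID under a namespace and resolve it against an assumed base IRI."""
    return urljoin(base, namespace + qid)


# The same local name Q3 yields two unrelated IRIs.
print(resolve(WIKIDATA, "Q3"))
print(resolve(OSM_WIKIBASE, "Q3"))
```

When loading both datasets into one store, declaring a different prefix label for the OSM Wikibase namespace avoids confusing the two.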

@1ec5

1ec5 commented Jan 18, 2024

I think @nyurik set up the TTL dump and would be able to provide more insight based on his experience writing a different osm2rdf in Rust.

@nyurik

nyurik commented Jan 18, 2024

Rust osm2rdf is extremely fast, but sadly it does not (yet) support streaming updates. The issue was that there is currently no simple way to figure out which files (first daily, then hourly, then minutely) to download and process. Once someone writes a simple piece of code that, given the latest timestamp of an edit, produces a sequence of filenames, I can easily adjust that code to actually produce the SPARQL INSERT statements for the updates. In the meantime, there is https://github.com/Sophox/sophox/tree/main/osm2rdf, the original Python code that does the same thing, but it takes a day to convert an OSM dump into TTL files.
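
A sketch of the filename half of that problem, assuming the layout used by the public OSM replication directories, where a sequence number is zero-padded to nine digits and split into three path segments. The function names are hypothetical, and mapping an edit timestamp to a sequence number (which needs the replication state files) is deliberately left out.

```python
def replication_path(sequence: int) -> str:
    """Turn a replication sequence number into its relative diff path."""
    if not 0 <= sequence <= 999_999_999:
        raise ValueError("sequence number out of range")
    padded = f"{sequence:09d}"
    return f"{padded[0:3]}/{padded[3:6]}/{padded[6:9]}.osc.gz"


def paths_after(last_applied: int, latest: int) -> list:
    """List the diff paths still to be processed, oldest first."""
    return [replication_path(n) for n in range(last_applied + 1, latest + 1)]


print(paths_after(4567890, 4567892))
```

The same function would apply to the day, hour, and minute directories, each of which keeps its own sequence counter.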

@nyurik

nyurik commented Jan 18, 2024

Let me know if there are any specific questions about the data model that I can help with.
