Extend the value of <https://www.openstreetmap.org/wiki/Key:wikidata> to become a proper URI #49

Open
l00mi opened this issue Feb 18, 2023 · 6 comments

Comments

@l00mi

l00mi commented Feb 18, 2023

The value of <https://www.openstreetmap.org/wiki/Key:wikidata> seems to be simply the Wikidata Q-number. In RDF, Wikidata Q-numbers are represented as URIs, e.g. https://www.wikidata.org/wiki/Q116819199.

To make it comfortable to connect and query OSM entities together with Wikidata, it would be great to create proper NamedNodes for such values instead of Literals.

@l00mi l00mi changed the title Extend the value of the <https://www.openstreetmap.org/wiki/Key:wikidata> to become a proper URI Extend the value of <https://www.openstreetmap.org/wiki/Key:wikidata> to become a proper URI Feb 18, 2023
@l00mi
Author

l00mi commented Feb 18, 2023

As a workaround, the following federated query works:

PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE {
  ?relation <https://www.openstreetmap.org/wiki/Key:place> "country" .
  ?relation <https://www.openstreetmap.org/wiki/Key:name:en> ?name.
  ?relation <https://www.openstreetmap.org/wiki/Key:wikidata> ?wdValue.
  BIND(uri(concat("http://www.wikidata.org/entity/", ?wdValue)) as ?wd)

  SERVICE <https://query.wikidata.org/sparql> {
    ?wd wdt:P31 ?type.
    ?type rdfs:label ?typeName.
    FILTER(lang(?typeName)="en")
  }
}
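
The BIND step in the query above could also be applied while preprocessing the data. A minimal Python sketch, assuming the `http://www.wikidata.org/entity/` namespace used in the query; the function name `qid_to_uri` and the validation pattern are illustrative and not part of osm2rdf:

```python
import re

# Namespace taken from the BIND expression in the query above.
WD_ENTITY = "http://www.wikidata.org/entity/"
# Assumption: a valid value is "Q" followed by digits without a leading zero.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def qid_to_uri(value):
    """Return the entity URI for a single Q-number, or None otherwise."""
    value = value.strip()
    if QID_PATTERN.fullmatch(value) is None:
        return None  # free-text or list values are left for the caller to handle
    return WD_ENTITY + value


print(qid_to_uri("Q116819199"))
```

Values that are not a single Q-number (for example semicolon-separated lists) return `None`, so the original literal can be kept for them.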

@hannahbast
Member

There are two predicates in the datasets produced by osm2rdf:

<https://www.openstreetmap.org/wiki/Key:wikidata>, which has an object of the form Q116819199

<https://www.openstreetmap.org/wikidata>, which has an object of the form <https://www.wikidata.org/wiki/Q116819199>

The reason for the distinction is that the first predicate reflects how the information is stored in the original data, while the second predicate is more useful. I personally would be in favor of having only one predicate.

@lehmann-4178656ch @patrickbr What do you think?

@l00mi
Author

l00mi commented Feb 18, 2023

Thank you for this pointer, this is good to know. Having it only once, but with the entity URI, would make it easier to find.

@lehmann-4178656ch
Member

One of the goals @patrickbr and I formulated when we started work on osm2rdf was to provide access to the raw data, i.e. all information provided by OSM, wherever possible.

Collapsing both representations into one would only work for single-value (.*:)wikidata entries, if I'm not mistaken.

Transforming every entry without retaining the original would break the goal of retaining all information. We would have to split values and introduce intermediate nodes when lists are provided, e.g.:

osmnode:1080146569 osmkey:brand:wikidata "Q17412635;Q796364;Q36008;Q17412684;Q6686;Q246"

Currently we simply add a single statement for the first entry in the list. This may not be sufficient for lists, but it keeps the graph relatively small, and it should provide the most important information if the values are ordered accordingly in OSM:

osmnode:1080146569 osm:brand:wikidata wd:Q17412635 .

We could introduce an entry for every Q-value in the list. This would result in something like the following:

osmnode:1080146569 osm:brand:wikidata wd:Q17412635
osmnode:1080146569 osm:brand:wikidata wd:Q796364
osmnode:1080146569 osm:brand:wikidata wd:Q36008
osmnode:1080146569 osm:brand:wikidata wd:Q17412684
osmnode:1080146569 osm:brand:wikidata wd:Q6686
osmnode:1080146569 osm:brand:wikidata wd:Q246
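
The "one entry per Q-value" option can be sketched as follows. This is a minimal Python illustration, not the osm2rdf implementation; the function name `expand_wikidata_list` is hypothetical, and the prefixed names are written as in the examples in this thread:

```python
def expand_wikidata_list(subject, predicate, raw_value):
    """Return one triple line per Q-value in a semicolon-separated tag value."""
    lines = []
    for part in raw_value.split(";"):
        qid = part.strip()
        if qid:  # skip empty segments such as a trailing ";"
            lines.append(f"{subject} {predicate} wd:{qid} .")
    return lines


raw = "Q17412635;Q796364;Q36008;Q17412684;Q6686;Q246"
for line in expand_wikidata_list("osmnode:1080146569", "osm:brand:wikidata", raw):
    print(line)
```

Note that this variant does not record the order of the values; a triple store treats the six statements as an unordered set.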

Additionally, we would need to add intermediate nodes (as mentioned before) to preserve the order of the values.
This would increase the graph size and introduce an alternative structure whenever lists are involved. Something like the following could represent arbitrary lists of wikidata entries. It would retain all the information (both representations of each entry and their order), but it would also increase the size of the overall graph.

osmnode:1080146569 osm2rdf:wikidataListEntry _:0
osmnode:1080146569 osm2rdf:wikidataListEntry _:1
osmnode:1080146569 osm2rdf:wikidataListEntry _:2
osmnode:1080146569 osm2rdf:wikidataListEntry _:3
osmnode:1080146569 osm2rdf:wikidataListEntry _:4
osmnode:1080146569 osm2rdf:wikidataListEntry _:5

_:0 osm2rdf:key osmkey:brand:wikidata
_:0 osm2rdf:pos 1
_:0 osm2rdf:value "Q17412635"
_:0 osm:brand:wikidata wd:Q17412635
_:1 osm2rdf:key osmkey:brand:wikidata
_:1 osm2rdf:pos 2
_:1 osm2rdf:value "Q796364"
_:1 osm:brand:wikidata wd:Q796364
_:2 osm2rdf:key osmkey:brand:wikidata
_:2 osm2rdf:pos 3
_:2 osm2rdf:value "Q36008"
_:2 osm:brand:wikidata wd:Q36008
_:3 osm2rdf:key osmkey:brand:wikidata
_:3 osm2rdf:pos 4
_:3 osm2rdf:value "Q17412684"
_:3 osm:brand:wikidata wd:Q17412684
_:4 osm2rdf:key osmkey:brand:wikidata
_:4 osm2rdf:pos 5
_:4 osm2rdf:value "Q6686"
_:4 osm:brand:wikidata wd:Q6686
_:5 osm2rdf:key osmkey:brand:wikidata
_:5 osm2rdf:pos 6
_:5 osm2rdf:value "Q246"
_:5 osm:brand:wikidata wd:Q246

This raises the question of whether single entries should always be treated as lists, so that this information is stated explicitly. Treating every single value as a list would make the representation uniform, but it would introduce overhead for many entries, since lists in wikidata fields are far less common than single values.

I'm open to suggestions that allow us to not lose any information found in the original data without making the original data hard to find. I'll also try to talk to @patrickbr next week about this.

@l00mi
Author

l00mi commented Feb 20, 2023

For this specific key, I would argue this is maintaining the original information. You just adapt the format to the medium you convert into.

Regarding the lists, does the order actually carry meaning for some keys? If so, you should also consider good ol' https://www.w3.org/TR/rdf-schema/#ch_list.
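
For illustration, an ordered list can be written with Turtle's collection syntax `( ... )`, which expands to an rdf:first/rdf:rest chain. A minimal Python sketch, assuming the prefixed names used elsewhere in this thread; the function name `as_rdf_collection` is hypothetical:

```python
def as_rdf_collection(subject, predicate, raw_value):
    """Return one Turtle statement whose object is an ordered RDF collection."""
    qids = [part.strip() for part in raw_value.split(";") if part.strip()]
    members = " ".join(f"wd:{qid}" for qid in qids)
    # Turtle's "( a b c )" is shorthand for a chain of rdf:first/rdf:rest nodes.
    return f"{subject} {predicate} ( {members} ) ."


raw = "Q17412635;Q796364;Q36008"
print(as_rdf_collection("osmnode:1080146569", "osm:brand:wikidata", raw))
```

The order is preserved by the collection itself, at the cost of two triples per member once the shorthand is expanded.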

@patrickbr
Member

Thanks for the suggestion, @l00mi! As @hannahbast said, we already create <https://www.openstreetmap.org/wikidata> predicates linking to a URI, albeit not yet for lists of Wikidata IDs (as in the example given by @lehmann-4178656ch). We should not forget, however, that OSM attributes are free strings, and that some users might expect them to be free strings in the RDF dump. Our philosophy so far has been to keep these free strings, but add "semantically polished" versions of attribute values where possible.

We are currently discussing whether we should add an option to completely drop the free string representations for OSM attribute values handled like this. Let's keep this issue open until we have arrived at a conclusion :)
