Source code of "checking tools", in many programming languages.
Wikidata dumps.
Each country (P17) by its ISO 3166-1 alpha-2 code (P297) and OpenStreetMap relation ID (P402).
curl -o data/dumps_wd/countries.csv -G 'https://query.wikidata.org/sparql' \
--header "Accept: text/csv" \
--data-urlencode query='
SELECT DISTINCT ?iso2 ?qid ?osm_relid ?itemLabel
WHERE {
?item wdt:P297 _:b0.
BIND(strafter(STR(?item),"http://www.wikidata.org/entity/") as ?qid).
OPTIONAL { ?item wdt:P1448 ?name .}
OPTIONAL { ?item wdt:P297 ?iso2 .}
OPTIONAL { ?item wdt:P402 ?osm_relid .}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en,[AUTO_LANGUAGE]" . }
}
ORDER BY ?iso2
'
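The resulting CSV can be consumed with ordinary shell tools. A minimal sketch, assuming the column order of the SELECT above (iso2,qid,osm_relid,itemLabel) and using an illustrative sample row for Brazil:

```shell
# Sample data is illustrative, not a real dump; column order matches the
# SELECT ?iso2 ?qid ?osm_relid ?itemLabel of the query above.
cat > /tmp/countries_sample.csv <<'EOF'
iso2,qid,osm_relid,itemLabel
BR,Q155,59470,Brazil
EOF
# Look up the QID and OSM relation id for a given ISO 3166-1 alpha-2 code:
awk -F, '$1=="BR"{print $2, $3}' /tmp/countries_sample.csv
# prints: Q155 59470
```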
Nowadays at Wikidata, the only way to check that an item is "about a country" is to check its P17
(country) statement. Example: to check that a Wikidata item is about Brazil we must test ?item wdt:P17 wd:Q155
... But this statement may be declared only on a parent item (!), without being repeated on the item itself, so the correct query is ?item (wdt:P31|wdt:P279)*/wdt:P17 wd:Q155
, which is very time-consuming on a big graph like Wikidata.
To work around the graph-traversal limitations of Blazegraph (the engine behind the Wikidata SPARQL endpoint), we use a hint directive and retrieve only the basic data:
curl -o data/dumps_wd/BR_relIds.csv \
-G 'https://query.wikidata.org/sparql' \
--header "Accept: text/csv" \
--data-urlencode query='
SELECT DISTINCT ?wd_id ?osm_relid
WHERE {
?wd_id (wdt:P31|wdt:P279)*/wdt:P17 wd:Q155 .
?wd_id wdt:P402 ?osm_relid .
hint:Prior hint:runLast true .
}
ORDER BY ?wd_id
' # ~10 seconds, ~5600 items
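Note that this query returns ?wd_id as the full entity IRI (there is no strafter() BIND here). A shell equivalent of that BIND, useful when post-processing the CSV, could be:

```shell
# Mirrors BIND(strafter(STR(?item),"http://www.wikidata.org/entity/")):
# strip the entity prefix to obtain a bare QID.
echo 'http://www.wikidata.org/entity/Q155' \
  | sed 's|^http://www.wikidata.org/entity/||'
# prints: Q155
```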
Now BR_relIds.csv
holds the main data to be analysed. To get all the other attributes and candidate items we can constrain the query:
SELECT DISTINCT ?qid ?osm_relid ?wgs84 ?codIBGE ?itemLabel
WHERE {
?item wdt:P625 _:b0.
?item wdt:P31*/wdt:P279*/wdt:P17 wd:$_QID_COUNTRY_ .
BIND(xsd:integer(strafter(str(?item), "http://www.wikidata.org/entity/Q")) as ?qid)
OPTIONAL { ?item wdt:P1448 ?name. }
OPTIONAL { ?item wdt:P402 ?osm_relid .}
OPTIONAL { ?item wdt:P625 ?wgs84 .}
OPTIONAL { ?item wdt:P1585 ?codIBGE .}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en,[AUTO_LANGUAGE]". }
}
ORDER BY ASC(?qid)
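One way to instantiate this template is a simple placeholder substitution. A sketch, assuming the placeholder syntax above (the one-line template string below is illustrative, not the full query):

```shell
# Replace the $_QID_COUNTRY_ placeholder with a concrete QID.
QID=Q155
echo '?item wdt:P31*/wdt:P279*/wdt:P17 wd:$_QID_COUNTRY_ .' \
  | sed "s/\$_QID_COUNTRY_/$QID/"
# prints: ?item wdt:P31*/wdt:P279*/wdt:P17 wd:Q155 .
```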
So, using BR as the country code, the associated QID is Q155 ($_QID_COUNTRY_='Q155'),
and we obtain:
curl -o data/dumps_wd/BR_items.csv \
-G 'https://query.wikidata.org/sparql' \
--header "Accept: text/csv" \
--data-urlencode query='
SELECT DISTINCT ?qid ?osm_relid ?wgs84 ?codIBGE ?itemLabel
WHERE {
?item wdt:P625 _:b0.
?item wdt:P31*/wdt:P279*/wdt:P17 wd:Q155.
BIND(xsd:integer(strafter(str(?item), "http://www.wikidata.org/entity/Q")) as ?qid)
OPTIONAL { ?item wdt:P1448 ?name. }
OPTIONAL { ?item wdt:P402 ?osm_relid .}
OPTIONAL { ?item wdt:P625 ?wgs84 .}
OPTIONAL { ?item wdt:P1585 ?codIBGE .}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en,[AUTO_LANGUAGE]". }
}
ORDER BY ASC(?qid)
' # 2 minutes, ~32000 items
OpenStreetMap dumps.
Scan with Overpass, by country. Example: php src/OSM_get.php BR
will refresh the data/dumps_osm/BR_elements.csv file.
Once the OSM and Wikidata dumps are prepared, this recipe will:
- match the dumps,
- perform some checks,
- and refresh the lookup files, including the error output.
All of this is done in SQL; see the psql
part in the make.
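The real matching lives in the SQL run by psql. As a rough shell-only illustration of the idea (file layouts below are hypothetical samples, not the real dump schemas), joining the two dumps on the OSM relation id looks like:

```shell
# Hypothetical sample dumps, keyed by OSM relation id (first column);
# the actual matching is done in SQL, not in shell.
cat > /tmp/wd_sample.csv <<'EOF'
59470,Q155
EOF
cat > /tmp/osm_sample.csv <<'EOF'
59470,Brazil
EOF
# Inner join on the relation id; unmatched rows would feed the error output.
join -t, /tmp/wd_sample.csv /tmp/osm_sample.csv
# prints: 59470,Q155,Brazil
```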