Source code of "checking tools", in many programming languages.

Preparing dumps_wd

Wikidata dumps.

countries.csv

Each country (P17) by its ISO 3166-1 alpha-2 code (P297) and OpenStreetMap relation ID (P402).

curl -o data/dumps_wd/countries.csv -G 'https://query.wikidata.org/sparql' \
     --header "Accept: text/csv"  \
     --data-urlencode query='
 SELECT DISTINCT ?iso2 ?qid ?osm_relid ?itemLabel
 WHERE {
  ?item wdt:P297 _:b0.
  BIND(strafter(STR(?item),"http://www.wikidata.org/entity/") as ?qid).
  OPTIONAL { ?item wdt:P1448 ?name .}
  OPTIONAL { ?item wdt:P297 ?iso2 .}
  OPTIONAL { ?item wdt:P402 ?osm_relid .}
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,[AUTO_LANGUAGE]" . }
 }
 ORDER BY ?iso2
'
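The resulting CSV can be loaded as a simple lookup table. A minimal Python sketch, assuming the column names follow the SELECT clause above; the sample rows here are illustrative, not taken from a real dump:

```python
import csv
import io

# Illustrative sample in the shape produced by the SPARQL SELECT above.
sample = """iso2,qid,osm_relid,itemLabel
BR,Q155,59470,Brazil
DE,Q183,51477,Germany
"""

# Build an ISO 3166-1 alpha-2 -> (Wikidata QID, OSM relation id) lookup.
by_iso2 = {}
for row in csv.DictReader(io.StringIO(sample)):
    by_iso2[row["iso2"]] = (row["qid"], row["osm_relid"])

print(by_iso2["BR"])  # ('Q155', '59470')
```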

Country full list of items

Nowadays at Wikidata, the only way to check whether an item is "about a country" is to check the statement P17 (country). Example: to check that a Wikidata item is about Brazil we must check ?item wdt:P17 wd:Q155... But this statement can be on a parent (!) without redundancy in the item itself, so the correct query is ?item (wdt:P31|wdt:P279)*/wdt:P17 wd:Q155, which is very time-consuming in a Bigdata graph as large as Wikidata.

To avoid Bigdata graph-traversal limitations with SPARQL, we need to use a hint directive, getting only basic data:

curl -o data/dumps_wd/BR_relIds.csv \
     -G 'https://query.wikidata.org/sparql' \
     --header "Accept: text/csv"  \
     --data-urlencode query='
     SELECT DISTINCT ?wd_id ?osm_relid
     WHERE {
       ?wd_id (wdt:P31|wdt:P279)*/wdt:P17 wd:Q155 .
       ?wd_id wdt:P402 ?osm_relid .
       hint:Prior hint:runLast true .
     }
     ORDER BY ?wd_id
'  # ~10 seconds, ~5600 items
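Before matching, it is worth checking that no two Wikidata items claim the same OSM relation, since P402 is expected to be one-to-one. A hedged Python sketch; the column names follow the SELECT above and the sample rows (including Q999999999) are invented for illustration:

```python
import csv
import io
from collections import defaultdict

# Invented sample rows in the shape of BR_relIds.csv (wd_id, osm_relid).
sample = """wd_id,osm_relid
http://www.wikidata.org/entity/Q155,59470
http://www.wikidata.org/entity/Q175,185077
http://www.wikidata.org/entity/Q999999999,185077
"""

# Group Wikidata items by the OSM relation id they point to (P402).
items_by_rel = defaultdict(list)
for row in csv.DictReader(io.StringIO(sample)):
    items_by_rel[row["osm_relid"]].append(row["wd_id"].rsplit("/", 1)[-1])

# Any relation id claimed by more than one item is a candidate error.
duplicates = {rel: qids for rel, qids in items_by_rel.items() if len(qids) > 1}
print(duplicates)  # {'185077': ['Q175', 'Q999999999']}
```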

Now BR_relIds.csv has the main data to be analysed. To get all the other attributes and candidates we can constrain the query:

SELECT DISTINCT ?qid ?osm_relid ?wgs84 ?codIBGE ?itemLabel
WHERE {
  ?item wdt:P625 _:b0.
  ?item wdt:P31*/wdt:P279*/wdt:P17 wd:$_QID_COUNTRY_ .
  BIND(xsd:integer(strafter(str(?item), "http://www.wikidata.org/entity/Q")) as ?qid)
  OPTIONAL { ?item wdt:P1448 ?name. }
  OPTIONAL { ?item wdt:P402 ?osm_relid .}
  OPTIONAL { ?item wdt:P625 ?wgs84 .}  
  OPTIONAL { ?item wdt:P1585 ?codIBGE .}
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,[AUTO_LANGUAGE]". }
}
ORDER BY ASC(?qid)

So, using BR as the country code, the associated QID is Q155, so $_QID_COUNTRY_='Q155' and we obtain:

curl -o data/dumps_wd/BR_items.csv \
     -G 'https://query.wikidata.org/sparql' \
     --header "Accept: text/csv"  \
     --data-urlencode query='
     SELECT DISTINCT ?qid ?osm_relid ?wgs84 ?codIBGE ?itemLabel
     WHERE {
       ?item wdt:P625 _:b0.
       ?item wdt:P31*/wdt:P279*/wdt:P17 wd:Q155.
       BIND(xsd:integer(strafter(str(?item), "http://www.wikidata.org/entity/Q")) as ?qid)
       OPTIONAL { ?item wdt:P1448 ?name. }
       OPTIONAL { ?item wdt:P402 ?osm_relid .}
       OPTIONAL { ?item wdt:P625 ?wgs84 .}  
       OPTIONAL { ?item wdt:P1585 ?codIBGE .}
       SERVICE wikibase:label { bd:serviceParam wikibase:language "en,[AUTO_LANGUAGE]". }
     }
     ORDER BY ASC(?qid)
'  # 2 minutes, ~32000 items
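The ?wgs84 column comes back from WDQS as a WKT literal like "Point(-47.88 -15.79)", longitude first. A small sketch of parsing it; parse_wgs84 is a hypothetical helper, not part of this repo:

```python
import re

# WDQS serializes P625 coordinates as WKT, e.g. "Point(-47.88 -15.79)"
# (longitude first, then latitude). Hypothetical helper for illustration.
def parse_wgs84(wkt):
    m = re.match(r"Point\(([-\d.]+) ([-\d.]+)\)", wkt.strip())
    if not m:
        return None
    lon, lat = float(m.group(1)), float(m.group(2))
    return lat, lon  # return in the conventional (lat, lon) order

print(parse_wgs84("Point(-47.88 -15.79)"))  # (-15.79, -47.88)
```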

Preparing dumps_osm

OpenStreetMap dumps.

Scan with Overpass, by country. Example: php src/OSM_get.php BR will refresh the data/dumps_osm/BR_elements.csv file.
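The actual Overpass request lives in src/OSM_get.php and is not reproduced here. As a sketch only, a query of this kind fetches all relations carrying a wikidata tag inside a country area; the area lookup by ISO3166-1 code and the CSV output columns are assumptions, not the script's real query:

```python
# Hypothetical Overpass QL builder, in the spirit of src/OSM_get.php.
# The area selection and output columns are assumptions for illustration.
def overpass_query(iso2):
    return (
        '[out:csv(::id, "wikidata"; true; ",")];\n'
        'area["ISO3166-1"="' + iso2 + '"][admin_level=2]->.country;\n'
        'relation(area.country)["wikidata"];\n'
        'out;'
    )

print(overpass_query("BR"))
```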

Preparing lookup

After the OSM and Wikidata dumps are prepared, this recipe will:

  1. match dumps,
  2. do some checks,
  3. and refresh lookup files, including the error output.

All done with SQL; see the psql part of the make.
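The match-and-check step itself is done in SQL in this repo; purely as an illustration of the logic, the core join can be sketched in Python, assuming both dumps are reduced to maps keyed by OSM relation id (all rows below are invented):

```python
# Illustration only: the repo performs this join in SQL (psql part of the make).
# Both sides keyed by OSM relation id; values are the Wikidata QID each claims.
wd_side = {"59470": "Q155", "296584": "Q175"}   # from the Wikidata dump (invented)
osm_side = {"59470": "Q155", "296584": "Q999"}  # wikidata tags on OSM side (invented)

matched, mismatched, wd_only = [], [], []
for relid, qid in wd_side.items():
    if relid not in osm_side:
        wd_only.append(relid)      # P402 points to a relation with no wikidata tag
    elif osm_side[relid] != qid:
        mismatched.append(relid)   # the two databases disagree -> error output
    else:
        matched.append(relid)      # consistent pair -> lookup file

print(matched, mismatched, wd_only)  # ['59470'] ['296584'] []
```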