GSoC 2014 Progress: Sergey Skovorodkin
Name: Sergey Skovorodkin
Mentors: Marco Fossati, Michel Dumontier
GitHub project: dbpedia/wikidata-mapper
Proposal: Automated Wikidata mappings to DBpedia ontology
DBpedia and Wikidata are both Linked Open Data projects. At first sight they seem very similar, but they take fundamentally different approaches to structured data. DBpedia extracts data from Wikipedia, while Wikidata is more like a Wikipedia for data: its primary source of information is its users. Wikipedia itself is replacing old infobox data with data from Wikidata, so DBpedia could use Wikidata as a data provider and editing interface. However, there is a heterogeneity problem: the two projects use different tools and knowledge, which leads to different names for the same concepts (terminological heterogeneity). It is therefore important to link those concepts accurately across the two projects' ontologies.
The goal of the project is to create a system that automatically maps Wikidata items and properties to DBpedia ontology classes and properties. The system outputs two datasets: one of equivalent properties and one of equivalent classes.
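The text does not say how the two output datasets are serialized. A minimal sketch, assuming they are written as N-Triples using the standard OWL predicates `owl:equivalentClass` and `owl:equivalentProperty`, with the DBpedia term as subject. The pairs below (dbo:Person with Q5 "human", dbo:birthPlace with P19 "place of birth") are illustrative, not output of the mapper, and `to_ntriples` is a hypothetical helper.

```python
from typing import Iterable, Tuple

OWL = "http://www.w3.org/2002/07/owl#"


def to_ntriples(pairs: Iterable[Tuple[str, str]], predicate: str) -> str:
    """Serialize (dbpedia_uri, wikidata_uri) pairs as N-Triples, one statement per line."""
    return "".join(f"<{dbp}> <{OWL}{predicate}> <{wd}> .\n" for dbp, wd in pairs)


# Illustrative pairs only (assumed, not produced by the project).
class_pairs = [("http://dbpedia.org/ontology/Person", "http://www.wikidata.org/entity/Q5")]
property_pairs = [("http://dbpedia.org/ontology/birthPlace", "http://www.wikidata.org/entity/P19")]

# One string per output dataset; in practice each would go to its own .nt file.
classes_nt = to_ntriples(class_pairs, "equivalentClass")
properties_nt = to_ntriples(property_pairs, "equivalentProperty")
print(classes_nt, end="")
print(properties_nt, end="")
```

Keeping classes and properties in separate files mirrors the two datasets described above and lets each be loaded or reviewed on its own.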
- Extract Wikidata properties
- Extract Wikidata classes
- Filter out irrelevant Wikidata items
- Basic term preprocessing
- Exact label matcher
- Levenshtein distance matcher
- Minimal English Wikidata class list
- Minimal English Wikidata property list
- Store non-English language data in a separate file
- Minimal dump of DBpedia classes and properties
- Manually validate mappings of properties that have exact label matching
- Manually validate mappings of classes that have exact label matching
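The list names three matching components: basic term preprocessing, an exact label matcher and a Levenshtein distance matcher. A minimal sketch of how they could fit together follows; the preprocessing steps (camelCase splitting, lowercasing, punctuation removal), the `max_distance=2` threshold and all function names are assumptions rather than the project's actual code, and the `wd:P*` IDs in the example are placeholders.

```python
import re
import unicodedata
from typing import Dict, List, Tuple


def normalize(label: str) -> str:
    """Assumed preprocessing: split camelCase, lowercase, drop punctuation, collapse spaces."""
    label = unicodedata.normalize("NFKC", label)
    label = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", label)  # "numberOfPages" -> "number Of Pages"
    label = re.sub(r"[^\w\s]", " ", label.lower())
    return " ".join(label.split())


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def exact_matches(wikidata: Dict[str, str], dbpedia: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair entities whose normalized labels are identical, via a label index."""
    index: Dict[str, List[str]] = {}
    for dbp_id, label in dbpedia.items():
        index.setdefault(normalize(label), []).append(dbp_id)
    return sorted((wd_id, dbp_id)
                  for wd_id, label in wikidata.items()
                  for dbp_id in index.get(normalize(label), []))


def levenshtein_matches(wikidata: Dict[str, str], dbpedia: Dict[str, str],
                        max_distance: int = 2) -> List[Tuple[str, str, int]]:
    """Pair entities whose normalized labels differ by 1..max_distance edits."""
    dbp_norm = {dbp_id: normalize(label) for dbp_id, label in dbpedia.items()}
    results = []
    for wd_id, label in wikidata.items():
        wd_norm = normalize(label)
        for dbp_id, other in dbp_norm.items():
            d = levenshtein(wd_norm, other)
            if 0 < d <= max_distance:  # distance 0 is already covered by exact_matches
                results.append((wd_id, dbp_id, d))
    return sorted(results, key=lambda r: (r[2], r[0], r[1]))


# Toy example: the wd:P* IDs are placeholders, not real Wikidata properties.
wikidata = {"wd:P1": "spouse", "wd:P2": "number of pages", "wd:P3": "colour"}
dbpedia = {"dbo:spouse": "spouse", "dbo:numberOfPages": "numberOfPages", "dbo:color": "color"}

print(exact_matches(wikidata, dbpedia))        # [('wd:P1', 'dbo:spouse'), ('wd:P2', 'dbo:numberOfPages')]
print(levenshtein_matches(wikidata, dbpedia))  # [('wd:P3', 'dbo:color', 1)]
```

Exact matches are the natural candidates for the manual validation steps in the list, while the Levenshtein matcher catches spelling variants such as "colour" and "color". A small threshold keeps false positives down but misses reworded labels such as "place of birth" versus "birthPlace".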
I'm building a small tool to support manual validation. Comparing labels and descriptions, then opening two pages (DBpedia and Wikidata) for more information and switching between them, is tedious and error-prone (I've already fixed several mapping errors with the tool). The goal is to show all relevant information about both entities on a single page.
- Load Wikidata exports into a Virtuoso instance. I want usage examples for Wikidata properties, because many properties lack sufficient information on their pages (especially on DBpedia). For DBpedia, I use the online SPARQL endpoint to get examples.
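Fetching usage examples for the DBpedia side can be sketched with only the Python standard library: build a SPARQL query that returns a few triples using a property, then send it to an endpoint as a SPARQL Protocol GET request. The function names, the `limit` default and the choice of `dbo:birthPlace` are illustrative assumptions. The same request builder could target the local Virtuoso instance, but the property URIs there depend on how the Wikidata exports were converted to RDF, which the text does not specify.

```python
import json
from typing import List, Tuple
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Public DBpedia endpoint; a local Virtuoso instance typically serves SPARQL at http://localhost:8890/sparql.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def usage_examples_query(property_uri: str, limit: int = 10) -> str:
    """SPARQL SELECT for a few subject/object pairs that use the given property."""
    return (
        "SELECT ?subject ?object WHERE {\n"
        f"  ?subject <{property_uri}> ?object .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request(property_uri: str, endpoint: str = DBPEDIA_ENDPOINT, limit: int = 10) -> Request:
    """Prepare (without sending) a GET request that asks for SPARQL JSON results."""
    url = endpoint + "?" + urlencode({"query": usage_examples_query(property_uri, limit)})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def fetch_examples(request: Request, timeout: float = 30.0) -> List[Tuple[str, str]]:
    """Send the request and return (subject, object) values from the result bindings."""
    with urlopen(request, timeout=timeout) as response:
        data = json.load(response)
    return [(b["subject"]["value"], b["object"]["value"]) for b in data["results"]["bindings"]]


request = build_request("http://dbpedia.org/ontology/birthPlace", limit=5)
# fetch_examples(request) needs network access and a reachable endpoint.
```

Calling `fetch_examples(request)` returns at most five (subject, object) pairs; the `LIMIT` clause keeps the load on the public endpoint small.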
I'm still validating the property mappings (this is part of building a test set). I don't want to lose any knowledge about the DBpedia ontology that I gain in this process, so I'm writing a document that records every uncertainty I come across. I edit some of the mappings right away, but most of the time I have doubts, so I collect them in the document and hope the community will help me resolve them. (I hope this is the hardest part of the project; it takes a lot of time and concentration.)