Graph Space Model (WSDM'14)
This is the code for the graph document model as described in:
Michael Schuhmacher and Simone Paolo Ponzetto: Knowledge-based Graph Document Modeling. In Proceedings of WSDM'14, pp. 543-552, ACM, 2014 (paper, presentation)
Note that this code does not run out of the box. It is meant as a resource from which you can copy and paste code. The parts most relevant to the paper are:
- The main experiments method, which runs the LP50 experiments on the Lee Pincombe 50 Documents data (as reported in the paper)
- The Triple Weighting approaches CombIC, jointIC, PMI
- The JGraphT-based, parallel Dijkstra for computing the cheapest paths
- The Hungarian Method for the approx. Graph Matching
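To illustrate the cheapest-path step from the list above, here is a minimal sketch of Dijkstra's algorithm run in parallel over several source nodes. It uses only the JDK rather than JGraphT, and the class `CheapestPathsSketch`, its `Weighted` helper, and the toy graph are hypothetical names made up for this example; they are not classes from this repository, and the edge costs are arbitrary.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.stream.Collectors;

/** Hypothetical sketch, not a class from this repository. */
final class CheapestPathsSketch {

    /** A node paired with a cost: an edge (target, edge cost) or a queue entry (node, distance). */
    static final class Weighted {
        final String node;
        final double cost;
        Weighted(String node, double cost) { this.node = node; this.cost = cost; }
    }

    /** Cheapest path cost from source to every reachable node (costs must be non-negative). */
    static Map<String, Double> dijkstra(Map<String, List<Weighted>> graph, String source) {
        Map<String, Double> dist = new HashMap<>();
        Comparator<Weighted> byCost = Comparator.comparingDouble(w -> w.cost);
        PriorityQueue<Weighted> queue = new PriorityQueue<>(byCost);
        dist.put(source, 0.0);
        queue.add(new Weighted(source, 0.0));
        while (!queue.isEmpty()) {
            Weighted head = queue.poll();
            if (head.cost > dist.get(head.node)) {
                continue; // stale queue entry, a cheaper path was already found
            }
            for (Weighted edge : graph.getOrDefault(head.node, List.of())) {
                double candidate = head.cost + edge.cost;
                if (candidate < dist.getOrDefault(edge.node, Double.POSITIVE_INFINITY)) {
                    dist.put(edge.node, candidate);
                    queue.add(new Weighted(edge.node, candidate));
                }
            }
        }
        return dist;
    }

    /** Runs one independent Dijkstra per source node on a parallel stream. */
    static Map<String, Map<String, Double>> allFrom(Map<String, List<Weighted>> graph,
                                                    Collection<String> sources) {
        return sources.parallelStream()
                .collect(Collectors.toConcurrentMap(s -> s, s -> dijkstra(graph, s), (a, b) -> a));
    }

    /** Toy graph with made-up costs: A->B (1), A->C (5), B->C (2). */
    static Map<String, List<Weighted>> demoGraph() {
        Map<String, List<Weighted>> graph = new HashMap<>();
        graph.put("A", List.of(new Weighted("B", 1.0), new Weighted("C", 5.0)));
        graph.put("B", List.of(new Weighted("C", 2.0)));
        return graph;
    }

    public static void main(String[] args) {
        System.out.println(allFrom(demoGraph(), List.of("A", "B")));
    }
}
```

Each source gets its own distance map and the graph is only read, so the runs need no synchronization. The actual code in this repo relies on JGraphT's graph classes instead of the plain maps used here.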
Third-party datasets are not provided, as I cannot redistribute them without permission. Folders where third-party data is needed are empty in this repo, but contain a readme.txt.
The different Triple Weighter classes build upon full triple count statistics for DBpedia. These data are pre-computed and available here:
- GitHub: dbpediaweights (for DBpedia 3.9 and DBpedia 2014)
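As a rough illustration of how edge weights can be derived from such triple counts, the sketch below computes information content (IC) and pointwise mutual information (PMI) from raw counts, using the textbook definitions IC(x) = -log P(x) and PMI(p, o) = log(P(p, o) / (P(p) P(o))). The class `TripleWeightSketch`, its method signatures, and all counts are hypothetical and chosen for illustration only. The Triple Weighter classes in this repo may differ in detail, for example in how they read the count files or combine the measures.

```java
/** Hypothetical sketch, not a class from this repository. */
final class TripleWeightSketch {

    /** Information content IC(x) = -log P(x), with P(x) = count / total. */
    static double ic(long count, long total) {
        return -Math.log((double) count / total);
    }

    /** jointIC = IC(pred) + IC(obj | pred), where P(obj | pred) = predObjCount / predCount. */
    static double jointIC(long predCount, long predObjCount, long total) {
        return ic(predCount, total) + ic(predObjCount, predCount);
    }

    /** combIC = IC(pred) + IC(obj), treating predicate and object as independent. */
    static double combIC(long predCount, long objCount, long total) {
        return ic(predCount, total) + ic(objCount, total);
    }

    /** PMI(pred, obj) = log( P(pred, obj) / (P(pred) * P(obj)) ). */
    static double pmi(long predCount, long objCount, long predObjCount, long total) {
        double pJoint = (double) predObjCount / total;
        double pPred = (double) predCount / total;
        double pObj = (double) objCount / total;
        return Math.log(pJoint / (pPred * pObj));
    }

    public static void main(String[] args) {
        // Made-up counts: 1000 triples overall, 100 with the predicate,
        // 50 with the object, 20 with both.
        long total = 1000, pred = 100, obj = 50, predObj = 20;
        System.out.printf("jointIC = %.3f%n", jointIC(pred, predObj, total));
        System.out.printf("combIC  = %.3f%n", combIC(pred, obj, total));
        System.out.printf("PMI     = %.3f%n", pmi(pred, obj, predObj, total));
    }
}
```

All three measures need only the predicate count, the object count, the joint predicate-object count, and the total number of triples, which is why a single pre-computed count statistic can serve every weighter.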