GitHub - ailabitmo/sempubchallenge2014-task1: A solution for the Task 1 of Semantic Publishing Challenge

#How to configure and run the parser

##Required modules

The following Python modules need to be installed:

* RDFLib (https://github.com/RDFLib/rdflib),
* PDFMiner (http://www.unixuser.org/~euske/python/pdfminer/),
* Grab (http://grablib.org/),
* PyPDF2 (https://github.com/mstamy2/PyPDF2).

##Configuration

All configuration settings belong in the config.py file, which is created by renaming config.py.example.

###Input URLs

The input URLs are assigned as a Python list to the input_urls variable.
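As a minimal sketch, the relevant part of config.py might look like the following. The two URLs are illustrative placeholders only (the README does not say which pages the parser expects), so replace them with the pages you actually want to process.

```python
# config.py (fragment): a sketch, not the project's shipped defaults.
# The URLs below are hypothetical examples chosen for illustration.
input_urls = [
    "http://ceur-ws.org/Vol-1/",
    "http://ceur-ws.org/Vol-2/",
]
```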

###DBpedia dataset (with countries and universities)

The parser uses DBpedia to extract the names of countries and universities, along with their DBpedia URIs.

There are three options:

* Use the original dataset. This is the default, so nothing needs to be configured.
* Use OpenLink's mirror. In this case, change sparqlstore['dbpedia_url'] to http://lod.openlinksw.com/sparql.
* Use a local dump. This is the preferred option, because it should be much faster and more stable. Set sparqlstore['dbpedia_url'] to the local SPARQL endpoint and upload the RDF files dumps/dbpedia_country.xml and dumps/dbpedia_universities.xml to it. See the wiki for the steps to generate the DBpedia dumps.
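The three options can be sketched as a choice of endpoint URL in config.py. Only the 'dbpedia_url' key and the OpenLink URL come from this README. The public DBpedia endpoint is assumed to be the default, the local URL is a hypothetical example, and the DBPEDIA_ENDPOINTS helper is not part of the project.

```python
# config.py (fragment): a sketch under the assumptions stated above.
DBPEDIA_ENDPOINTS = {
    # Assumed default: the public DBpedia SPARQL endpoint.
    "original": "http://dbpedia.org/sparql",
    # OpenLink's mirror, as named in this README.
    "openlink": "http://lod.openlinksw.com/sparql",
    # Hypothetical local endpoint; use the address of your own SPARQL server.
    "local": "http://localhost:8890/sparql",
}

# In the real config.py, sparqlstore already exists and may hold other keys;
# it is created empty here only to keep the sketch self-contained.
sparqlstore = {}
sparqlstore['dbpedia_url'] = DBPEDIA_ENDPOINTS["local"]
```

Switching between the options then means changing a single key name, and any other entries of sparqlstore stay untouched.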

###Run

Once the configuration is finished, run the following script:

python CeurWsParser/spider.py

The resulting dataset is written to the rdfdb.ttl file.

#Queries

The SPARQL queries for Task 1 are translations of the challenge's human-readable queries into SPARQL, written against our data model. The queries are in the wiki.

#Contacts

Maxim Kolchin ([email protected])

Fedor Kozlov ([email protected])
