Using csv2rdf4lod automation without csv2rdf4lod

timrdf edited this page May 31, 2012 · 11 revisions

While csv2rdf4lod is the Java converter that transforms tabular data into RDF according to enhancement parameters described using the conversion vocabulary, csv2rdf4lod-automation is the set of shell script utilities that set up the directory structure, invoke csv2rdf4lod, and publish the results to /var/www.

csv2rdf4lod-automation can be used while replacing csv2rdf4lod with your own tabular converter. You might want to do this if your conversion is so out of whack that csv2rdf4lod's "RDFS-like paradigm" doesn't suit your needs. I've seen this twice in the thousands of datasets that I've helped people convert, and to be honest, I don't think their objectives were well designed.

Anyhoo, we should still be able to give you (most of) the provenance for free.

Your converter must be invokable from the command line, and its required dependencies must be installed. The only interface between csv2rdf4lod-automation and the converter is the set of arguments that the automation passes to it, and how the converter recognizes them.

The signature is a bit unwieldy (sorry!):

$csv2rdf $data $prov $sampleN -ep $destDir/$datafile.raw.params.ttl $overrideBaseURI $dumpExtensions \
   -w $destDir/$datafile.raw.sample.ttl -id $converterJarMD5 2>&1 | tee -a $CSV2RDF4LOD_LOG

The important bits:

  • Printing to stderr will be captured to a log.
  • $data is the input tabular file.
  • The thing after the -w is the output file; if you don't find one, then dump your output to stdout.
  • The file after the -ep holds the parameters for your converter. Change how it behaves based on the contents of this file.
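
To make the bits above concrete, here is a minimal sketch of a stand-in converter written as a shell function. Everything named here (my_converter, convert_rows, the "# row:" output) is hypothetical and only for illustration; the "conversion" is a placeholder that echoes each input line as a Turtle comment. It assumes that $data is the first positional argument, as in the signature above, and it simply ignores any flags it does not recognize.

```shell
# Hypothetical stand-in converter; none of these names come from csv2rdf4lod-automation.

convert_rows() {
   # Placeholder "conversion": emit every input line as a Turtle comment.
   sed 's/^/# row: /' "$1"
}

my_converter() {
   local data="" params="" output=""
   while [ $# -gt 0 ]; do
      case "$1" in
         -ep) params="${2:-}"; [ $# -ge 2 ] && shift ;;  # parameters file
         -w)  output="${2:-}"; [ $# -ge 2 ] && shift ;;  # output file
         -*)  ;;                                         # flags we do not understand: skip
         *)   [ -z "$data" ] && data="$1" ;;             # first positional is the input table
      esac
      shift
   done

   if [ ! -f "$data" ]; then
      echo "my_converter: no input file given or found: '$data'" >&2
      return 1
   fi

   # Anything on stderr ends up in the automation's log.
   echo "my_converter: $data -> ${output:-stdout} (params: ${params:-none})" >&2

   if [ -n "$output" ]; then
      convert_rows "$data" > "$output"   # -w was given: write there
   else
      convert_rows "$data"               # no -w: dump to stdout
   fi
}

# In a standalone script, the last line would be:  my_converter "$@"
```

A real converter would read the file given after -ep and change its behavior accordingly; this sketch only records the path in its log line.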

mkdir source/SSS/DDD/lib/my.jar

Make the following exist:

edu.rpi.tw.eScience.WaterQualityPortal.oboe.OBOEAgent source/my.csv -w automatic/my.csv.e1.ttl

and have it (and all of its dependencies) on your CLASSPATH environment variable.
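
One way to get everything onto CLASSPATH is a small helper like the sketch below. The function name add_jars_to_classpath is made up for this example, and it assumes your converter and its dependencies sit as *.jar entries in a single directory such as source/SSS/DDD/lib.

```shell
# Hypothetical helper: append every *.jar entry in a directory to CLASSPATH.
add_jars_to_classpath() {
   local dir="$1" jar
   for jar in "$dir"/*.jar; do
      [ -e "$jar" ] || continue                 # the glob matched nothing
      case ":${CLASSPATH:-}:" in
         *":$jar:"*) ;;                         # already listed: do not add twice
         *) CLASSPATH="${CLASSPATH:+$CLASSPATH:}$jar" ;;
      esac
   done
   export CLASSPATH
}

# Example use (the path is the placeholder from this page):
#   add_jars_to_classpath source/SSS/DDD/lib
```

Calling it twice on the same directory leaves CLASSPATH unchanged the second time.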

then

cd source/SSS/DDD/version/VVV/
./convert*.sh