Data Output

Gabor Szarnyas edited this page May 30, 2021 · 1 revision

Data output

Datagen provides a mechanism for implementing the serialization of the generated datasets. It allows users to define their own output formats or to ingest the data directly into a data store. The mechanism is based on three abstract classes, which must be extended and then specified in the configuration file, as explained in Compilation_Execution. In each subclass, only the initialize and close methods and the different versions of the serialize method (one for each entity) have to be implemented. The abstract classes are the following:

  • StaticSerializer: This class serializes all the entities that are independent of the dataset size, that is, tags, tagClasses, organisations and places.
  • DynamicPersonSerializer: This class serializes Persons and the knows, studyAt and workAt relationships.
  • DynamicActivitySerializer: This class serializes all the entities related to person activity generation, that is, Forums, Posts, Comments and likes.
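
The extension pattern described above can be sketched as follows. This is a minimal, self-contained illustration, not Datagen's actual API: SketchStaticSerializer is a hypothetical stand-in for the real abstract class, and the method signatures, the "separator" configuration key and the pipe-separated output format are all assumptions made for the example. Consult the Datagen sources for the real signatures before writing a serializer.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Map;

class SerializerSketch {

    // Hypothetical stand-in for an abstract serializer class.
    // The real Datagen class and its method signatures may differ.
    static abstract class SketchStaticSerializer {
        abstract void initialize(Map<String, String> conf) throws IOException;
        abstract void serializeTag(long id, String name) throws IOException;
        abstract void serializePlace(long id, String name, String type) throws IOException;
        abstract void close() throws IOException;
    }

    // A custom format: one separator-delimited line per entity.
    static class PipeStaticSerializer extends SketchStaticSerializer {
        private final Writer out;
        private String separator = "|";

        PipeStaticSerializer(Writer out) {
            this.out = out;
        }

        @Override
        void initialize(Map<String, String> conf) {
            // "separator" is an assumed configuration key, used only in this sketch.
            separator = conf.getOrDefault("separator", "|");
        }

        @Override
        void serializeTag(long id, String name) throws IOException {
            out.write("tag" + separator + id + separator + name + "\n");
        }

        @Override
        void serializePlace(long id, String name, String type) throws IOException {
            out.write("place" + separator + id + separator + name + separator + type + "\n");
        }

        @Override
        void close() throws IOException {
            out.flush();
        }
    }

    // Drives the serializer through its lifecycle: initialize, serialize, close.
    static String demo() {
        StringWriter buffer = new StringWriter();
        PipeStaticSerializer serializer = new PipeStaticSerializer(buffer);
        try {
            serializer.initialize(Map.of("separator", "|"));
            serializer.serializeTag(1L, "Mozart");
            serializer.serializePlace(2L, "Vienna", "city");
            serializer.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

The sketch writes to an in-memory Writer so that it runs without touching the file system. A real serializer would typically open its output files or its data store connection in initialize and release them in close.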

By default, Datagen currently provides serializer classes for the CsvBasic, CsvComposite, CsvMergeForeign, CsvCompositeMergeForeign and Turtle formats. These formats are documented in the LDBC SNB benchmark specification document. Some general guidelines for choosing among them:

  • Use CsvBasic for graph databases that support CSV import.
  • Use CsvComposite for graph databases that support CSV import and composite data structures.
  • Use CsvMergeForeign for relational databases.
  • Use CsvCompositeMergeForeign for relational databases that support composite data structures.
  • Use Turtle for RDF tools and graph-based tools that support it.