Data Output
Datagen provides a mechanism for customizing how the datasets are serialized. It allows users to define their own output formats or to ingest the data directly into a data store. The mechanism is based on three abstract classes, which must be extended and then specified in the configuration file, as explained in Compilation_Execution. Only the initialize and close methods and the different versions of the serialize method (one per entity) have to be implemented. The three classes are:
- StaticSerializer: serializes all the entities that are independent of the dataset size, that is, tags, tagClasses, organisations and places.
- DynamicPersonSerializer: serializes the Persons and the knows, studyAt and workAt relationships.
- DynamicActivitySerializer: serializes all the entities related to person activity generation, that is, Forums, Posts, Comments and likes.
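The Java sketch below illustrates the extension pattern for the static entities. Everything in it (`SerializerSketch`, `StaticSerializerSketch`, `PipeCsvStaticSerializer`, the entity records and the no-argument `initialize()`) is a hypothetical simplification: Datagen's real `StaticSerializer` has its own method signatures and entity classes, so check the source before extending it. Only the lifecycle follows the description above: initialize once, call the `serialize` overload matching each entity, then close.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;

public class SerializerSketch {

    // Hypothetical, simplified entities; Datagen's own entity classes differ.
    record Tag(long id, String name) {}
    record TagClass(long id, String name) {}
    record Organisation(long id, String type, String name) {}
    record Place(long id, String name, String type) {}

    // Hypothetical stand-in for StaticSerializer: only the lifecycle
    // (initialize, one serialize overload per entity, close) follows the text above.
    abstract static class StaticSerializerSketch {
        abstract void initialize() throws IOException;
        abstract void serialize(Tag tag) throws IOException;
        abstract void serialize(TagClass tagClass) throws IOException;
        abstract void serialize(Organisation organisation) throws IOException;
        abstract void serialize(Place place) throws IOException;
        abstract void close() throws IOException;
    }

    // Writes one pipe-separated "file" per entity type. In-memory buffers stand in
    // for the files or data-store connections a real serializer would open.
    static class PipeCsvStaticSerializer extends StaticSerializerSketch {
        final Map<String, StringWriter> files = new LinkedHashMap<>();

        @Override
        void initialize() {
            // Must be called before any serialize call: creates one buffer per entity type.
            for (String name : new String[] {"tag", "tagclass", "organisation", "place"}) {
                files.put(name, new StringWriter());
            }
        }

        // Appends one row to the given buffer, joining the fields with '|'.
        private void row(String file, Object... fields) {
            StringWriter out = files.get(file);
            for (int i = 0; i < fields.length; i++) {
                if (i > 0) out.write('|');
                out.write(String.valueOf(fields[i]));
            }
            out.write('\n');
        }

        @Override void serialize(Tag t) { row("tag", t.id(), t.name()); }
        @Override void serialize(TagClass c) { row("tagclass", c.id(), c.name()); }
        @Override void serialize(Organisation o) { row("organisation", o.id(), o.type(), o.name()); }
        @Override void serialize(Place p) { row("place", p.id(), p.name(), p.type()); }

        @Override
        void close() {
            // In-memory buffers hold no resources; a file-based version would close its streams here.
            for (StringWriter w : files.values()) w.flush();
        }
    }

    public static void main(String[] args) {
        PipeCsvStaticSerializer s = new PipeCsvStaticSerializer();
        s.initialize();
        s.serialize(new Tag(1, "Java"));
        s.serialize(new Place(7, "Spain", "country"));
        s.close();
        System.out.print(s.files.get("tag"));   // 1|Java
        System.out.print(s.files.get("place")); // 7|Spain|country
    }
}
```

Because each entity type gets its own `serialize` overload, an implementation can route every entity to a different file or table. A serializer that ingests directly into a data store would presumably open its connection in `initialize` and release it in `close`.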
By default, Datagen currently provides serializers for the CsvBasic, CsvComposite, CsvMergeForeign, CsvCompositeMergeForeign and Turtle formats, all of which are documented in the LDBC SNB benchmark specification. Some general guidelines:
- Use CsvBasic for graph databases that support CSV import.
- Use CsvComposite for graph databases that support CSV import and composite data structures.
- Use CsvMergeForeign for relational databases.
- Use CsvCompositeMergeForeign for relational databases that support composite data structures.
- Use Turtle for RDF tools and graph-based tools that support it.
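The guidelines amount to a small decision table. The Java sketch below encodes them; the `TargetKind` enum and `chooseFormat` method are hypothetical, and the strings it returns are the format names used on this page, not necessarily the exact values Datagen's configuration file expects.

```java
public class FormatChooser {

    // Hypothetical classification of target systems, taken from the guidelines above.
    enum TargetKind { GRAPH_DATABASE_WITH_CSV_IMPORT, RELATIONAL_DATABASE, RDF_TOOL }

    // Returns the recommended format name for a target system.
    // supportsComposite: whether the target supports composite data structures
    // (ignored for RDF tools, which always get Turtle).
    static String chooseFormat(TargetKind kind, boolean supportsComposite) {
        return switch (kind) {
            case GRAPH_DATABASE_WITH_CSV_IMPORT -> supportsComposite ? "CsvComposite" : "CsvBasic";
            case RELATIONAL_DATABASE -> supportsComposite ? "CsvCompositeMergeForeign" : "CsvMergeForeign";
            // Turtle also suits graph-based tools that can read it.
            case RDF_TOOL -> "Turtle";
        };
    }

    public static void main(String[] args) {
        // prints CsvCompositeMergeForeign
        System.out.println(chooseFormat(TargetKind.RELATIONAL_DATABASE, true));
    }
}
```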