Features

AtesComp edited this page Apr 3, 2022 · 9 revisions

Features, Changes, and Enhancements

User Interface:

  • Cleaned and refactored UI elements (but it still looks mostly like the original RDF extension)
  • Resizable dialogs
  • Uses a template system that is exportable / importable (like OntoRefine) for reuse across different projects that share a similar data structure
    • The template is stored as a JSON formatted structure
    • The structure, while similar to OntoRefine, uniformly normalizes key names and substructures
    • The same structure is used within the native OpenRefine data store for project and change management
  • Transform Tab:
    • RDF Node Editor: Added Prefix selection to the editor with Prefix and LocalPart management throughout the code
      • Preview Expression Editor:
        • Two new GREL functions:
          • toIRIString() - transforms and properly validates a string as an IRI component (replaces urlify())
          • toStrippedLiteral() - end trims a string with all known Unicode whitespace and non-breaking space characters
    • RDF Property Editor: Initializes with the existing property in the textbox instead of blank text
    • Added universal Expand (>) and Collapse (v) control on all nodes and properties
    • Added universal delete (x) control on all nodes, properties, and types
  • Preview Tab:
    • All changes made in the Transform tab are held back until the user switches to the Preview tab
    • (FUTURE) Editable sample record / row count for preview
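
As a rough illustration of the end trimming that toStrippedLiteral() is described as doing above, here is a minimal Python sketch. It is not the extension's implementation: the function name `to_stripped_literal`, the `EXTRA_SPACES` character set, and the reading of "end trims" as trimming both ends of the string are all assumptions made for this example.

```python
# Hypothetical sketch only; the real GREL function lives in the extension's code.
# Characters that str.isspace() does not report as whitespace but that often
# behave like it at the ends of a value (an assumed, non-exhaustive set).
EXTRA_SPACES = "\ufeff\u200b"


def _is_trimmable(ch: str) -> bool:
    """True for Unicode whitespace (including non-breaking spaces) and the extras."""
    return ch.isspace() or ch in EXTRA_SPACES


def to_stripped_literal(text: str) -> str:
    """Remove trimmable characters from both ends; leave the interior untouched."""
    start, end = 0, len(text)
    while start < end and _is_trimmable(text[start]):
        start += 1
    while end > start and _is_trimmable(text[end - 1]):
        end -= 1
    return text[start:end]
```

Interior whitespace, including a non-breaking space between two words, is left alone; only the ends are trimmed.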

Code:

  • JavaScript code has been updated to use class-based ("classified") coding
  • Loops use iterators whenever possible
  • RDF Export capabilities have been expanded to all known RDF4J formats
  • Properly recognize the Row versus Record parameters and processing (row and record visitors)
  • (FUTURE) Process inner record groupings as sub-records
  • Properly parse IRIs for valid structure, prefix and local part, absolute and relative, using the base IRI as needed
  • Properly process an IRI's Condensed IRI Expression (CIRIE, a.k.a., Prefix + LocalPart) for output / export
  • Defer flushing of the size-limited statement buffer to speed exports (user definable; see "RDFTransform.exportLimit" below)
  • The "Namespaces" and "PredefinedVocabs" support files are processed using general whitespace separation (not strictly tab delimited)
  • The code differentiates between "namespace" (Prefix and IRI) versus "prefix" use
  • General cleanup and verbose commenting throughout the code
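
Two of the items above, whitespace-separated support files and Condensed IRI Expressions, can be sketched together in Python. This is an illustration under stated assumptions, not the extension's code: the two-column "prefix IRI" line layout, the function names `parse_namespace_lines` and `condense_iri`, and the longest-match rule are all hypothetical.

```python
# Hypothetical sketch; the file layout and function names are assumptions.

def parse_namespace_lines(lines):
    """Build a prefix -> namespace IRI map from whitespace-separated lines.

    Fields may be separated by tabs or any run of spaces. Lines with fewer
    than two fields (including blank lines) are skipped.
    """
    namespaces = {}
    for line in lines:
        fields = line.split()  # splits on any run of whitespace
        if len(fields) < 2:
            continue
        prefix, iri = fields[0], fields[1]
        namespaces[prefix] = iri
    return namespaces


def condense_iri(iri, namespaces):
    """Return 'prefix:localPart' using the longest matching namespace IRI.

    An IRI that matches no known namespace is returned unchanged.
    """
    matches = [
        (len(ns), prefix)
        for prefix, ns in namespaces.items()
        if iri.startswith(ns)
    ]
    if not matches:
        return iri
    ns_length, prefix = max(matches)
    return f"{prefix}:{iri[ns_length:]}"
```

With a map parsed from a line such as `foaf   http://xmlns.com/foaf/0.1/`, the IRI `http://xmlns.com/foaf/0.1/name` condenses to `foaf:name` in this sketch.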

Preferences:

Three preferences were added to manage server output (see OpenRefine preferences):

  • "RDFTransform.verbose" preference aids with process feedback and debugging
    • A general "verbose" preference is recognized as a default (HINT: OpenRefine might use it as a base preference)
    • 0 == no verbose output; only unknown, uncaught errors are reported (stack traces, of course)
    • 1 == basic functional information and all unknown, caught errors
    • 2 == additional info and warnings on well-known issues: functional exits, permissibly missing data, etc
    • 3 == detailed info on functional minutiae and warnings on missing, but desired, data
    • 4 == controlled error catching stack traces, RDF preview statements, and other fine-grained minutiae
    • A missing verbose preference defaults to 0
  • "RDFTransform.exportLimit" preference limits the statement buffer and optimizes output
    • The statement buffer (i.e., an internal memory RDF repository) stores statements created from the data
    • The old system created one statement in the buffer, then flushed the buffer to disk--very inefficient
    • The new system holds many statements before flushing to disk
    • This buffer can become large if the data is large and produces many statements, so it is somewhat optimized:
      • Given a default statement size of 100 bytes, the default buffer is limited to 1GiB / 100 = (1024 * 1024 * 1024) / 100 = 10737418 statements (rounded down)
      • The 100 byte statement size is likely an overestimate, as the average statement is probably smaller
      • Regardless, this keeps memory usage to about 1GiB or less and a user can set the preference to optimize for a given memory footprint and data size
    • Buffering statements this way batches the creation and flush processes, which speeds the disk write
    • (FUTURE) An enhancement may examine the project data size and system memory to determine an optimal buffer size and allocations
  • "RDFTransform.debug" preference aids debugging
    • Controls the output of specifically marked "DEBUG" messages
    • Includes many verbose output messages as well
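
The export limit arithmetic and the buffer-then-flush behavior described above can be sketched in Python. This is only an illustration: the names `default_export_limit` and `StatementBuffer`, and the use of a plain callable as the disk writer, are assumptions for the example and do not reflect the extension's actual classes.

```python
# Hypothetical sketch of the buffering idea; not the extension's implementation.

GIB = 1024 * 1024 * 1024          # 1 GiB in bytes
ASSUMED_STATEMENT_BYTES = 100     # the default statement size quoted above


def default_export_limit(memory_budget=GIB, statement_bytes=ASSUMED_STATEMENT_BYTES):
    """Number of statements that fit in the memory budget (rounded down)."""
    return memory_budget // statement_bytes


class StatementBuffer:
    """Collects statements and hands them to a sink in batches."""

    def __init__(self, limit, sink):
        self.limit = limit        # maximum statements held before a flush
        self.sink = sink          # callable that receives a list of statements
        self.pending = []
        self.flush_count = 0

    def add(self, statement):
        """Buffer one statement, flushing automatically when the limit is hit."""
        self.pending.append(statement)
        if len(self.pending) >= self.limit:
            self.flush()

    def flush(self):
        """Write any pending statements to the sink and empty the buffer."""
        if self.pending:
            self.sink(self.pending)
            self.pending = []
            self.flush_count += 1
```

With the defaults, `default_export_limit()` gives the 10737418 figure quoted above. A limit of 1 reproduces the old one-statement-per-flush behavior, while a larger limit trades memory for fewer writes; a final `flush()` is needed after the last statement.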

NOTES:

To streamline RDF Transform, the RDF reconcile functionality has been removed from this project. The reconcile code is intended to be recreated as a separate project.
