-
Notifications
You must be signed in to change notification settings - Fork 2
Archival Data Quality
Archiving preserves data for prosterity. The goal of archiving documents is to take a snapshot of the data as they were at the time of archiving. Then, the documents may at a later time be made accessible to a user, either the original data producer, a researcher or anyone interested, who have legitimate access rights to the documents. Ideally, the document should be as similar as possible to the original document, they were copied from, both in content and structure, but technical changes may be performed by the archive in order to ensure persistent accessibility to the document throughout changing technological landscapes. This may result in data loss, but if so it must be reduced to the minimum and it must be documented.
Here follows a list of requirements, which CLISC checks for and performs actions to the archived spreadsheet in order to make sure the spreadsheet meets the necessary data quality level for long-term storage. If archiving is not selected when using CLISC, the list of requirements will not be followed.
Spreadsheets containing zero data have no value for later reuse and hence no value for archiving.
Risk
There's two risks for spreadsheets, either having no sheets or no cells with data. Fortunately, Excel does not allow user to save a spreadsheet without any sheets and a COM exception is also thrown when using Excel Interop.
System.Runtime.InteropServices.COMException: 'The project folder must have at least one visible worksheet.
However, other ways of deleting the sheets without using Excel might exist, and secondly in Excel saving a spreadsheet without any cell values is allowed.
Solution
Check for number of sheets (minimum one) and check for presense of cell value in any sheet in the spreadsheet (minimum one).
Risk
Solution
External relationships include linked OLE objects (they are not embedded) and linked cell values fetched from another spreadsheet. Both external relationships relate to files in your local directory, and which relationship will be broken, if the files or spreadsheet is removed from the local directory.
Risk
Data from the context of external relationships will be lost, if a spreadsheet is archived, which typically means removing the spreadsheet from it's local environment parameters if the archive is pursuing a data migration strategy.
Solution
Remove any external relationships. In the case of linked cell values, keep the actual cell values as the snapshot. In the case of unembedded OLE objects, they should be handled manually and the object should ideally be archived separately and the relationship documented in metadata. Alternative solution is to embed the OLE object.
Microsoft Office Excel provides a worksheet function, RealTimeData (RTD). This function enables you to call a Component Object Model (COM) Automation server to retrieve data in real time.
The RTD function uses the following syntax =RTD(RealTimeServerProgID,ServerName,Topic1,[Topic2], ...)
Risk
When opening an archived spreadsheet many years from now, the server connection might still be available and the values will auto-update. This intervenes with the archival goal of taking snapshots of data at a given time.
Solution
Remove any RTD formula functions in cells in any sheet in a spreadsheet, but keep the actual cell values as the snapshot.