Description
In many scientific workflows, it is common to preprocess NetCDF files with external command-line tools such as the NetCDF Operators (NCO), in particular `ncks` for subsetting.
A subtle but critical issue arises when these tools process files containing floating-point data. Operations like subsetting can cause a promotion/demotion cycle (e.g., `float` -> `double` -> `float`), which can introduce tiny precision errors. As a result, the data values that represent the fill value may no longer be bit-for-bit identical to the `_FillValue` attribute stored in the metadata.
When NCDatasets.jl loads such a file, the automatic conversion from `_FillValue` to `missing` fails because, as far as I can tell, it relies on an exact equality check (`==`). This leads to a confusing situation where the loaded array has the type `Array{Union{Missing, T}}` but contains no `missing` values, even though it is full of values that should be treated as fill values.
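The symptom can be illustrated in plain Julia without any file I/O. This is only a sketch: the fill value and data are made up, the one-bit drift is simulated with `prevfloat` rather than produced by `ncks`, and the comparison mimics what I assume the loader does, not the actual NCDatasets.jl code.

```julia
# Hypothetical values: a Float32 fill value and raw data in which one fill
# entry has drifted by a single unit in the last place (simulated here).
fillvalue = 9.96921f36
raw = Float32[1.5, fillvalue, prevfloat(fillvalue), 2.0]

# Exact comparison, which is what the loader is assumed to perform.
masked = Union{Missing,Float32}[x == fillvalue ? missing : x for x in raw]

# Only the bit-identical entry is masked; the drifted one slips through.
n_missing = count(ismissing, masked)
println("masked entries: ", n_missing, " of 2 intended fill values")
```

In a file where every fill entry has drifted, the same comparison would mask nothing, which matches the behavior described above.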
Proposed Solution
Instead of using strict equality (`==`), the library could use an approximate comparison (`isapprox`).
This would correctly identify and mask fill values even when minor precision discrepancies exist. To maintain transparency, a warning could be emitted through `Logging` the first time an approximate match is triggered for a variable, informing the user that a precision mismatch was detected and handled automatically.
This change would significantly improve the user experience, as the library would "just work" as expected without requiring users to debug subtle floating-point issues and implement manual workarounds.
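A minimal sketch of the proposed behavior, to make the idea concrete. The helper name `mask_fillvalue`, its `rtol` keyword and the default tolerance are all hypothetical and not part of NCDatasets.jl; an actual implementation would have to fit into the package's existing conversion code.

```julia
# Hypothetical helper, not an NCDatasets.jl function.
# Masks values that equal the fill value exactly or within a relative
# tolerance, and warns once when an approximate match is used.
function mask_fillvalue(raw::AbstractArray{T}, fillvalue::T;
                        rtol = 8 * eps(T)) where {T<:AbstractFloat}
    out = Array{Union{Missing,T}}(undef, size(raw))
    warned = false
    for i in eachindex(raw, out)
        x = raw[i]
        if x == fillvalue
            out[i] = missing
        elseif isapprox(x, fillvalue; rtol = rtol)
            if !warned
                # @warn is available from Base without extra imports.
                @warn "Value differs slightly from _FillValue; treating it as missing" value = x fillvalue
                warned = true
            end
            out[i] = missing
        else
            out[i] = x
        end
    end
    return out
end

# Made-up data: one exact fill entry and one that is off by one ULP.
fillvalue = 9.96921f36
raw = Float32[1.5, fillvalue, prevfloat(fillvalue), 2.0]
masked = mask_fillvalue(raw, fillvalue)
```

The tolerance is an assumption and would probably need to be configurable, since a tolerance that is too loose could mask valid data close to the fill value. A `NaN` fill value would also need separate handling, because neither `==` nor `isapprox` treats `NaN` as equal to itself.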
Thank you for considering this suggestion and for your work on this great package! :)