I am loading multiple NetCDF files aggregated along the time dimension and reading one time slice at a time for use in a simulation.
I have noticed that loading a single time slice through a multi-file dataset is much slower than loading it from a single-file dataset. See the MWE below.
First, we create the NetCDF file.
using NCDatasets, DataStructures, BenchmarkTools
ds = NCDataset("test.nc","c", attrib = OrderedDict())
# Dimensions
ds.dim["time"] = 52
ds.dim["lon"] = 360
ds.dim["lat"] = 181
# Declare variables
nctime = defVar(ds,"time", Int32, ("time",), attrib = OrderedDict(
"long_name" => "time",
"standard_name" => "time",
"units" => "seconds since 1970-01-01",
"calendar" => "proleptic_gregorian",
))
nclon = defVar(ds,"lon", Float32, ("lon",), attrib = OrderedDict(
"units" => "degrees_east",
"long_name" => "longitude",
"standard_name" => "longitude",
))
nclat = defVar(ds,"lat", Float32, ("lat",), attrib = OrderedDict(
"units" => "degrees_north",
"long_name" => "latitude",
"standard_name" => "latitude",
))
test_var = defVar(ds,"test", Float32, ("lon", "lat", "time"), attrib = OrderedDict(
"units" => "%",
))
nctime[:] = collect(1:52)
nclon[:] = Float32.(0:359)
nclat[:] = Float32.(-90:90)
test_var[:,:,:] = zeros(Float32, 360, 181, 52)
close(ds)
Then, we benchmark loading a single time slice.
using BenchmarkTools
# Single-file dataset
ds = NCDataset("test.nc","r")
@benchmark $ds[$"test"][:,:,1]
close(ds)
# Multi-file dataset aggregating the same single file along time
ds = NCDataset(["test.nc"],"r", aggdim = "time")
@benchmark $ds[$"test"][:,:,1]
close(ds)
On my machine, the median time is 59.002 μs for the single-file dataset and 1.407 ms for the multi-file dataset, roughly 24 times slower, even though the multi-file dataset wraps the very same file.
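To reproduce the setup described at the start more closely, the same data can be split into several files along `time` and aggregated, so that the requested slice lives in only one of them. This is a sketch under assumptions: the file names `part_1.nc` to `part_4.nc` and the split into four files of 13 steps are hypothetical choices, not part of the original report, and no timings are claimed for it.

```julia
using NCDatasets, BenchmarkTools

# Hypothetical layout: the 52 steps of test.nc split into 4 files of 13 steps
nfiles, nsteps = 4, 13
files = ["part_$i.nc" for i in 1:nfiles]

for (i, fname) in enumerate(files)
    NCDataset(fname, "c") do ds
        defDim(ds, "lon", 360)
        defDim(ds, "lat", 181)
        defDim(ds, "time", nsteps)
        nctime = defVar(ds, "time", Int32, ("time",),
                        attrib = Dict("units" => "seconds since 1970-01-01"))
        test_var = defVar(ds, "test", Float32, ("lon", "lat", "time"))
        # Time values continue across files: 1:13, 14:26, 27:39, 40:52
        nctime[:] = collect((i - 1) * nsteps .+ (1:nsteps))
        test_var[:, :, :] = zeros(Float32, 360, 181, nsteps)
    end
end

# Aggregate along time, as in the single-file MWE above
mfds = NCDataset(files, "r"; aggdim = "time")
# Global step 20 is stored only in part_2.nc (global steps 14:26)
@benchmark $mfds["test"][:, :, 20]
close(mfds)
```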
Is there any way to speed up loading a time slice from a multi-file dataset so that it matches single-file performance? I am particularly interested in the case where the requested slice lies entirely within one of the files.
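One possible workaround, assuming the files are listed in time order and each holds a contiguous block of steps, is to bypass the aggregation layer: open each file once, record the length of its `time` dimension, map the global step to a (file, local index) pair, and index that file directly. This is a hedged sketch, not an NCDatasets feature; `locate_time_index`, `open_parts` and `read_time_slice` are hypothetical helper names, and the code does not check that the stored `time` values are actually contiguous across files.

```julia
using NCDatasets

# Hypothetical helper: map a global time index `t` onto (file number, local
# index), given the number of time steps stored in each file (in time order).
function locate_time_index(steps_per_file::AbstractVector{<:Integer}, t::Integer)
    ends = cumsum(steps_per_file)              # last global index held by each file
    1 <= t <= last(ends) || throw(BoundsError(1:last(ends), t))
    i = searchsortedfirst(ends, t)             # first file whose block reaches t
    offset = i == 1 ? 0 : ends[i - 1]
    return i, t - offset
end

# Hypothetical helper: open every file once and record the length of its
# `time` dimension. The caller is responsible for closing the handles.
function open_parts(files)
    dss = [NCDataset(f, "r") for f in files]
    steps = [ds.dim["time"] for ds in dss]
    return dss, steps
end

# Hypothetical helper: read global step `t` of `varname` from the single file
# that contains it. Assumes dimensions (lon, lat, time), as in the MWE above.
function read_time_slice(dss, steps, varname, t)
    i, k = locate_time_index(steps, t)
    return dss[i][varname][:, :, k]
end
```

In a simulation loop one would call `dss, steps = open_parts(files)` once, `read_time_slice(dss, steps, "test", t)` at each step, and `foreach(close, dss)` at the end. Keeping the handles open avoids reopening a file on every read.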