Slowdown when using multiple NetCDF files

I am opening multiple NetCDF files aggregated along the time dimension and loading one time slice at a time for use in a simulation.

I am noticing that loading a single time slice from a multifile dataset is significantly slower than loading the same slice from a single-file dataset. See the MWE below.


First, we create the NetCDF file.
```
using NCDatasets, DataStructures, BenchmarkTools
ds = NCDataset("test.nc","c", attrib = OrderedDict())
# Dimensions

ds.dim["time"] = 52
ds.dim["lon"] = 360
ds.dim["lat"] = 181

# Declare variables
nctime = defVar(ds,"time", Int32, ("time",), attrib = OrderedDict(
    "long_name"                 => "time",
    "standard_name"             => "time",
    "units"                     => "seconds since 1970-01-01",
    "calendar"                  => "proleptic_gregorian",
))
nclon = defVar(ds,"lon", Float32, ("lon",), attrib = OrderedDict(
    "units"                     => "degrees_east",
    "long_name"                 => "longitude",
    "standard_name"             => "longitude",
))
nclat = defVar(ds,"lat", Float32, ("lat",), attrib = OrderedDict(
    "units"                     => "degrees_north",
    "long_name"                 => "latitude",
    "standard_name"             => "latitude",
))
test_var = defVar(ds,"test", Float32, ("lon", "lat", "time"), attrib = OrderedDict(
    "units"                     => "%",
))
nctime[:] = [i for i in 1.0:52.0]
nclon[:] = [i for i in 0.0:359.0]
nclat[:] = [i for i in -90.0:90.0]
test_var[:,:,:] = zeros(360, 181, 52)

close(ds)
```

Then, we benchmark loading a single time slice.
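```
using NCDatasets, BenchmarkTools

# Single-file dataset
ds = NCDataset("test.nc","r")
@benchmark $ds[$"test"][:,:,1]
close(ds)

# Multifile dataset (a one-element file list), aggregated along "time"
ds = NCDataset(["test.nc"],"r", aggdim = "time")
@benchmark $ds[$"test"][:,:,1]
close(ds)
```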

On my machine, I get a median of `59.002 μs` for the single-file dataset and a median of `1.407 ms` for the multifile dataset, which is roughly 24 times slower.

Is there any way to speed up loading a time slice from a multifile dataset so that it matches the performance of a single-file dataset? I am particularly interested in the case where the time slice is contained in only one of the files.
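To make that last case concrete, here is a minimal sketch of the kind of manual lookup I have in mind: map a global time index to the file that contains it and read from that file's own dataset. The names `locate_slice` and `read_slice` are hypothetical helpers, not part of NCDatasets, and I have not measured whether this actually recovers single-file performance.

```julia
# Hypothetical helper: given the number of time steps in each file (in
# aggregation order), return (file number, local time index) for global index t.
function locate_slice(ntimes::AbstractVector{<:Integer}, t::Integer)
    ends = cumsum(ntimes)                  # last global index covered by each file
    1 <= t <= ends[end] || throw(BoundsError(ntimes, t))
    k = searchsortedfirst(ends, t)         # first file whose range reaches t
    return k, t - (ends[k] - ntimes[k])    # subtract the steps in earlier files
end

# Hypothetical helper: read one time slice of a (lon, lat, time) variable from
# whichever already-open single-file dataset holds it.
function read_slice(handles, ntimes, varname, t)
    k, i = locate_slice(ntimes, t)
    return handles[k][varname][:, :, i]
end

# Usage with NCDatasets (assumes the files are listed in time order):
#   using NCDatasets
#   files   = ["test.nc"]
#   handles = [NCDataset(f, "r") for f in files]
#   ntimes  = [h.dim["time"] for h in handles]
#   slice   = read_slice(handles, ntimes, "test", 1)
#   foreach(close, handles)
```

This keeps one open handle per file and only touches the file that owns the requested slice, at the cost of doing the index bookkeeping by hand.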

Slowdown when using multiple NetCDF files #277
