
Larger-than-memory datasets with iris-esmf-regrid and dask #310

@dennissergeev

Description


📰 Custom Issue

I was wondering if you have any recommendations on what dask settings I should use if I want to regrid a larger-than-memory dataset using iris-esmf-regrid.

The dataset is from an LFRic C24 run, containing about ten 2D or 3D variables with 1000 time slices loaded from 100 files (i.e. 10 time slices per file). The data are chunked accordingly: 100 chunks per variable. The total size is about 16 GB on disk.

I can obviously process this in a file-by-file loop, but I hope to load the whole dataset and apply the regridding in one go. Currently my script halts because the regridding step consumes all available RAM. I understand this might be asking for too much specialised help, but any advice would be highly appreciated!
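For concreteness, here is a minimal sketch of the chunk-wise pattern I am hoping the regridding step would follow. The shapes, the dense weight matrix, and the `map_blocks` call are all illustrative toys, not the real iris-esmf-regrid API (which uses precomputed sparse ESMF weights internally); the point is only that applying weights per time-chunk should keep peak memory at one chunk, not the whole dataset:

```python
# Toy stand-in for chunk-wise regridding with dask: treat regridding as
# applying a (target x source) weight matrix to each time chunk.
# Illustrative only -- not the iris-esmf-regrid API.
import numpy as np
import dask.array as da

n_time, n_src, n_tgt = 1000, 64, 32

# Hypothetical regridding weights (dense here for simplicity; real
# regridders store these as a sparse matrix).
weights = np.ones((n_tgt, n_src)) / n_src  # simple mean over source cells

# Larger-than-memory source data: 1000 time slices, chunked 10 per chunk,
# matching the "10 time slices per file" layout above.
src = da.ones((n_time, n_src), chunks=(10, n_src))

# Apply the weights chunk by chunk; only one 10-slice chunk needs to be
# materialised at a time, so peak memory stays bounded by the chunk size.
regridded = da.map_blocks(
    lambda block: block @ weights.T,
    src,
    chunks=(10, n_tgt),
    dtype=src.dtype,
)

result = regridded.compute()
print(result.shape)  # (1000, 32)
```

If the library's regridder is applied lazily in this fashion, I would expect the scheduler to stream chunks through; my question is essentially which dask settings (scheduler, number of workers, chunk sizes) make that happen in practice rather than loading everything at once.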

Machine specs
                 OS : Linux
             CPU(s) : 8
            Machine : x86_64
       Architecture : 64bit
                RAM : 31.2 GiB
        Environment : Jupyter
        File system : ext4
         GPU Vendor : Intel
       GPU Renderer : Mesa Intel(R) UHD Graphics 620 (KBL GT2)
        GPU Version : 4.6 (Core Profile) Mesa 23.0.4-0ubuntu1~22.04.1

  Python 3.11.5 | packaged by conda-forge | (main, Aug 27 2023, 03:34:09) [GCC 12.3.0]


Labels: New: Issue (Highlight a new community raised "generic" issue)
