Memory usage surge during GC #381

@Besroy

When blobs are large, receiving data from the data channel consumes a significant amount of memory. If multiple GC tasks run concurrently, the parallel reads of valid shard blobs by each task can increase memory usage by several gigabytes, which can lead to OOM.

This issue is frequently observed on SH because of its low GC threshold. It is currently mitigated by increasing pod memory, but similar issues could occur in production environments. If they do, we may need to reassess the memory requirements for the SM pod. Recording it here for now.
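To make the failure mode concrete: peak memory grows roughly with (concurrent GC tasks) x (blobs each task has in flight) x (blob size). The sketch below shows one generic way to bound that product with a shared byte budget that every task must reserve from before it buffers a blob. It is only an illustration, not something this issue proposes or the storage manager implements; `ByteBudget` and all of its methods are hypothetical names.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical helper, not part of the storage manager code base.
// Caps the total number of blob bytes that all GC tasks may buffer at once.
class ByteBudget {
public:
    explicit ByteBudget(std::size_t capacity) : capacity_(capacity) {}

    // Blocks until `bytes` fit into the budget, then reserves them.
    void acquire(std::size_t bytes) {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [&] { return fits(bytes); });
        used_ += bytes;
    }

    // Non-blocking variant: returns false instead of waiting.
    bool try_acquire(std::size_t bytes) {
        std::lock_guard<std::mutex> lock(mu_);
        if (!fits(bytes)) return false;
        used_ += bytes;
        return true;
    }

    // Returns previously reserved bytes and wakes up waiting tasks.
    void release(std::size_t bytes) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            assert(bytes <= used_);
            used_ -= bytes;
        }
        cv_.notify_all();
    }

    std::size_t used() const {
        std::lock_guard<std::mutex> lock(mu_);
        return used_;
    }

private:
    // A request larger than the whole budget is admitted only when nothing
    // else is reserved, so a single oversized blob cannot block forever.
    bool fits(std::size_t bytes) const {
        return used_ == 0 ||
               (used_ <= capacity_ && bytes <= capacity_ - used_);
    }

    const std::size_t capacity_;
    std::size_t used_ = 0;
    mutable std::mutex mu_;
    std::condition_variable cv_;
};
```

Under this assumption, a GC task would call `acquire(blob_size)` before reading a blob from the data channel and `release(blob_size)` once the blob has been written to the destination chunk, so total buffered data stays near the configured capacity regardless of how many tasks run at once.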

Here is an example from SH.
Memory usage:

Image

GC task logs:

I have no name!@sm-long-running2-1003-bb5666bbf-wgrq8:/logs/logs/latest$ grep "start process gc task" storage_mgr_log
[12/27/25 06:51:49.348] [storage_mgr] [debug] [1066] [gc_manager.cpp:1161:process_gc_task] [gc_task_id=1] start process gc task for move_from_chunk=56 with priority=1
[12/27/25 06:51:49.348] [storage_mgr] [debug] [1067] [gc_manager.cpp:1161:process_gc_task] [gc_task_id=2] start process gc task for move_from_chunk=54 with priority=1
[12/27/25 06:51:49.348] [storage_mgr] [debug] [1068] [gc_manager.cpp:1161:process_gc_task] [gc_task_id=3] start process gc task for move_from_chunk=53 with priority=1
[12/27/25 06:53:00.249] [storage_mgr] [debug] [1630] [gc_manager.cpp:1161:process_gc_task] [gc_task_id=4] start process gc task for move_from_chunk=405 with priority=0
[12/27/25 06:53:49.348] [storage_mgr] [debug] [1797] [gc_manager.cpp:1161:process_gc_task] [gc_task_id=5] start process gc task for move_from_chunk=349 with priority=1
[12/27/25 06:53:49.348] [storage_mgr] [debug] [1798] [gc_manager.cpp:1161:process_gc_task] [gc_task_id=6] start process gc task for move_from_chunk=89 with priority=1
[12/27/25 06:54:49.348] [storage_mgr] [debug] [2216] [gc_manager.cpp:1161:process_gc_task] [gc_task_id=7] start process gc task for move_from_chunk=411 with priority=1
[12/27/25 06:54:49.348] [storage_mgr] [debug] [2217] [gc_manager.cpp:1161:process_gc_task] [gc_task_id=8] start process gc task for move_from_chunk=397 with priority=1
[12/27/25 06:54:49.348] [storage_mgr] [debug] [2218] [gc_manager.cpp:1161:process_gc_task] [gc_task_id=9] start process gc task for move_from_chunk=381 with priority=1
[12/27/25 06:54:49.348] [storage_mgr] [debug] [1066] [gc_manager.cpp:1161:process_gc_task] [gc_task_id=10] start process gc task for move_from_chunk=91 with priority=1
I have no name!@sm-long-running2-1003-bb5666bbf-wgrq8:/logs/logs/latest$ grep gc_task_id storage_mgr_log  | grep complete
[12/27/25 06:52:42.715] [storage_mgr] [info] [1066] [gc_manager.cpp:1261:process_gc_task] [gc_task_id=1] task for move_from_chunk=56 to move_to_chunk=412 with priority=1 is completed!
[12/27/25 06:52:42.718] [storage_mgr] [info] [1067] [gc_manager.cpp:1261:process_gc_task] [gc_task_id=2] task for move_from_chunk=54 to move_to_chunk=421 with priority=1 is completed!
[12/27/25 06:52:42.718] [storage_mgr] [info] [1068] [gc_manager.cpp:1261:process_gc_task] [gc_task_id=3] task for move_from_chunk=53 to move_to_chunk=420 with priority=1 is completed!
[12/27/25 06:53:17.362] [storage_mgr] [info] [1630] [gc_manager.cpp:1261:process_gc_task] [gc_task_id=4] task for move_from_chunk=405 to move_to_chunk=419 with priority=0 is completed!
[12/27/25 06:54:24.781] [storage_mgr] [info] [1798] [gc_manager.cpp:1261:process_gc_task] [gc_task_id=6] task for move_from_chunk=89 to move_to_chunk=417 with priority=1 is completed!
[12/27/25 06:54:24.782] [storage_mgr] [info] [1797] [gc_manager.cpp:1261:process_gc_task] [gc_task_id=5] task for move_from_chunk=349 to move_to_chunk=418 with priority=1 is completed!
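
In the excerpt above, tasks 1 to 3 start at the same timestamp, and so do tasks 7 to 10. To correlate such overlap with the memory graph, the peak number of in-flight GC tasks can be derived from the two grep outputs. The following is a minimal sketch, assuming the line format shown above and that all lines fall on the same day (timestamps are compared as strings); `peak_concurrent_gc_tasks` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns the peak number of GC tasks in flight,
// given "start process gc task" and "is completed!" log lines in any order.
// Assumes each line begins with "[MM/DD/YY HH:MM:SS.mmm]" and that all
// lines share the same date, so timestamps sort correctly as plain strings.
int peak_concurrent_gc_tasks(const std::vector<std::string>& lines) {
    std::vector<std::pair<std::string, int>> events;  // (timestamp, +1 or -1)
    for (const std::string& line : lines) {
        if (line.empty() || line[0] != '[') continue;  // skips shell prompts
        const std::size_t end = line.find(']');
        if (end == std::string::npos) continue;
        const std::string ts = line.substr(1, end - 1);
        if (line.find("start process gc task") != std::string::npos) {
            events.emplace_back(ts, +1);
        } else if (line.find("is completed!") != std::string::npos) {
            events.emplace_back(ts, -1);
        }
    }
    // Sort by timestamp; at equal timestamps completions (-1) come first.
    std::sort(events.begin(), events.end());

    int current = 0;
    int peak = 0;
    for (const auto& event : events) {
        current += event.second;
        peak = std::max(peak, current);
    }
    return peak;
}
```

Tasks that have a start line but no completion line in the input are counted as still running.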
