
Out of memory when SAW produces gigantic error message #135

@pennyannn

When the CI produces the error message "Docker Action run completed with exit code 137", it typically means the GitHub runner ran out of memory. Out-of-memory errors can have many causes, but one of them is particularly bizarre and hard to debug.
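As a quick aid for reading such exit codes: by the usual shell convention, a code above 128 means the process was terminated by signal number (code - 128), so 137 corresponds to signal 9 (SIGKILL), which is what the kernel's OOM killer sends. The helper below is a minimal sketch of that arithmetic; `describe_exit_code` is a hypothetical name, not part of the CI scripts, and signal names are platform dependent (this assumes a POSIX system). A SIGKILL can also come from other sources, so the code alone does not prove an out-of-memory condition.

```python
import signal


def describe_exit_code(code):
    """Translate a shell-style exit code into a short description.

    Hypothetical helper for illustration only. Codes above 128 are
    interpreted as 128 + signal number, per the common shell convention.
    """
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {code}"


if __name__ == "__main__":
    print(describe_exit_code(137))
```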

When SAW fails, it sometimes produces a gigantic error message (potentially tens of gigabytes). When that happens, SAW's own memory usage grows quickly, and there is nothing we can do about that. At the same time, however, the subprocess.run call in parallel.py tries to capture the entire error message, so the Python process's memory usage grows quickly as well. Running ps aux at that point shows something like the following:

2023-12-16T04:31:22.3477406Z root        7300  2.6 52.4 34516632 34502404 ?   S    04:01   0:47 /usr/bin/python3 ./scripts/parallel.py --file ./scripts/x86_64/release_jobs.sh
2023-12-16T04:31:22.3489243Z root        7307 95.6 45.2 1074209060 29793672 ? Rl   04:01  28:24 saw proof/ECDH/verify-ECDH.saw

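The Python job (PID 7300) is also using an unusual amount of memory: its resident set is about 33 GiB (52.4% of memory), on top of the roughly 28 GiB (45.2%) used by SAW itself.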

Proposed fix: change the Python script parallel.py so that it discards error messages larger than a predefined size.
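One possible shape for such a fix is sketched below. It is not the actual parallel.py code: `run_capped`, `MAX_CAPTURE_BYTES`, and the 1 MiB limit are hypothetical names and values chosen for illustration. Instead of letting subprocess.run buffer everything, it reads the child's combined stdout/stderr in fixed-size chunks, keeps only the first `max_bytes`, and throws away the rest. It assumes that truncating an oversized message is acceptable in place of discarding it entirely. The pipe is still drained to the end so the child never blocks on a full pipe.

```python
import subprocess
import sys

# Hypothetical cap on captured output; the real limit is a project decision.
MAX_CAPTURE_BYTES = 1024 * 1024


def run_capped(cmd, max_bytes=MAX_CAPTURE_BYTES, chunk_size=64 * 1024):
    """Run cmd and keep at most max_bytes of its combined stdout/stderr.

    Returns (returncode, captured_bytes, truncated).
    """
    kept = bytearray()
    truncated = False
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
    )
    with proc:  # closes the pipe and waits for the child on exit
        while True:
            chunk = proc.stdout.read(chunk_size)
            if not chunk:  # EOF: the child closed its output
                break
            room = max_bytes - len(kept)
            if room > 0:
                kept.extend(chunk[:room])
            if len(chunk) > room:
                # Anything beyond the cap is read and dropped immediately.
                truncated = True
    return proc.returncode, bytes(kept), truncated


if __name__ == "__main__":
    # Demo: a child that writes 10 MB to stderr, of which only 1 KiB is kept.
    noisy = [sys.executable, "-c",
             "import sys; sys.stderr.write('x' * 10_000_000)"]
    code, output, was_truncated = run_capped(noisy, max_bytes=1024)
    print(code, len(output), was_truncated)
```

With this approach the Python process holds at most `max_bytes` plus one chunk in memory at a time, regardless of how much SAW prints. The `truncated` flag lets the caller report that the message was cut off.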
