Description
When the CI produces the error message "Docker Action run completed with exit code 137", it means the GitHub runner has run out of memory. Out-of-memory errors have many possible causes. One of them is particularly bizarre and hard to debug.
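For context, exit code 137 follows the common shell and Docker convention of 128 plus a signal number, so it corresponds to signal 9 (SIGKILL), which is the signal the Linux OOM killer sends. The small sketch below decodes such a code; the helper name `describe_exit_code` is hypothetical and not part of the repository's scripts.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Describe a shell-style exit code.

    Values above 128 conventionally mean "terminated by signal (code - 128)".
    """
    if code > 128:
        try:
            name = signal.Signals(code - 128).name
        except ValueError:
            # Not a signal number known on this platform.
            return f"exit code {code}"
        return f"killed by {name}"
    return f"exit code {code}"


if __name__ == "__main__":
    print(describe_exit_code(137))
```

A SIGKILL by itself does not prove an out-of-memory condition, but on a CI runner it is the most common cause.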
When SAW fails, it sometimes produces a gigantic error message (potentially tens of gigabytes). When that happens, SAW's memory usage grows quickly, which we can do nothing about. At the same time, the subprocess.run call in parallel.py tries to capture the whole error message, so the Python process's memory usage grows quickly as well. Printing the process list with ps aux shows something like the following:
2023-12-16T04:31:22.3477406Z root 7300 2.6 52.4 34516632 34502404 ? S 04:01 0:47 /usr/bin/python3 ./scripts/parallel.py --file ./scripts/x86_64/release_jobs.sh
2023-12-16T04:31:22.3489243Z root 7307 95.6 45.2 1074209060 29793672 ? Rl 04:01 28:24 saw proof/ECDH/verify-ECDH.saw
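One can see that the Python process is also using an unusual amount of memory: 52.4% of the runner's memory (an RSS of 34502404 KiB, roughly 33 GiB), which is even more than SAW itself at 45.2%.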
Fix the Python script parallel.py to discard any error message larger than a predefined size.
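One possible shape for such a fix is sketched below. It assumes parallel.py currently captures output through subprocess.run, and replaces that with a loop that reads the child's combined stdout and stderr in chunks, keeps only the first `max_bytes`, and discards the rest while still draining the pipe so the child never blocks on a full buffer. The names `run_with_capped_output` and `MAX_CAPTURE_BYTES`, and the 1 MiB default, are hypothetical and not taken from the actual script.

```python
import subprocess
import sys

# Hypothetical cap on how much output to keep per job (1 MiB).
MAX_CAPTURE_BYTES = 1024 * 1024


def run_with_capped_output(cmd, max_bytes=MAX_CAPTURE_BYTES, chunk_size=65536):
    """Run cmd and keep at most max_bytes of its combined stdout/stderr.

    Returns (returncode, kept_output, truncated).
    """
    kept = bytearray()
    truncated = False
    with subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
    ) as proc:
        while True:
            chunk = proc.stdout.read(chunk_size)
            if not chunk:
                break  # EOF: the child closed its output
            room = max_bytes - len(kept)
            if room > 0:
                kept.extend(chunk[:room])
            if len(chunk) > room:
                # Everything past the cap is read and thrown away.
                truncated = True
    # Leaving the with-block waits for the child, so returncode is set.
    return proc.returncode, bytes(kept), truncated


if __name__ == "__main__":
    # Demo: a child that writes 5000 bytes, capped at 100.
    code, out, cut = run_with_capped_output(
        [sys.executable, "-c", "import sys; sys.stdout.write('x' * 5000)"],
        max_bytes=100,
    )
    print(code, len(out), cut)
```

With this approach the Python process holds at most `max_bytes` plus one chunk in memory per job, regardless of how much SAW prints. It does nothing about SAW's own memory growth, which the description above notes cannot be helped from the script.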