Not sure if this is something new to more recent Julia versions, but I’m getting the following when using SlurmClusterManager and Julia exits:
error in running finalizer: ErrorException("task switch not allowed from inside gc finalizer")
ijl_error at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/rtutils.c:41
ijl_switch at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/task.c:634
try_yieldto at ./task.jl:910
wait at ./task.jl:984
#wait#621 at ./condition.jl:130
wait at ./condition.jl:125 [inlined]
_trywait at ./asyncevent.jl:138
wait at ./asyncevent.jl:155 [inlined]
sleep at ./asyncevent.jl:240
#7 at ~/.julia/packages/SlurmClusterManager/R0zin/src/slurmmanager.jl:93
unknown function (ip: 0x14d06861ecb2)
_jl_invoke at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2940
run_finalizer at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gc.c:417
jl_gc_run_finalizers_in_list at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gc.c:507
run_finalizers at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gc.c:553
ijl_atexit_hook at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/init.c:299
jl_repl_entrypoint at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/jlapi.c:718
main at julia (unknown line)
__libc_start_main at /lib64/libc.so.6 (unknown line)
unknown function (ip: 0x401098)
which seems to be due to the calls to `wait`/`sleep` in the finalizer defined in `launch()`.
See the docstring for `finalizer`:
help?> finalizer
search: finalizer UndefInitializer finalize
finalizer(f, x)
Register a function f(x) to be called when there are no program-accessible references to x, and return x. The type of x must be a mutable
struct, otherwise the function will throw.
f must not cause a task switch, which excludes most I/O operations such as println. Using the @async macro (to defer context switching to
outside of the finalizer) or ccall to directly invoke IO functions in C may be helpful for debugging purposes.
The following avoids the error, but I’m not sure if that still accomplishes the same as the existing code (although I haven’t had any issues so far):
finalizer(manager) do manager
    @async begin
        wait(manager.srun_proc)
        # need to sleep briefly here to make sure that srun exit is recorded by slurm daemons
        # TODO find a way to wait on the condition directly instead of just sleeping
        sleep(manager.srun_post_exit_sleep)
    end
end
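To make the open question above concrete, here is a minimal sketch of what `@async` inside a finalizer does. It uses a hypothetical `Resource` type as a stand-in (it is not part of SlurmClusterManager) and only illustrates the scheduling behaviour: the finalizer returns as soon as the task is scheduled, and the body runs later, once the scheduler gets a chance to run it. It does not show what happens during process exit.

```julia
# Hypothetical stand-in type; not SlurmClusterManager's SlurmManager.
mutable struct Resource
    done::Bool
end

r = Resource(false)

finalizer(r) do x
    # @async only schedules the task; the finalizer itself does not block.
    @async begin
        sleep(0.01)      # stands in for wait(...) / sleep(...) in the real finalizer
        x.done = true
    end
end

finalize(r)              # runs the finalizer immediately
before = r.done          # the deferred task has not run yet

# Yield to the scheduler until the deferred task has finished (or 5 s elapse).
status = timedwait(() -> r.done, 5.0)
after = r.done
```

If nothing yields to the scheduler after the finalizer returns, the deferred `wait`/`sleep` may never get to run, which is why the `@async` version is not necessarily equivalent to the blocking one.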
A workaround which doesn’t involve changing the package source code is to call `finalize` on the `SlurmManager` at the end of the program.
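A minimal sketch of that workaround pattern, using a hypothetical `FakeManager` type so that it runs without Slurm (the type, its fields, and `make_manager` are illustrative assumptions, not the package's API). With the real package, the equivalent would presumably be to keep a reference, e.g. `manager = SlurmManager(); addprocs(manager)`, and call `finalize(manager)` as the last step.

```julia
# Hypothetical stand-in for SlurmManager; the names here are illustrative only.
mutable struct FakeManager
    post_exit_sleep::Float64
    cleaned_up::Bool
end

function make_manager()
    manager = FakeManager(0.01, false)
    finalizer(manager) do m
        # A blocking call like this is what fails when the finalizer
        # is run by the GC at exit.
        sleep(m.post_exit_sleep)
        m.cleaned_up = true
    end
    return manager
end

manager = make_manager()

# ... the program's actual work would go here ...

# Run the finalizer explicitly from ordinary task context, before Julia exits.
finalize(manager)
```

Because `finalize` runs the registered finalizer right away, the cleanup has already happened by the time Julia's exit hook processes the remaining finalizers.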