error in running finalizer: ErrorException("task switch not allowed from inside gc finalizer") #11

@Socob

Not sure if this is something new to more recent Julia versions, but I’m getting the following when using SlurmClusterManager and Julia exits:

error in running finalizer: ErrorException("task switch not allowed from inside gc finalizer")
ijl_error at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/rtutils.c:41
ijl_switch at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/task.c:634
try_yieldto at ./task.jl:910
wait at ./task.jl:984
#wait#621 at ./condition.jl:130
wait at ./condition.jl:125 [inlined]
_trywait at ./asyncevent.jl:138
wait at ./asyncevent.jl:155 [inlined]
sleep at ./asyncevent.jl:240
#7 at ~/.julia/packages/SlurmClusterManager/R0zin/src/slurmmanager.jl:93
unknown function (ip: 0x14d06861ecb2)
_jl_invoke at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gf.c:2940
run_finalizer at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gc.c:417
jl_gc_run_finalizers_in_list at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gc.c:507
run_finalizers at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/gc.c:553
ijl_atexit_hook at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/init.c:299
jl_repl_entrypoint at /cache/build/default-amdci4-0/julialang/julia-release-1-dot-9/src/jlapi.c:718
main at julia (unknown line)
__libc_start_main at /lib64/libc.so.6 (unknown line)
unknown function (ip: 0x401098)

which seems to be due to the calls to wait/sleep in the finalizer defined in launch():

https://github.com/kleinhenz/SlurmClusterManager.jl/blob/0bfcf079889ce3a7f64b9aef1c1cbe3136bf5e44/src/slurmmanager.jl#L89-L94

See the docstring for finalizer:

help?> finalizer
search: finalizer UndefInitializer finalize

  finalizer(f, x)


  Register a function f(x) to be called when there are no program-accessible references to x, and return x. The type of x must be a mutable
  struct, otherwise the function will throw.

  f must not cause a task switch, which excludes most I/O operations such as println. Using the @async macro (to defer context switching to
  outside of the finalizer) or ccall to directly invoke IO functions in C may be helpful for debugging purposes.

The following avoids the error, but I’m not sure whether it still accomplishes the same thing as the existing code (I haven’t had any issues so far):

finalizer(manager) do manager
  @async begin
    wait(manager.srun_proc)
    # need to sleep briefly here to make sure that srun exit is recorded by slurm daemons
    # TODO find a way to wait on the condition directly instead of just sleeping
    sleep(manager.srun_post_exit_sleep)
  end
end

A workaround which doesn’t involve changing the package source code is to call finalize on the SlurmManager at the end of the program.
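
For illustration, here is a minimal sketch of that workaround that runs without Slurm. `DemoManager` and its `cleaned_up` field are hypothetical stand-ins for SlurmManager, made up for this example; only the finalizer pattern is the same. The assumption is that an explicit `finalize` call runs the finalizer from the calling task rather than from the GC at exit, so the `sleep` inside it is allowed to yield.

```julia
# Hypothetical stand-in for SlurmManager (not part of the package).
mutable struct DemoManager
    post_exit_sleep::Float64  # seconds to sleep inside the finalizer
    cleaned_up::Bool          # set by the finalizer so we can observe that it ran
end

function DemoManager(post_exit_sleep::Real)
    manager = DemoManager(Float64(post_exit_sleep), false)
    # Same shape as the finalizer in launch(): it blocks, i.e. it needs a task switch.
    finalizer(manager) do m
        sleep(m.post_exit_sleep)
        m.cleaned_up = true
    end
    return manager
end

manager = DemoManager(0.05)

# ... the actual distributed work would go here ...

# Run the finalizer now, from ordinary task context, instead of leaving it
# to the GC when Julia exits.
finalize(manager)
```

Once `finalize` has run the registered finalizer, nothing is left for the exit hook to run for this object. In a real script the equivalent would be a `finalize(manager)` call on the SlurmManager as the last step, ideally in a `finally` block so it also happens on errors.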
