Panic when workflow-controller restarts while using namespace parallelism #14669

@shuangkun

Pre-requisites

  • I have double-checked my configuration
  • I have tested with the :latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
  • I have searched existing issues and could not find a match for this bug
  • I'd like to contribute the fix myself (see contributing guide)

What happened? What did you expect to happen?

E0717 10:45:15.553759       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 1446 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x2139180?, 0x3ad2920})
  /go/pkg/mod/k8s.io/apimachinery@v0.24.3/pkg/util/runtime/runtime.go:75 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x3ad0670, 0x1, 0xc003f731f0?})
  /go/pkg/mod/k8s.io/apimachinery@v0.24.3/pkg/util/runtime/runtime.go:49 +0x6b
panic({0x2139180?, 0x3ad2920?})
  /usr/local/go/src/runtime/panic.go:914 +0x21f
github.com/argoproj/argo-workflows/v3/workflow/sync.(*priorityQueue).remove(...)
  /go/src/github.com/argoproj/argo-workflows/workflow/sync/multi_throttler.go:230
github.com/argoproj/argo-workflows/v3/workflow/sync.(*multiThrottler).Remove(0xc000a12a00, {0xc003f7eb40, 0x42})
  /go/src/github.com/argoproj/argo-workflows/workflow/sync/multi_throttler.go:142 +0xed
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).processNextItem.func1()
  /go/src/github.com/argoproj/argo-workflows/workflow/controller/controller.go:881 +0x7f
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).processNextItem(0xc000817180, {0x2853028, 0x3b41180})
  /go/src/github.com/argoproj/argo-workflows/workflow/controller/controller.go:907 +0x8cf
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).runWorker(0xc35ea78ea0?)
  /go/src/github.com/argoproj/argo-workflows/workflow/controller/controller.go:816 +0x88

Version(s)

v3.6

Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

Generally difficult to reproduce. It occurred once when I was running a large number of workflows and restarted the workflow-controller.

Logs from the workflow controller

kubectl logs -n argo deploy/workflow-controller | grep ${workflow}

Logs from your workflow's wait container

kubectl logs -n argo -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded
