[FEATURE][SPARK] Support cancel async thread of handle blockEvent and rpc when writer is killed #1264

@summaryzb

Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

Describe the feature

When a task is killed (because its stage was cancelled, another task attempt succeeded, or for some other reason), the AddBlockEvent handling and sendShuffleData keep running.
needCancelRequest may cancel some of that work, but the AddBlockEvent instances still sitting in the thread pool's blocking queue keep holding the shuffle block data, and so do the RPC requests that have already been sent but are still waiting for a response.

This causes 3 problems:

  1. We freeAll memory once the task is killed, but the shuffleBlockData held by the async threads still occupies memory
  2. Many useless runnables belonging to the killed task are still running or waiting to be executed
  3. Currently checkBlockSendResult cannot be interrupted. When the task killed because of speculation is the last one of the shuffle map stage, it blocks scheduling of the next reduce stage

Motivation

No response

Describe the solution

  1. Cancel all runnables that are waiting to be executed or are blocked waiting for an RPC callback
  2. Interrupt checkBlockSendResult immediately
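
A rough sketch of the two steps using only plain java.util.concurrent, to make the idea concrete. This is not the existing Uniffle code: the class WriterCancelSketch, the method runDemo, and the latch that stands in for an RPC response are all hypothetical. It assumes the writer keeps the Future of every submitted send, so that on kill it can cancel them, purge the queue so the block data becomes collectable, and interrupt the thread that waits for the send result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; none of these names come from the Uniffle code base.
final class WriterCancelSketch {

  /** Outcome of one simulated task kill. */
  static final class Result {
    final int cancelled;             // futures for which cancel(true) returned true
    final int completed;             // send tasks that ran to completion
    final int stillQueued;           // tasks left in the pool queue after purge()
    final boolean waiterInterrupted; // whether the waiting thread was woken up

    Result(int cancelled, int completed, int stillQueued, boolean waiterInterrupted) {
      this.cancelled = cancelled;
      this.completed = completed;
      this.stillQueued = stillQueued;
      this.waiterInterrupted = waiterInterrupted;
    }
  }

  static Result runDemo() {
    // Never counted down: stands in for an RPC response that never arrives.
    CountDownLatch rpcResponse = new CountDownLatch(1);
    AtomicInteger completed = new AtomicInteger();
    AtomicBoolean waiterInterrupted = new AtomicBoolean();

    // One worker thread, so later "block events" pile up in the queue.
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    List<Future<?>> pending = new ArrayList<>();
    for (int i = 0; i < 4; i++) {
      byte[] blockData = new byte[1024]; // payload kept reachable by the queued task
      Callable<Void> send = () -> {
        rpcResponse.await();             // blocked "waiting for the response"
        completed.addAndGet(blockData.length > 0 ? 1 : 0);
        return null;
      };
      pending.add(pool.submit(send));
    }

    // Stand-in for the thread stuck in checkBlockSendResult.
    Thread waiter = new Thread(() -> {
      try {
        rpcResponse.await();
      } catch (InterruptedException e) {
        waiterInterrupted.set(true);
      }
    });
    waiter.start();

    // The task is killed: step 1, cancel everything queued or blocked.
    int cancelled = 0;
    for (Future<?> f : pending) {
      if (f.cancel(true)) {
        cancelled++;
      }
    }
    pool.purge(); // drop cancelled tasks from the queue so their payload can be freed
    int stillQueued = pool.getQueue().size();

    // Step 2, wake up the waiting thread immediately.
    waiter.interrupt();
    try {
      waiter.join();
      pool.shutdownNow();
      pool.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return new Result(cancelled, completed.get(), stillQueued, waiterInterrupted.get());
  }

  public static void main(String[] args) {
    Result r = runDemo();
    System.out.println("cancelled=" + r.cancelled + " completed=" + r.completed
        + " stillQueued=" + r.stillQueued + " waiterInterrupted=" + r.waiterInterrupted);
  }
}
```

The purge() call matters for problem 1: a cancelled FutureTask otherwise stays in the queue, together with the data it captured, until a worker thread dequeues it. Interrupting only helps if the real checkBlockSendResult waits in an interruptible call (sleep, await, or similar), which is an assumption here.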

Additional context

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
