[CELEBORN-2062] Introduce splitNum to optimize shuffle write #3363
This PR was closed because it had been stale for 10 days with no activity.
What changes were proposed in this pull request?
Introduce splitNum to optimize shuffle write.
Why are the changes needed?
Currently, a map task can only write data to one partition split at a time, which causes a significant performance regression when partitions are skewed. This PR therefore introduces a splitNum configuration so that a revive can generate multiple partition splits and assign them to different map tasks, improving write concurrency. For example, if splitNum is 2 and the number of mappers is 8, each revive turns one partition split into two child splits, so the splits form a tree.
The leaf nodes of this tree represent the latestPartitionLocations; only leaf nodes can be written to.
When a partition split's splitStart equals its splitEnd, its effective splitNum becomes 1, i.e. it can no longer be split. Additionally, a maxWriteParallelism setting is introduced to cap the write concurrency: by setting the initial partition split's splitEnd to maxWriteParallelism - 1, the number of leaf nodes is bounded by maxWriteParallelism. If maxWriteParallelism is not configured, the maximum number of leaf nodes equals the number of mappers.
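As a rough illustration, here is a minimal Scala sketch of the split-tree idea described above. `SplitNode`, `canSplit`, and the splitting logic are hypothetical stand-ins invented for this explanation, not Celeborn's actual classes; only the splitStart / splitEnd / maxWriteParallelism semantics come from the PR description.

```scala
// Hypothetical model of a partition split that owns the index range
// [splitStart, splitEnd]. A node whose range has collapsed to a single
// slot can no longer be split.
case class SplitNode(
    splitStart: Int,
    splitEnd: Int,
    var children: Seq[SplitNode] = Seq.empty) {

  def canSplit: Boolean = splitStart < splitEnd

  // Split the [splitStart, splitEnd] range into at most splitNum child ranges.
  def split(splitNum: Int): Seq[SplitNode] = {
    if (!canSplit) return Seq(this)
    val width = splitEnd - splitStart + 1
    val fanout = math.min(splitNum, width)
    val base = width / fanout
    val extra = width % fanout
    var start = splitStart
    children = (0 until fanout).map { i =>
      val size = base + (if (i < extra) 1 else 0)
      val child = SplitNode(start, start + size - 1)
      start += size
      child
    }
    children
  }
}

object SplitTreeExample extends App {
  // With maxWriteParallelism = 8, the root covers [0, 7], so repeated
  // splitting with splitNum = 2 yields at most 8 leaf nodes.
  val maxWriteParallelism = 8
  val root = SplitNode(0, maxWriteParallelism - 1)
  val level1 = root.split(2)              // [0,3] [4,7]
  val level2 = level1.flatMap(_.split(2)) // [0,1] [2,3] [4,5] [6,7]
  println(level2.map(n => s"[${n.splitStart},${n.splitEnd}]").mkString(" "))
}
```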
When a map task requests to revive a PartitionLocation, the driver first checks whether that partition split is already undergoing revival. If it is, the driver records the request. If not, the driver checks for child partition splits: if children exist, it randomly selects one child (recursively traversing down until a leaf node is found) and returns it to the map task; if no children exist, the driver initiates the revive process.
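Below is a hedged sketch of that driver-side decision, reusing the hypothetical `SplitNode` from the previous sketch; `reviveInProgress`, `recordRequest`, and `startRevive` are illustrative placeholders rather than actual Celeborn APIs.

```scala
import scala.util.Random

object ReviveSketch {
  // Walk down the tree, picking a random child at each level, until a leaf.
  @scala.annotation.tailrec
  def pickLeaf(node: SplitNode): SplitNode =
    if (node.children.isEmpty) node
    else pickLeaf(node.children(Random.nextInt(node.children.size)))

  def handleReviveRequest(
      node: SplitNode,
      reviveInProgress: Boolean,
      recordRequest: () => Unit,
      startRevive: SplitNode => Unit): Option[SplitNode] = {
    if (reviveInProgress) {
      // A revive is already running for this split: record the request
      // and let the caller retry once the revive finishes.
      recordRequest()
      None
    } else if (node.children.nonEmpty) {
      // Children already exist: descend randomly to a leaf and hand that
      // partition split back to the map task.
      Some(pickLeaf(node))
    } else {
      // No children yet: kick off the revive process for this split.
      startRevive(node)
      None
    }
  }
}
```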
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit tests.