@littlexyw littlexyw commented Jul 11, 2025

What changes were proposed in this pull request?

Introduce splitNum to optimize shuffle write.

Why are the changes needed?

Currently, map tasks can only write data to one partition split at a time, which causes a significant performance regression when partitions are skewed. This PR therefore introduces the splitNum configuration: a revive can generate multiple partition splits, which are assigned to different map tasks so that write concurrency improves. For example, if splitNum is 2 and the number of mappers is 8, the partition splits form the following tree:
[image]
The leaf nodes represent the latestPartitionLocations. Only leaf nodes can be written to.

When the splitStart of a partition split equals its splitEnd, the splitNum for that split becomes 1. Additionally, maxWriteParallelism is introduced to cap the write concurrency: setting the initial partition split's splitEnd to maxWriteParallelism - 1 limits the number of leaf nodes to at most maxWriteParallelism. If maxWriteParallelism is not configured, the maximum number of leaf nodes equals the number of mappers.
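
To make the range bookkeeping concrete, here is a minimal sketch and not the code in this PR. It assumes a revive divides the parent's [splitStart, splitEnd] range as evenly as possible among at most splitNum children; the PR description does not spell out the exact division rule. The names SplitRangeSketch, Range, initialRange, split, and maxLeaves are hypothetical, and a maxWriteParallelism of 0 or less stands in for "not configured".

```java
import java.util.ArrayList;
import java.util.List;

final class SplitRangeSketch {

  /** Inclusive range [splitStart, splitEnd] owned by one partition split (hypothetical type). */
  static final class Range {
    final int splitStart;
    final int splitEnd;

    Range(int splitStart, int splitEnd) {
      this.splitStart = splitStart;
      this.splitEnd = splitEnd;
    }

    @Override
    public String toString() {
      return "[" + splitStart + ", " + splitEnd + "]";
    }
  }

  /** Root range. Assumption: maxWriteParallelism <= 0 means "not configured". */
  static Range initialRange(int numMappers, int maxWriteParallelism) {
    int maxLeaves = maxWriteParallelism > 0 ? maxWriteParallelism : numMappers;
    return new Range(0, maxLeaves - 1);
  }

  /** Assumption: the parent range is divided as evenly as possible among the children. */
  static List<Range> split(Range parent, int splitNum) {
    int width = parent.splitEnd - parent.splitStart + 1;
    // When splitStart == splitEnd the width is 1, so the effective splitNum is 1.
    int children = Math.max(1, Math.min(splitNum, width));
    int base = width / children;
    int extra = width % children;
    List<Range> result = new ArrayList<>();
    int start = parent.splitStart;
    for (int i = 0; i < children; i++) {
      int size = base + (i < extra ? 1 : 0); // the first `extra` children get one more slot
      result.add(new Range(start, start + size - 1));
      start += size;
    }
    return result;
  }

  /** Number of leaves once every range has been split down to width 1. */
  static int maxLeaves(Range range, int splitNum) {
    // With splitNum < 2 a revive yields a single successor, so there is only one leaf.
    if (range.splitStart == range.splitEnd || splitNum < 2) {
      return 1;
    }
    int leaves = 0;
    for (Range child : split(range, splitNum)) {
      leaves += maxLeaves(child, splitNum);
    }
    return leaves;
  }

  public static void main(String[] args) {
    Range root = initialRange(8, 0); // 8 mappers, maxWriteParallelism not configured
    System.out.println(root + " -> " + split(root, 2) + ", max leaves = " + maxLeaves(root, 2));
  }
}
```

Under these assumptions, 8 mappers with splitNum 2 start from the range [0, 7], the first revive yields [0, 3] and [4, 7], and the tree can grow to at most 8 leaves. Passing maxWriteParallelism = 4 shrinks the root range to [0, 3] and caps the tree at 4 leaves.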

When a map task requests to revive a PartitionLocation, the driver first checks if this partition split is already undergoing revival. If true, it records the request. If not, the driver checks for child partition splits. If children exist, it randomly selects one child node (recursively traversing down the hierarchy until a leaf node is found) and returns it to the map task. If no children exist, the driver initiates the revive process.
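
The revive decision described above can be sketched as follows. This is an illustration under stated assumptions, not the driver code in this PR: ReviveSketch, SplitNode, Outcome, Result, and handleRevive are hypothetical names, the tree is held in memory, and the sketch does not model what happens when the revive completes or when a descendant reached during the descent is itself being revived.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class ReviveSketch {

  /** One partition split in the tree (hypothetical type). */
  static final class SplitNode {
    final String name;
    final List<SplitNode> children = new ArrayList<>();
    final List<Integer> pendingMapIds = new ArrayList<>(); // requests recorded during a revive
    boolean reviving;

    SplitNode(String name) {
      this.name = name;
    }
  }

  enum Outcome { RECORDED, REDIRECTED_TO_LEAF, REVIVE_STARTED }

  static final class Result {
    final Outcome outcome;
    final SplitNode location; // non-null only for REDIRECTED_TO_LEAF

    Result(Outcome outcome, SplitNode location) {
      this.outcome = outcome;
      this.location = location;
    }
  }

  static Result handleRevive(SplitNode requested, int mapId, Random random) {
    // 1. The split is already being revived: just record the request.
    if (requested.reviving) {
      requested.pendingMapIds.add(mapId);
      return new Result(Outcome.RECORDED, null);
    }
    // 2. The split already has children: pick a random child at each level until a leaf is reached.
    if (!requested.children.isEmpty()) {
      SplitNode node = requested;
      while (!node.children.isEmpty()) {
        node = node.children.get(random.nextInt(node.children.size()));
      }
      return new Result(Outcome.REDIRECTED_TO_LEAF, node);
    }
    // 3. No children yet: this request initiates the revive.
    requested.reviving = true;
    return new Result(Outcome.REVIVE_STARTED, null);
  }

  public static void main(String[] args) {
    SplitNode root = new SplitNode("root");
    Random random = new Random();
    System.out.println(handleRevive(root, 0, random).outcome); // first request starts the revive
    System.out.println(handleRevive(root, 1, random).outcome); // second request is recorded
  }
}
```

Picking a random child at each level spreads map tasks across the writable leaves without the driver having to track per-leaf load, which fits the goal of raising write concurrency for a skewed partition.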

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests (UT).

@github-actions

This PR is stale because it has been open 20 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the stale label Aug 12, 2025
@github-actions github-actions bot removed the stale label Aug 21, 2025
@github-actions

This PR is stale because it has been open 20 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the stale label Sep 11, 2025
@cfmcgrady cfmcgrady removed the stale label Sep 15, 2025
github-actions bot commented Oct 5, 2025

This PR is stale because it has been open 20 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the stale label Oct 5, 2025
@github-actions

This issue was closed because it has been stale for 10 days with no activity.

@github-actions github-actions bot closed this Oct 15, 2025
@turboFei turboFei reopened this Dec 30, 2025
@cxzl25 cxzl25 removed the stale label Dec 30, 2025