[CELEBORN-2062] Introduce splitNum to optimize shuffle write #3363
This PR was closed because it had been stale for 10 days with no activity.
What changes were proposed in this pull request?
Introduce splitNum to optimize shuffle write.
Why are the changes needed?
Currently, a map task can only write data to one partition split at a time, which causes a significant performance regression when partitions are skewed. This PR therefore introduces a splitNum configuration so that a revive can generate multiple partition splits and assign them to different map tasks, improving write concurrency. For example, if splitNum is 2 and the number of mappers is 8, each revive turns one partition split into two child splits, so the splits form a tree.
The leaf nodes of this tree represent the latestPartitionLocations; only leaf nodes can be written to.
When a partition split's splitStart equals its splitEnd, its effective splitNum becomes 1, i.e. it can no longer be split. Additionally, a maxWriteParallelism setting is introduced to cap the write concurrency: by setting the initial partition split's splitEnd to maxWriteParallelism - 1, the number of leaf nodes is bounded by maxWriteParallelism. If maxWriteParallelism is not configured, the maximum number of leaf nodes equals the number of mappers.
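As a rough illustration, here is a minimal Scala sketch of the split-tree idea described above. `SplitNode`, `canSplit`, and the splitting logic are hypothetical stand-ins invented for this explanation, not Celeborn's actual classes; only the splitStart / splitEnd / maxWriteParallelism semantics come from the PR description.

```scala
// Hypothetical model of a partition split that owns the index range
// [splitStart, splitEnd]. A node whose range has collapsed to a single
// slot can no longer be split.
case class SplitNode(
    splitStart: Int,
    splitEnd: Int,
    var children: Seq[SplitNode] = Seq.empty) {

  def canSplit: Boolean = splitStart < splitEnd

  // Split the [splitStart, splitEnd] range into at most splitNum child ranges.
  def split(splitNum: Int): Seq[SplitNode] = {
    if (!canSplit) return Seq(this)
    val width = splitEnd - splitStart + 1
    val fanout = math.min(splitNum, width)
    val base = width / fanout
    val extra = width % fanout
    var start = splitStart
    children = (0 until fanout).map { i =>
      val size = base + (if (i < extra) 1 else 0)
      val child = SplitNode(start, start + size - 1)
      start += size
      child
    }
    children
  }
}

object SplitTreeExample extends App {
  // With maxWriteParallelism = 8, the root covers [0, 7], so repeated
  // splitting with splitNum = 2 yields at most 8 leaf nodes.
  val maxWriteParallelism = 8
  val root = SplitNode(0, maxWriteParallelism - 1)
  val level1 = root.split(2)              // [0,3] [4,7]
  val level2 = level1.flatMap(_.split(2)) // [0,1] [2,3] [4,5] [6,7]
  println(level2.map(n => s"[${n.splitStart},${n.splitEnd}]").mkString(" "))
}
```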
When a map task requests to revive a PartitionLocation, the driver first checks whether that partition split is already undergoing revival. If it is, the driver records the request. If not, the driver checks for child partition splits: if children exist, it randomly selects one child (recursively traversing down until a leaf node is found) and returns it to the map task; if no children exist, the driver initiates the revive process.
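Below is a hedged sketch of that driver-side decision, reusing the hypothetical `SplitNode` from the previous sketch; `reviveInProgress`, `recordRequest`, and `startRevive` are illustrative placeholders rather than actual Celeborn APIs.

```scala
import scala.util.Random

object ReviveSketch {
  // Walk down the tree, picking a random child at each level, until a leaf.
  @scala.annotation.tailrec
  def pickLeaf(node: SplitNode): SplitNode =
    if (node.children.isEmpty) node
    else pickLeaf(node.children(Random.nextInt(node.children.size)))

  def handleReviveRequest(
      node: SplitNode,
      reviveInProgress: Boolean,
      recordRequest: () => Unit,
      startRevive: SplitNode => Unit): Option[SplitNode] = {
    if (reviveInProgress) {
      // A revive is already running for this split: record the request
      // and let the caller retry once the revive finishes.
      recordRequest()
      None
    } else if (node.children.nonEmpty) {
      // Children already exist: descend randomly to a leaf and hand that
      // partition split back to the map task.
      Some(pickLeaf(node))
    } else {
      // No children yet: kick off the revive process for this split.
      startRevive(node)
      None
    }
  }
}
```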
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit tests.