-
Notifications
You must be signed in to change notification settings - Fork 1.4k
[Real-time Table Replication X clusters][1/n] Creating new tables with designated consuming segments #17235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
83ba2a1 to
75bcea7
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #17235 +/- ##
============================================
- Coverage 63.25% 63.24% -0.01%
- Complexity 1475 1476 +1
============================================
Files 3162 3167 +5
Lines 188668 188992 +324
Branches 28869 28916 +47
============================================
+ Hits 119348 119537 +189
- Misses 60074 60180 +106
- Partials 9246 9275 +29
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
pinot-controller/src/main/java/org/apache/pinot/controller/api/resources/CopyTablePayload.java
Show resolved
Hide resolved
pinot-controller/src/main/java/org/apache/pinot/controller/api/resources/CopyTablePayload.java
Outdated
Show resolved
Hide resolved
...ller/src/main/java/org/apache/pinot/controller/api/resources/PinotRealtimeTableResource.java
Outdated
Show resolved
Hide resolved
77c1c11 to
c831445
Compare
0c4ad75 to
68a373b
Compare
pinot-controller/src/test/resources/table/table_config_with_instance_assignment.json
Outdated
Show resolved
Hide resolved
1893728 to
33e7d47
Compare
|
@Jackie-Jiang @chenboat Could you take a review? The manual integration test gets done already. The purpose of this PR is to copy the schema and table config from the source cluster to the target cluster by replacing the server / broke tenant. And then copy the consuming segment to kick off the message consumption on the new table. The backfill of segments prior to consuming segments will be implemented in a follow up PR |
371998c to
6dfbe51
Compare
pinot-controller/src/main/java/org/apache/pinot/controller/api/resources/CopyTablePayload.java
Show resolved
Hide resolved
...ller/src/main/java/org/apache/pinot/controller/api/resources/PinotRealtimeTableResource.java
Show resolved
Hide resolved
...oller/src/main/java/org/apache/pinot/controller/api/resources/PinotTableRestletResource.java
Outdated
Show resolved
Hide resolved
...oller/src/main/java/org/apache/pinot/controller/api/resources/PinotTableRestletResource.java
Show resolved
Hide resolved
...oller/src/main/java/org/apache/pinot/controller/api/resources/PinotTableRestletResource.java
Outdated
Show resolved
Hide resolved
...oller/src/main/java/org/apache/pinot/controller/api/resources/PinotTableRestletResource.java
Outdated
Show resolved
Hide resolved
...oller/src/main/java/org/apache/pinot/controller/api/resources/PinotTableRestletResource.java
Show resolved
Hide resolved
...ntroller/src/main/java/org/apache/pinot/controller/helix/core/PinotHelixResourceManager.java
Show resolved
Hide resolved
...ntroller/src/main/java/org/apache/pinot/controller/helix/core/PinotHelixResourceManager.java
Outdated
Show resolved
Hide resolved
758c88b to
c27583c
Compare
|
The unit test Set-1 fail in java 11 but succeed in java 21. The error is about "selectionCombineOperator Not found issue", which is nothing related to table copy. I just do a rebase without any conflict resolving. Previous tests were all successful. |
|
Walk through the design doc with @abhishekbafna on 12/23 and aligned on the direction of the solution. |
...ntroller/src/main/java/org/apache/pinot/controller/helix/core/PinotHelixResourceManager.java
Outdated
Show resolved
Hide resolved
...ontroller/src/main/java/org/apache/pinot/controller/helix/core/WatermarkInductionResult.java
Outdated
Show resolved
Hide resolved
...ontroller/src/main/java/org/apache/pinot/controller/helix/core/WatermarkInductionResult.java
Outdated
Show resolved
Hide resolved
ccb638a to
4cf3fae
Compare
...oller/src/main/java/org/apache/pinot/controller/api/resources/PinotTableRestletResource.java
Outdated
Show resolved
Hide resolved
...oller/src/main/java/org/apache/pinot/controller/api/resources/PinotTableRestletResource.java
Outdated
Show resolved
Hide resolved
...oller/src/main/java/org/apache/pinot/controller/api/resources/PinotTableRestletResource.java
Outdated
Show resolved
Hide resolved
...oller/src/main/java/org/apache/pinot/controller/api/resources/PinotTableRestletResource.java
Show resolved
Hide resolved
...oller/src/main/java/org/apache/pinot/controller/api/resources/PinotTableRestletResource.java
Outdated
Show resolved
Hide resolved
...oller/src/main/java/org/apache/pinot/controller/api/resources/PinotTableRestletResource.java
Show resolved
Hide resolved
...oller/src/main/java/org/apache/pinot/controller/api/resources/PinotTableRestletResource.java
Show resolved
Hide resolved
...ntroller/src/main/java/org/apache/pinot/controller/helix/core/PinotHelixResourceManager.java
Show resolved
Hide resolved
...ntroller/src/main/java/org/apache/pinot/controller/helix/core/PinotHelixResourceManager.java
Show resolved
Hide resolved
...ontroller/src/main/java/org/apache/pinot/controller/helix/core/WatermarkInductionResult.java
Show resolved
Hide resolved
|
@abhishekbafna I add the dryRun as a boolean flag in the payload. The reply contains schema, modified table config and watermarkResult as well. The write operation (schema and table config addition) is moved to the end. |
|
MSE against 1.3 failed but succeed against 1.4 and master |
pinot-controller/src/main/java/org/apache/pinot/controller/api/resources/CopyTablePayload.java
Outdated
Show resolved
Hide resolved
...oller/src/main/java/org/apache/pinot/controller/api/resources/PinotTableRestletResource.java
Outdated
Show resolved
Hide resolved
...oller/src/main/java/org/apache/pinot/controller/api/resources/PinotTableRestletResource.java
Outdated
Show resolved
Hide resolved
...oller/src/main/java/org/apache/pinot/controller/api/resources/PinotTableRestletResource.java
Outdated
Show resolved
Hide resolved
...oller/src/main/java/org/apache/pinot/controller/api/resources/PinotTableRestletResource.java
Outdated
Show resolved
Hide resolved
...oller/src/main/java/org/apache/pinot/controller/api/resources/PinotTableRestletResource.java
Outdated
Show resolved
Hide resolved
...ain/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java
Outdated
Show resolved
Hide resolved
...ain/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java
Outdated
Show resolved
Hide resolved
...ntroller/src/main/java/org/apache/pinot/controller/helix/core/PinotHelixResourceManager.java
Show resolved
Hide resolved
abhishekbafna
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| @Produces(MediaType.APPLICATION_JSON) | ||
| @Path("/tables/{tableName}/consumerWatermarks") | ||
| @Authorize(targetType = TargetType.TABLE, paramName = "tableName", action = Actions.Table.GET_IDEAL_STATE) | ||
| @ApiOperation(value = "Get table ideal state", notes = "Get table ideal state") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it about "table ideal state"? Is it typo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch.
| } | ||
|
|
||
| /** | ||
| * Performs validations of table config and adds the table to zookeeper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any restriction on this API? Realtime table only? Can it add upsert tables? If so, please clarify in the javadoc.
| * The {@code PartitionGroupInfo} class represents the metadata and sequence number for a partition group. | ||
| * It encapsulates the {@link PartitionGroupMetadata} and the sequence number associated with it. | ||
| */ | ||
| public class PartitionGroupInfo { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need any wrapper class like this? Can we reuse PartitionGroupMetadata or add to it instead of creating a new class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a pure Kafka consumer point of view, the segment sequence number doesn't seem to belong to a PartitionGroupMeta? I
I am also concerned that the change of the constructor and adding the sequence number might cause review rejections. Many places already use the class PartitonGroupMeta. Adding a wrapper class would bring benefit of less existing code changes.
- Updated PinotRealtimeTableResource.java: Changed misleading JavaDoc from "Get table ideal state" to accurately describe the consumer watermarks endpoint - Updated PinotHelixResourceManager.java: Added explicit documentation clarifying that the API is restricted to realtime tables only and works with both upsert and non-upsert tables
- Updated PinotHelixResourceManager.addTable() to accept List<Pair<PartitionGroupMetadata, Integer>> - Updated PinotLLCRealtimeSegmentManager.setUpNewTable() to use Pair instead of PartitionGroupInfo - Refactored setupNewPartitionGroup() to accept metadata and sequence separately - Updated PinotTableRestletResource copyTable endpoint to create Pair instances - Updated unit tests in PinotHelixResourceManagerStatelessTest and PartitionGroupInfoTest - Added necessary imports for PartitionGroupMetadata and Pair
- Deleted PartitionGroupInfo.java as it has been replaced with Pair<PartitionGroupMetadata, Integer> - Deleted PartitionGroupInfoTest.java as the class is no longer needed - Verified no other classes reference PartitionGroupInfo
9f6c222 to
6aa1cc0
Compare
This the first part of a series of changes aimed at enabling real-time table replication between two Pinot clusters.
The core purpose of this specific PR is to introduce the necessary functionality for creating a new real-time table with consuming segment watermarks inducted from source table's ZK metadata and a designated broker / server tenant when performing a table copy operation.
The key changes introduced by the 3 commits are:
IN_PROGRESS, we consider this segment is the CONSUMING segment, thus we copy the startOffset and sequence number as the start consuming position of that partition for new table.DONE, the consuming segment is the one next to it, thus we will use the endOffset and 1 + sequence_number to initialize the consuming segment for new table.Doesn't support following feature at the moment:
Test
We have a manual Test in Uber's pinot cluster. Confirm that new consuming segments of the target cluster copy the sequence number (rather than 0) and startOffset per partition when the latest segments in the source table are consuming segments. And the consuming status is Active.
Design Draft