-
Notifications
You must be signed in to change notification settings - Fork 5.5k
refactor(protocol): Add new generic thrift toolkit module for connectors #26259
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Reviewer's GuideThis PR extracts and centralizes Thrift serialization logic into a new reusable connector toolkit module, extends SPI and server components to support ColumnHandle codecs, and updates the TPCDS connector and project poms to integrate the new toolkit. Class diagram for new ThriftCodecProvider and GenericThriftCodecclassDiagram
class ThriftCodecProvider {
-ThriftCodecManager thriftCodecManager
-Optional<Type> connectorSplitType
-Optional<Type> connectorTransactionHandle
-Optional<Type> connectorTableLayoutHandle
-Optional<Type> connectorTableHandle
-Optional<Type> connectorOutputTableHandle
-Optional<Type> connectorInsertTableHandle
-Optional<Type> connectorDeleteTableHandle
-Optional<Type> connectorColumnHandle
+getConnectorSplitCodec()
+getConnectorTransactionHandleCodec()
+getConnectorTableLayoutHandleCodec()
+getConnectorTableHandleCodec()
+getConnectorOutputTableHandleCodec()
+getConnectorInsertTableHandleCodec()
+getConnectorDeleteTableHandleCodec()
+getColumnHandleCodec()
+getThriftCodecManager()
}
class ThriftCodecProvider.Builder {
-ThriftCodecManager thriftCodecManager
-Optional<Type> connectorSplitType
-Optional<Type> connectorTransactionHandle
-Optional<Type> connectorTableLayoutHandle
-Optional<Type> connectorTableHandle
-Optional<Type> connectorOutputTableHandle
-Optional<Type> connectorInsertTableHandle
-Optional<Type> connectorDeleteTableHandle
-Optional<Type> connectorColumnHandle
+setThriftCodecManager()
+setConnectorSplitType()
+setConnectorTransactionHandle()
+setConnectorTableLayoutHandle()
+setConnectorTableHandle()
+setConnectorOutputTableHandle()
+setConnectorInsertTableHandle()
+setConnectorDeleteTableHandle()
+setConnectorColumnHandle()
+build()
}
ThriftCodecProvider.Builder --> ThriftCodecProvider
ThriftCodecProvider --> GenericThriftCodec
class GenericThriftCodec<T> {
-ThriftCodec<T> thriftCodec
+serialize(T value)
+deserialize(byte[] bytes)
}
GenericThriftCodec <|.. ConnectorCodec
ThriftCodecProvider ..> ThriftCodecManager
GenericThriftCodec ..> ThriftCodecManager
Class diagram for updated ConnectorCodecManager and ConnectorCodecProviderclassDiagram
class ConnectorCodecManager {
-Map<String, ConnectorCodecProvider> connectorCodecProviders
+getTableHandleCodec(String connectorId)
+getColumnHandleCodec(String connectorId)
}
class ConnectorCodecProvider {
+getConnectorTableHandleCodec()
+getColumnHandleCodec()
}
ConnectorCodecManager --> ConnectorCodecProvider
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
d88dfc6
to
e84d692
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- Replace the
assert (byteBuffer.position() == 0)
in ColumnHandleThriftCodec with explicit validation and error handling, since assertions may be disabled in production. - Consider tightening GenericThriftCodec’s constructor to accept Class directly instead of a generic Type to eliminate the runtime cast and provide a clearer API.
- ThriftCodecProvider.Builder exposes one setter per handle type; if more codecs are needed in the future consider consolidating into a registration map or varargs-based API to reduce boilerplate.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Replace the `assert (byteBuffer.position() == 0)` in ColumnHandleThriftCodec with explicit validation and error handling, since assertions may be disabled in production.
- Consider tightening GenericThriftCodec’s constructor to accept Class<T> directly instead of a generic Type to eliminate the runtime cast and provide a clearer API.
- ThriftCodecProvider.Builder exposes one setter per handle type; if more codecs are needed in the future consider consolidating into a registration map or varargs-based API to reduce boilerplate.
## Individual Comments
### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/server/thrift/ColumnHandleThriftCodec.java:63-65` </location>
<code_context>
+ public ColumnHandle readConcreteValue(String connectorId, TProtocolReader reader)
+ throws Exception
+ {
+ ByteBuffer byteBuffer = reader.readBinary();
+ assert (byteBuffer.position() == 0);
+ byte[] bytes = byteBuffer.array();
+ return connectorCodecManager.getColumnHandleCodec(connectorId).map(codec -> codec.deserialize(bytes)).orElse(null);
+ }
</code_context>
<issue_to_address>
**issue (bug_risk):** Potential risk with ByteBuffer.array() usage.
Directly accessing byteBuffer.array() may cause issues if the buffer is not array-backed or contains extra data. To ensure correct deserialization, use byteBuffer.get() with byteBuffer.remaining() to extract only the relevant bytes.
</issue_to_address>
### Comment 2
<location> `presto-main-base/src/main/java/com/facebook/presto/server/thrift/ColumnHandleThriftCodec.java:74` </location>
<code_context>
+ throws Exception
+ {
+ requireNonNull(value, "value is null");
+ writer.writeBinary(ByteBuffer.wrap(connectorCodecManager.getColumnHandleCodec(connectorId).map(codec -> codec.serialize(value)).orElseThrow(() -> new IllegalArgumentException("Can not serialize " + value))));
+ }
+
</code_context>
<issue_to_address>
**suggestion:** Exception message could be more informative.
Consider including connectorId and value type in the exception to improve debugging.
```suggestion
writer.writeBinary(ByteBuffer.wrap(
connectorCodecManager.getColumnHandleCodec(connectorId)
.map(codec -> codec.serialize(value))
.orElseThrow(() -> new IllegalArgumentException(
String.format(
"Can not serialize value '%s' of type '%s' for connectorId '%s'",
value,
value.getClass().getName(),
connectorId
)
))
));
```
</issue_to_address>
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
ByteBuffer byteBuffer = reader.readBinary(); | ||
assert (byteBuffer.position() == 0); | ||
byte[] bytes = byteBuffer.array(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (bug_risk): Potential risk with ByteBuffer.array() usage.
Directly accessing byteBuffer.array() may cause issues if the buffer is not array-backed or contains extra data. To ensure correct deserialization, use byteBuffer.get() with byteBuffer.remaining() to extract only the relevant bytes.
91a16e8
to
47af2d5
Compare
47af2d5
to
40d9829
Compare
Description
The thrift implementation for the tpcds connector can be generalized so that it can be reused.
Motivation and Context
We can use this new interface in other connectors to add support for thrift serialization
Impact
None
Test Plan
The current thrift tpcds UTs
Contributor checklist
Release Notes