-
Notifications
You must be signed in to change notification settings - Fork 40
Add support for data manipulation operations in Blob Storage adapter #3124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces data manipulation language (DML) support for the Blob Storage adapter, which is a significant and well-implemented feature. The changes include the core logic for handling mutations, numerous integration tests to ensure correctness, and necessary updates to error codes and utility classes. My review focuses on improving code clarity, robustness, and performance in the new classes. I've identified opportunities to remove dead code, enhance immutability, improve test coverage for conditional operations, and optimize a hashCode implementation. Overall, this is a solid contribution that greatly enhances the capabilities of the object storage adapter.
core/src/main/java/com/scalar/db/storage/objectstorage/ObjectStorageRecord.java
Show resolved
Hide resolved
core/src/main/java/com/scalar/db/storage/objectstorage/ObjectStorageRecord.java
Show resolved
Hide resolved
core/src/main/java/com/scalar/db/storage/objectstorage/ObjectStorageRecord.java
Show resolved
Hide resolved
...ava/com/scalar/db/storage/objectstorage/ObjectStorageConditionalMutationIntegrationTest.java
Outdated
Show resolved
Hide resolved
core/src/main/java/com/scalar/db/storage/objectstorage/ObjectStorageMutation.java
Outdated
Show resolved
Hide resolved
core/src/main/java/com/scalar/db/storage/objectstorage/ObjectStorageMutation.java
Show resolved
Hide resolved
core/src/main/java/com/scalar/db/storage/objectstorage/PartitionIdentifier.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request adds significant data manipulation capabilities to the Blob Storage adapter, a major step forward. The overall implementation is solid, with good use of handlers for different statement types and a clear read-modify-write pattern for mutations. I've identified a few areas for improvement, including a bug in data serialization, a suboptimal hash implementation, and opportunities to enhance test coverage and code readability. My detailed comments are below.
...ava/com/scalar/db/storage/objectstorage/ObjectStorageConditionalMutationIntegrationTest.java
Outdated
Show resolved
Hide resolved
core/src/main/java/com/scalar/db/storage/objectstorage/ObjectStorage.java
Outdated
Show resolved
Hide resolved
core/src/main/java/com/scalar/db/storage/objectstorage/PartitionIdentifier.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces Data Manipulation Language (DML) support for the Azure Blob Storage adapter, which is a significant enhancement. The changes include new classes for handling mutations and selections, as well as comprehensive integration and unit tests. The implementation correctly identifies and enforces several limitations inherent to object storage, such as the lack of index support, restricted conditional operations for BLOB types, and limitations on cross-partition scan ordering. The addition of specific error codes for these scenarios is also a positive step for clarity and debugging. Overall, the changes are well-structured and provide a solid foundation for DML operations in Object Storage.
core/src/main/java/com/scalar/db/storage/objectstorage/SelectStatementHandler.java
Show resolved
Hide resolved
core/src/main/java/com/scalar/db/storage/objectstorage/ObjectStorageRecord.java
Outdated
Show resolved
Hide resolved
core/src/main/java/com/scalar/db/storage/objectstorage/ResultInterpreter.java
Show resolved
Hide resolved
core/src/integration-test/java/com/scalar/db/storage/objectstorage/ObjectStorageTestUtils.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 56 out of 56 changed files in this pull request and generated no comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Assert | ||
| assertThat(keys).isEmpty(); | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since Blob Storage can return up to 5,000 keys with a LIST operation, we should check whether getKeys can return more than 5,000 keys.
Ref: https://learn.microsoft.com/rest/api/storageservices/list-blobs
70f2d31 to
0a73daf
Compare
|
|
||
| Column<?> column1 = | ||
| ColumnValueMapper.convert( | ||
| clusteringKey1.get(columnName), columnName, metadata.getColumnDataType(columnName)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[minor] metadata.getColumnDataType(columnName) can be reused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 533f3cd.
| Map<String, ObjectStorageRecord> partition = | ||
| getPartition(namespaceName, tableName, partitionKey, readVersionMap); | ||
| for (Mutation mutation : mutations) { | ||
| if (mutation instanceof Put) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Insert, Update and Upsert aren't passed to this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DistributedStorage only handles Put. Insert, Update, and Upsert are translated into Put with a condition.
| if (validationFailed) { | ||
| throw new ExecutionException( | ||
| String.format( | ||
| "A condition failed. ConditionalExpression: %s, Column: %s", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about including the operator in this error message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, it's a mistake. We should pass the expression that includes both the column and operator information instead of expectedColumn. Let me fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 533f3cd.
| throw new IllegalArgumentException( | ||
| CoreError.OPERATION_CHECK_ERROR_ORDERING_NOT_PROPERLY_SPECIFIED.buildMessage(scan)); | ||
| } | ||
| boolean rightOrder = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate why this variable name is rightOrder?
Also, if the first scan order is the same as the clustering key order and the second scan order is different, it will fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, maybe isValidOrder is more suitable. The variable indicates whether the specified scan ordering is valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In ScalarDB, we can only read records sorted by the clustering order defined in the table metadata, in forward or reverse order. For example, if a table has clustering keys first (Order=ASC) and second (Order=DESC), we can specify the following scan order:
| first | second |
|---|---|
| ASC | - |
| ASC | DESC |
| DESC | - |
| DESC | ASC |
@brfrn169 I think we discussed this previously, but is this still correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if the first scan order is the same as the clustering key order and the second scan order is different, it will fail?
So, the answer is "yes."
|
|
||
| records.sort( | ||
| (o1, o2) -> | ||
| new ClusteringKeyComparator(metadata) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be missing something, but why don't we simply sort the records based on Scan.ordering (and the metadata's clustering key orders) at this point? If it works, the following Collections.reverse() isn't needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Due to this specification, we need to sort the records by the clustering order defined in the table metadata. Then, we can decide whether to read the sorted records in forward or reverse order based on the order specified in the scan request.
Description
This PR adds DML support for the Blob adapter, enabling data manipulation operations on Azure Blob Storage.
Support for S3 and GCS will be addressed in subsequent PRs.
Related issues and/or PRs
Changes made
DistributedStorageimplementation for Object Storage.Checklist
Additional notes (optional)
N/A
Release notes
Added support for data manipulation operations over Azure Blob Storage.