feat(io): Make Storage a trait
#1755
}

#[async_trait]
impl Storage for OpenDALGcsStorage {
It can be pretty annoying to implement nearly the same thing for all storage services, can we avoid that?
I think this thread is relevant: https://docs.google.com/document/d/1-CEvRvb52vPTDLnzwJRBx5KLpej7oSlTu_rg0qKEGZ8/edit?disco=AAABrRO9Prk
The benefit of doing this is that users can implement Storage for only the schemes they need. The annoyance of duplicated code across schemes mostly applies to a versatile storage implementation like OpenDAL, which already has a convenient operator layer. For custom storage, I don't expect implementers to cover all schemes anyway (I may be wrong about this assumption).
As for code duplication, I consider OpenDAL Storage to be the "managed" default storage that lives in this repo, so we have more control over its implementation. Once we have a separate crate for each storage implementation (iceberg-storage-opendal), we can add some helpers to reduce the duplication.
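To make the duplication concern concrete, here is a minimal sketch of one way such a helper could look. It is not the code in this PR: the trait below is a simplified, synchronous stand-in for `Storage` (the real one is `#[async_trait]`), and `InMemoryBackend`, `strip_scheme`, `impl_storage_for_scheme!`, `GcsLikeStorage` and `S3LikeStorage` are all hypothetical names invented for illustration. The idea is that a shared backend plays the role of an operator layer, and a small macro generates the per-scheme impls.

```rust
use std::collections::HashMap;
use std::fmt::Debug;
use std::sync::Mutex;

/// Hypothetical, simplified stand-in for the `Storage` trait under review.
/// Synchronous on purpose so the sketch has no external dependencies.
pub trait Storage: Debug + Send + Sync {
    fn scheme(&self) -> &'static str;
    fn write(&self, path: &str, data: Vec<u8>) -> Result<(), String>;
    fn read(&self, path: &str) -> Result<Vec<u8>, String>;
}

/// Hypothetical shared backend, standing in for an operator layer.
#[derive(Debug, Default)]
struct InMemoryBackend {
    files: Mutex<HashMap<String, Vec<u8>>>,
}

impl InMemoryBackend {
    fn put(&self, key: &str, data: Vec<u8>) {
        self.files.lock().unwrap().insert(key.to_string(), data);
    }

    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.files.lock().unwrap().get(key).cloned()
    }
}

/// Shared helper: strips `<scheme>://` from a path, or fails if the
/// path belongs to a different scheme.
fn strip_scheme<'a>(scheme: &str, path: &'a str) -> Result<&'a str, String> {
    path.strip_prefix(scheme)
        .and_then(|rest| rest.strip_prefix("://"))
        .ok_or_else(|| format!("path {} does not use scheme {}", path, scheme))
}

/// Generates a storage type and its `Storage` impl for one scheme,
/// so the near-identical impl body is written only once.
macro_rules! impl_storage_for_scheme {
    ($name:ident, $scheme:literal) => {
        #[derive(Debug, Default)]
        pub struct $name {
            backend: InMemoryBackend,
        }

        impl Storage for $name {
            fn scheme(&self) -> &'static str {
                $scheme
            }

            fn write(&self, path: &str, data: Vec<u8>) -> Result<(), String> {
                let key = strip_scheme($scheme, path)?;
                self.backend.put(key, data);
                Ok(())
            }

            fn read(&self, path: &str) -> Result<Vec<u8>, String> {
                let key = strip_scheme($scheme, path)?;
                self.backend
                    .get(key)
                    .ok_or_else(|| format!("not found: {}", path))
            }
        }
    };
}

impl_storage_for_scheme!(GcsLikeStorage, "gs");
impl_storage_for_scheme!(S3LikeStorage, "s3");
```

With this shape, a custom storage author would still implement `Storage` by hand for the one scheme they care about, while the "managed" implementation pays the duplication cost once, inside the macro.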
/// }
/// ```
#[async_trait]
pub trait Storage: Debug + Send + Sync {
Following our discussion in the last community sync, let's create a new crate called iceberg-storage and put the newly added things into that crate. With this approach we don't need to change the existing code path, and once we reach consensus on the API, we can replace FileIO in the core crate with FileIO in iceberg-storage.
Sounds good! I'll start working on this next week.
I think I forgot some details from the meeting, but I assume the Storage trait is eventually still going to live under the core crate, just like the existing Catalog trait, right? iceberg-storage is more of a temporary crate to make review and collaboration easier.
Do you mean the Storage trait only, or the whole storage implementation? I'm thinking about keeping them in the iceberg-storage crate forever, since this aligns with our small-crate pattern. iceberg-storage does not have many dependencies and could be used standalone. After we split out iceberg-storage, we could move even more out of the core crate, like the puffin format.
+1 on putting the Storage trait in iceberg-storage forever.
Hi, @liurenjie1024 and @CTTY, I have thought about this again. I think we can split the concept around As discussed in #1797, sometimes people don't care about FileIO (the thing in java) at all. They will initiate, build and manage their own IO abstraction and only want to be used as So, I think we should make
I'm a little confused about your point. Would you mind writing a more detailed proposal?
Will do
Which issue does this PR close?
What changes are included in this PR?
Are these changes tested?
Added some tests for the new storage builder and registry, but mostly relying on the existing tests.