feat: Add comprehensive snapshot expiration functionality for table maintenance #262
base: main
…ented provide more context for unimplemented macro
…or-unimplemented Revert "provide more context for unimplemented macro"
use separate manifest files for delete files
fix: make version-hint.text compatible with other readers
fix clippy warnings
fix: conflict when projecting a field not present in equality deletes
fix: update `last-updated-ms` field when updating a table
fix: use correct number of splits for deletes
AppendSequenceGroups operation
docs: default compression level is 3
- Implemented snapshot expiration logic in the maintenance module.
- Added methods for time-based and count-based snapshot retention.
- Created examples demonstrating the usage of snapshot expiration.
- Updated README with new features and examples.
- Added integration tests for snapshot expiration functionality.
…erg-rust-spec APIs
- Fix imports for SnapshotReference, SnapshotRetention, and SnapshotBuilder
- Update create_test_snapshot to use correct builder pattern
- Update create_test_ref to match current SnapshotReference structure
- Remove unused imports from test file
- All tests passing (9 unit tests + 6 integration tests)
- Resolved conflicts in Cargo.lock, datafusion_iceberg/Cargo.toml, manifest_list.rs, and transaction/operation.rs
- Updated duckdb version to 1.4 in dev dependencies
- Accepted upstream changes for files not modified by expire_snapshots feature
@JanKaul let me know what you think :)
Wow, this is really cool stuff! Thanks a lot for your effort. I think it would be best to integrate this with the current operation framework: https://github.com/JanKaul/iceberg-rust/blob/main/iceberg-rust/src/table/transaction/operation.rs#L105. You should be able to move the code from your execute method into the Operation::execute method, and additionally use TableUpdate::RemoveSnapshots, as the Iceberg REST catalog already supports it.
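A minimal sketch of that suggestion, assuming the expired snapshot IDs have already been selected: the operation's execute step just wraps them in a single RemoveSnapshots update. `TableUpdate` here is a simplified local stand-in that models only this one variant, and the `Operation::ExpireSnapshots` variant and this `execute` signature are hypothetical; they do not reproduce the actual types in operation.rs.

```rust
/// Simplified local stand-in for a catalog `TableUpdate`; only the variant
/// used here is modeled (hypothetical, not the crate's real enum).
#[derive(Debug, PartialEq)]
enum TableUpdate {
    RemoveSnapshots { snapshot_ids: Vec<i64> },
}

/// Hypothetical operation variant carrying already-selected snapshot IDs.
enum Operation {
    ExpireSnapshots { snapshot_ids: Vec<i64> },
}

impl Operation {
    /// Converts the operation into the updates to commit. An empty selection
    /// produces no updates, so no empty commit is sent to the catalog.
    fn execute(self) -> Vec<TableUpdate> {
        match self {
            Operation::ExpireSnapshots { snapshot_ids } => {
                if snapshot_ids.is_empty() {
                    Vec::new()
                } else {
                    vec![TableUpdate::RemoveSnapshots { snapshot_ids }]
                }
            }
        }
    }
}

fn main() {
    let updates = Operation::ExpireSnapshots { snapshot_ids: vec![2, 3] }.execute();
    println!("{:?}", updates); // [RemoveSnapshots { snapshot_ids: [2, 3] }]
}
```

Keeping selection and update generation separate means the catalog only ever sees the final list of IDs. In the Iceberg REST catalog spec, this update is sent as a `remove-snapshots` action carrying a `snapshot-ids` list.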
@JanKaul take a look and let me know if those were the changes you were looking for or not!
Yes, the changes look great! Thank you! It looks like the implementations in the … Do you have any use cases where you would prefer to use the …?
Hi @JanKaul, thanks for taking a look! This is my first Rust PR to an OSS repo, and I know it probably looks like a big ugly mess. It took me a good amount of time to put together, and I changed directions a few times, so I figured there would be some duplicate code somewhere that I missed! I'm happy to make any changes you want (feel free to make changes as well)! To answer your question, the use case I'm building toward is being able to use the maintenance feature(s) in a job running on … using Ray. Ideally, the next thing I would want to add to the repo would be data file compaction and manifest rewriting, since that is where the power of your framework would really shine for my use case. Does that make sense? Let me know if you want to chat on Discord, Google Meet, or otherwise!
@JanKaul I pushed some updates; let me know what you think.
Sorry, I didn't have time to look into it. I will do so in the coming days.
Summary
This PR adds comprehensive(ish) (I think) snapshot expiration functionality to iceberg-rust, enabling automatic cleanup of old snapshots and their associated files according to configurable retention policies.
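Cleaning up associated files comes down to a set difference: a file can be deleted only if no retained snapshot still reaches it, because manifests and data files are routinely shared between snapshots. This sketch assumes a hypothetical `SnapshotFiles` record that flattens each snapshot's manifest list, manifests, and data files into one list of paths; actually reading those files from storage is out of scope.

```rust
use std::collections::BTreeSet;

/// Hypothetical, flattened view of one snapshot: its ID and every file path it
/// reaches (manifest list, manifests, and data files).
struct SnapshotFiles {
    snapshot_id: i64,
    files: Vec<String>,
}

/// Returns files reachable only from expired snapshots. Anything a retained
/// snapshot still reaches is kept, since manifests and data files are shared.
fn files_to_delete(snapshots: &[SnapshotFiles], expired_ids: &[i64]) -> BTreeSet<String> {
    let mut expired_files: BTreeSet<String> = BTreeSet::new();
    let mut retained_files: BTreeSet<String> = BTreeSet::new();
    for snapshot in snapshots {
        // Route each snapshot's files into the expired or retained bucket.
        let bucket = if expired_ids.contains(&snapshot.snapshot_id) {
            &mut expired_files
        } else {
            &mut retained_files
        };
        bucket.extend(snapshot.files.iter().cloned());
    }
    // Safe to delete = reachable from expired snapshots minus still-reachable.
    expired_files.difference(&retained_files).cloned().collect()
}

fn main() {
    let snapshots = vec![
        SnapshotFiles {
            snapshot_id: 1,
            files: vec!["snap-1.avro".into(), "m1.avro".into(), "a.parquet".into()],
        },
        SnapshotFiles {
            snapshot_id: 2,
            files: vec![
                "snap-2.avro".into(),
                "m1.avro".into(),
                "m2.avro".into(),
                "a.parquet".into(),
                "b.parquet".into(),
            ],
        },
    ];
    // Expiring snapshot 1 frees only its manifest list; m1.avro and a.parquet
    // are still reachable from snapshot 2.
    println!("{:?}", files_to_delete(&snapshots, &[1])); // {"snap-1.avro"}
}
```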
Features Added
Core Functionality
- `ExpireSnapshots` builder with fluent configuration
- Time-based expiration (`expire_older_than`)
- Count-based retention (`retain_last`)
API Example
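The two retention rules listed above can be combined as in this sketch, which assumes (as Iceberg's Java `ExpireSnapshots` does with `retainLast`) that `retain_last` acts as a floor: the N newest snapshots are always kept, and only older snapshots that also fall before the `expire_older_than` cutoff are expired. `RetentionPolicy`, `SnapshotMeta`, and the `protected` set (the current snapshot plus anything a branch or tag points to) are hypothetical names, not the PR's `ExpireSnapshots` API, and ordering by timestamp simplifies the ancestry-based ordering a real implementation would use.

```rust
use std::collections::HashSet;

/// Hypothetical, minimal snapshot record: ID plus commit time in milliseconds.
#[derive(Clone, Debug)]
struct SnapshotMeta {
    snapshot_id: i64,
    timestamp_ms: i64,
}

/// Hypothetical policy builder; the method names mirror the PR's options.
struct RetentionPolicy {
    older_than_ms: Option<i64>,
    retain_last: usize,
}

impl RetentionPolicy {
    /// Starts with no time cutoff and keeps at least the newest snapshot.
    fn new() -> Self {
        Self { older_than_ms: None, retain_last: 1 }
    }

    /// Only snapshots committed strictly before `timestamp_ms` may expire.
    fn expire_older_than(mut self, timestamp_ms: i64) -> Self {
        self.older_than_ms = Some(timestamp_ms);
        self
    }

    /// Always keep the `n` newest snapshots, regardless of age.
    fn retain_last(mut self, n: usize) -> Self {
        self.retain_last = n;
        self
    }

    /// Returns the IDs to expire, oldest first. `protected` holds IDs that must
    /// never expire (e.g. the current snapshot and branch/tag heads).
    fn select(&self, snapshots: &[SnapshotMeta], protected: &HashSet<i64>) -> Vec<i64> {
        let mut newest_first: Vec<&SnapshotMeta> = snapshots.iter().collect();
        newest_first.sort_by(|a, b| b.timestamp_ms.cmp(&a.timestamp_ms));

        let mut expired: Vec<i64> = newest_first
            .into_iter()
            .skip(self.retain_last) // count-based floor
            .filter(|s| !protected.contains(&s.snapshot_id))
            .filter(|s| self.older_than_ms.map_or(true, |cutoff| s.timestamp_ms < cutoff))
            .map(|s| s.snapshot_id)
            .collect();
        expired.reverse(); // report oldest first
        expired
    }
}

fn main() {
    // Five snapshots committed one second apart; 5 is current, 1 is tagged.
    let snapshots: Vec<SnapshotMeta> = (1..=5)
        .map(|i| SnapshotMeta { snapshot_id: i, timestamp_ms: i * 1_000 })
        .collect();
    let protected = HashSet::from([5_i64, 1]);

    let policy = RetentionPolicy::new().expire_older_than(4_500).retain_last(2);
    println!("{:?}", policy.select(&snapshots, &protected)); // [2, 3]
}
```

With no cutoff set, every unprotected snapshot beyond the newest `retain_last` is eligible, which is why the constructor defaults `retain_last` to 1 rather than 0.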
Files Added/Modified
New Files
- `iceberg-rust/src/table/maintenance/` - New maintenance module
- `iceberg-rust/src/table/maintenance/expire_snapshots.rs` - Core implementation (1,007 lines)
- `iceberg-rust/examples/expire_snapshots.rs` - Usage examples
- `iceberg-rust/tests/snapshot_expiration.rs` - Comprehensive integration tests
Modified Files
- `iceberg-rust/src/table/mod.rs` - Added `expire_snapshots()` method to `Table`
- `iceberg-rust/src/lib.rs` - Exposed maintenance module
Testing
Comprehensive Test Coverage
Test Categories
Implementation Details
Documentation
- Updated `readme.md` with the new features.
Compatibility
This implementation provides a solid foundation for table maintenance operations in iceberg-rust. It follows the Iceberg specification and mirrors the pyiceberg implementation (which I also worked on 😮‍💨), now in Rust!