-
Notifications
You must be signed in to change notification settings - Fork 344
fix(iceberg-datafusion): handle timestamp predicates from DF #1569
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
4c291c7 to
0a16745
Compare
|
The test failure seems unrelated. |
7629743 to
ed85a34
Compare
ed85a34 to
2a1f575
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR enhances DataFusion integration by adding support for timestamp ScalarValues in predicate conversion, enabling proper partition pruning for timestamp predicates. Previously, only string literals were handled correctly for date/time filtering.
Key changes:
- Added timestamp scalar value handling for Second, Millisecond, Microsecond, and Nanosecond precisions
- Enhanced type conversion logic in Iceberg's value system for better timestamp interoperability
- Comprehensive test coverage for various timestamp formats and timezone scenarios
Reviewed Changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| crates/integrations/datafusion/src/physical_plan/expr_to_predicate.rs | Adds timestamp ScalarValue conversion logic and comprehensive tests for timestamp predicate handling |
| crates/integrations/datafusion/Cargo.toml | Adds chrono dependency for timestamp parsing functionality |
| crates/iceberg/src/spec/values.rs | Enhances Datum type conversion with comprehensive timestamp format support and cross-conversion capabilities |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still ramping up on the code base but left some comments.
| fn test_predicate_conversion_with_timestamp() { | ||
| // 2023-01-01 12:00:00 UTC | ||
| let timestamp_scalar = ScalarValue::TimestampSecond(Some(1672574400), None); | ||
| let dt = DateTime::parse_from_rfc3339("2023-01-01T12:00:00+00:00").unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment about maybe considering parameterization. of these tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gave it a shot, but these tests are a lot more complicated than the tests in values.rs so it ended up being pretty ugly and I backed it out.
crates/integrations/datafusion/src/physical_plan/expr_to_predicate.rs
Outdated
Show resolved
Hide resolved
DataFusion sometimes passes dates as string literals, but can also pass timestamp ScalarValues, which need to be converted to predicates correctly in order to enable partition pruning.
2a1f575 to
0a0f60c
Compare
|
I think I addressed all comments. Thanks for the review, and sorry about the delay! |
0a0f60c to
30a6db5
Compare
This helps with predicate expressions such as `date > DATE_TRUNC('day',
ts)`.
30a6db5 to
86fac9a
Compare
DataFusion sometimes passes dates as string literals, but can also pass timestamp ScalarValues, which need to be converted to predicates correctly in order to enable partition pruning.