Skip to content

Conversation

@colinmarc
Copy link
Contributor

DataFusion sometimes passes dates as string literals, but can also pass timestamp ScalarValues, which need to be converted to predicates correctly in order to enable partition pruning.

@colinmarc colinmarc force-pushed the df-timestamps branch 3 times, most recently from 4c291c7 to 0a16745 Compare July 31, 2025 13:40
@colinmarc
Copy link
Contributor Author

The test failure seems unrelated.

@colinmarc colinmarc force-pushed the df-timestamps branch 5 times, most recently from 7629743 to ed85a34 Compare August 1, 2025 21:27
@kevinjqliu kevinjqliu requested a review from Copilot September 10, 2025 15:48
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR enhances DataFusion integration by adding support for timestamp ScalarValues in predicate conversion, enabling proper partition pruning for timestamp predicates. Previously, only string literals were handled correctly for date/time filtering.

Key changes:

  • Added timestamp scalar value handling for Second, Millisecond, Microsecond, and Nanosecond precisions
  • Enhanced type conversion logic in Iceberg's value system for better timestamp interoperability
  • Comprehensive test coverage for various timestamp formats and timezone scenarios

Reviewed Changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 2 comments.

File Description
crates/integrations/datafusion/src/physical_plan/expr_to_predicate.rs Adds timestamp ScalarValue conversion logic and comprehensive tests for timestamp predicate handling
crates/integrations/datafusion/Cargo.toml Adds chrono dependency for timestamp parsing functionality
crates/iceberg/src/spec/values.rs Enhances Datum type conversion with comprehensive timestamp format support and cross-conversion capabilities

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link
Contributor

@emkornfield emkornfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still ramping up on the code base but left some comments.

fn test_predicate_conversion_with_timestamp() {
// 2023-01-01 12:00:00 UTC
let timestamp_scalar = ScalarValue::TimestampSecond(Some(1672574400), None);
let dt = DateTime::parse_from_rfc3339("2023-01-01T12:00:00+00:00").unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment about maybe considering parameterization. of these tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave it a shot, but these tests are a lot more complicated than the tests in values.rs so it ended up being pretty ugly and I backed it out.

DataFusion sometimes passes dates as string literals, but can also pass
timestamp ScalarValues, which need to be converted to predicates
correctly in order to enable partition pruning.
@colinmarc
Copy link
Contributor Author

I think I addressed all comments. Thanks for the review, and sorry about the delay!

This helps with predicate expressions such as `date > DATE_TRUNC('day',
ts)`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants