Skip to content

Conversation

PingLiuPing
Copy link
Contributor

@PingLiuPing PingLiuPing commented Oct 6, 2025

Description

Previously, PartitionData.getValue() always used decimalValue().setScale()
when deserializing DECIMAL partition values from JSON, which could fail
or produce incorrect results when the JSON contains integer representations
(long, int, or BigInteger) of decimal values.

This change adds type-specific handling for decimal deserialization:

  • Use BigDecimal.valueOf(long, scale) when JSON contains a long value
  • Use BigDecimal.valueOf(int, scale) when JSON contains an int value
  • Use BigDecimal(BigInteger, scale) when JSON contains a BigInteger value
  • Fall back to decimalValue().setScale() for other numeric types

Motivation and Context

Impact

Test Plan

Added comprehensive unit tests covering all code paths including:

  • Decimal from long values
  • Decimal from int values
  • Decimal from BigInteger values
  • Decimal from decimal values (fallback)
  • Edge cases (zero scale, large scale, negative values, very large numbers)
  • JSON round-trip tests with mixed types

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

Iceberg Connector Changes
* Add more type conversion for decimal partition value.

@PingLiuPing PingLiuPing self-assigned this Oct 6, 2025
@PingLiuPing PingLiuPing requested review from a team, ZacBlanco and hantangwangd as code owners October 6, 2025 12:47
@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Oct 6, 2025
@prestodb-ci prestodb-ci requested review from a team, Mariamalmesfer and namya28 and removed request for a team October 6, 2025 12:48
Copy link
Contributor

sourcery-ai bot commented Oct 6, 2025

Reviewer's Guide

Enhanced DECIMAL partition value deserialization in PartitionData.getValue with type-specific handling and added comprehensive unit tests covering all numeric input scenarios and JSON round-trip cases.

Flow diagram for DECIMAL partition value deserialization logic

flowchart TD
    A["partitionValue.isLong()"] -->|true| B["BigDecimal.valueOf(partitionValue.asLong(), scale)"]
    A -->|false| C["partitionValue.isInt()"]
    C -->|true| D["BigDecimal.valueOf(partitionValue.asInt(), scale)"]
    C -->|false| E["partitionValue.isBigInteger()"]
    E -->|true| F["new BigDecimal(partitionValue.bigIntegerValue(), scale)"]
    E -->|false| G["partitionValue.decimalValue().setScale(scale)"]
Loading

File-Level Changes

Change Details Files
Enhanced DECIMAL partition value deserialization with type-specific handling
  • Handle JSON long nodes via BigDecimal.valueOf(long, scale)
  • Handle JSON int nodes via BigDecimal.valueOf(int, scale)
  • Handle JSON BigInteger nodes via new BigDecimal(BigInteger, scale)
  • Fallback to decimalValue().setScale() for other numeric types
presto-iceberg/src/main/java/com/facebook/presto/iceberg/PartitionData.java
Added comprehensive unit tests for decimal partition value conversion and JSON round-trip
  • Test null input handling across types
  • Test decimal from long, int, BigInteger and fallback paths
  • Validate edge cases: zero scale, large scale, negative and very large numbers
  • JSON round-trip tests for decimals and mixed-type partitions
presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestPartitionData.java

Possibly linked issues

  • #Parquet Decimal Metadata Issue: The PR resolves the issue by correctly handling decimal precision and scale during deserialization of partition values.

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • In the DECIMAL branch, extract the DecimalType cast and its scale into local variables up front to avoid repeated casts and improve readability.
  • The fallback uses setScale without a RoundingMode, which will throw if the input has more fractional digits than the target scale—consider specifying an explicit RoundingMode or validating precision to make deserialization behavior clearer.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In the DECIMAL branch, extract the DecimalType cast and its scale into local variables up front to avoid repeated casts and improve readability.
- The fallback uses setScale without a RoundingMode, which will throw if the input has more fractional digits than the target scale—consider specifying an explicit RoundingMode or validating precision to make deserialization behavior clearer.

## Individual Comments

### Comment 1
<location> `presto-iceberg/src/main/java/com/facebook/presto/iceberg/PartitionData.java:197-187` </location>
<code_context>
+                else if (partitionValue.isBigInteger()) {
+                    return new BigDecimal(partitionValue.bigIntegerValue(), ((DecimalType) type).scale());
+                }
+                else {
+                    return partitionValue.decimalValue().setScale(((DecimalType) type).scale());
+                }
         }
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Consider specifying rounding mode when setting scale on BigDecimal.

Omitting the rounding mode can cause ArithmeticException if the value cannot be exactly represented at the desired scale. Specify a rounding mode to prevent this.

Suggested implementation:

```java
                    return partitionValue.decimalValue().setScale(((DecimalType) type).scale(), java.math.RoundingMode.HALF_UP);

```

If `RoundingMode` is not already imported at the top of the file, add:
```java
import java.math.RoundingMode;
```
You may want to choose a different rounding mode depending on your requirements.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

catch (IOException e) {
throw new UncheckedIOException("Failed during JSON conversion of " + partitionValue, e);
}
case DECIMAL:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (bug_risk): Consider specifying rounding mode when setting scale on BigDecimal.

Omitting the rounding mode can cause ArithmeticException if the value cannot be exactly represented at the desired scale. Specify a rounding mode to prevent this.

Suggested implementation:

                    return partitionValue.decimalValue().setScale(((DecimalType) type).scale(), java.math.RoundingMode.HALF_UP);

If RoundingMode is not already imported at the top of the file, add:

import java.math.RoundingMode;

You may want to choose a different rounding mode depending on your requirements.

@PingLiuPing PingLiuPing force-pushed the lp_fix_partition_value branch from 34e41e3 to 8b634d9 Compare October 10, 2025 14:38
@PingLiuPing
Copy link
Contributor Author

PingLiuPing commented Oct 10, 2025

@hantangwangd Not sure if you are interested in reviewing this PR, this PR is motivated when reviewing PR #26174 (comment) and I was suggested to move this part and submit a new PR.

@PingLiuPing
Copy link
Contributor Author

@aditi-pandit Can you please take a look. This PR is per your comments in #26174 (comment)

Copy link
Member

@hantangwangd hantangwangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PingLiuPing have you ever encountered this problem during real usage? If so, could you also add a similar test case which would fail without this change in IcebergDistributedTestBase or IcebergDistributedSmokeTestBase? Thanks.

@PingLiuPing
Copy link
Contributor Author

@hantangwangd Thanks for the comment.
Yes, these code are needed for Prestissimo iceberg scenario when writing partitioned iceberg table.
When passing partition value (part of the commit message) from velox to Presto coordinator, depends on the actual decimal value we have to cover all the possible conversions.
The logic here is, in Velox decimal is represented as integer (int64/int128). But when we passing the partition value back to Presto, it is encoded in json format, and here when decode the value, based on value it will be treated as different integer types.

@PingLiuPing
Copy link
Contributor Author

PingLiuPing commented Oct 10, 2025

@hantangwangd Just want to split this piece of code separately. The purpose of this PR is serving the native iceberg write. I have test cases to cover this from E2E point of view. And I will deliver those testcases once the partition transform related code is merged.
#26182 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

from:IBM PR from IBM

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants