feat: Add ScalarValue::{new_one,new_zero,new_ten,distance} support for Decimal128 and Decimal256 #16831

Merged: 6 commits merged on Jul 27, 2025

theirix
Contributor

@theirix theirix commented Jul 20, 2025

Which issue does this PR close?

Rationale for this change

Enhancing support for ScalarValue::Decimal128 and ScalarValue::Decimal256

What changes are included in this PR?

  • Add support for Decimal128 and Decimal256 to ScalarValue in functions: new_one, new_zero, new_ten, distance
  • Support the simplification optimiser rules is_zero and is_one for Decimal256

Are these changes tested?

Unit tests for optimiser utils

Are there any user-facing changes?

No

theirix added 2 commits July 20, 2025 17:04
Add methods distance, new_zero, new_one, new_ten for Decimal128,
Decimal256
@github-actions github-actions bot added the optimizer (Optimizer rules) and common (Related to common crate) labels Jul 20, 2025
@theirix theirix marked this pull request as ready for review July 20, 2025 20:23
@@ -1382,6 +1382,12 @@ impl ScalarValue {
DataType::Float16 => ScalarValue::Float16(Some(f16::from_f32(1.0))),
DataType::Float32 => ScalarValue::Float32(Some(1.0)),
DataType::Float64 => ScalarValue::Float64(Some(1.0)),
DataType::Decimal128(precision, scale) => {
ScalarValue::Decimal128(Some(1), *precision, *scale)
Contributor

If we create a new_one() for type Decimal128(3,3), the result in the natural scale will be 0.001:

("0.001", ScalarValue::Decimal128(Some(1), 3, 3)),

I think this function is supposed to construct 1 in the natural scale? So in this example it should be converted to Decimal128(Some(1000), 3, 3)?

Contributor Author

Added support for scale, input verification and some tests. It should match Arrow's decimal semantics now.
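
The scale issue raised in this thread can be illustrated with a small standalone sketch. It uses only the standard library; `scaled_one` and `render` are hypothetical helpers written for this example, not DataFusion or Arrow APIs, and `render` assumes a non-negative value and a scale small enough that `10^scale` fits in `i128`.

```rust
/// Raw integer that represents 1 at the given scale, i.e. 10^scale.
/// Returns None for a negative scale or when 10^scale overflows i128.
/// (Hypothetical helper, for illustration only.)
fn scaled_one(scale: i8) -> Option<i128> {
    if scale < 0 {
        return None;
    }
    10_i128.checked_pow(scale as u32)
}

/// Render a non-negative raw decimal value at the given scale as text.
/// (Hypothetical helper; assumes value >= 0 and scale <= 38.)
fn render(value: i128, scale: u32) -> String {
    if scale == 0 {
        return value.to_string();
    }
    let divisor = 10_i128.pow(scale);
    format!(
        "{}.{:0width$}",
        value / divisor,
        value % divisor,
        width = scale as usize
    )
}
```

With these helpers, `render(1, 3)` gives `0.001`, while `render(scaled_one(3).unwrap(), 3)` gives `1.000`, which is the distinction the reviewer points out.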

use datafusion_common::{internal_err, Result, ScalarValue};
use datafusion_expr::{
expr::{Between, BinaryExpr, InList},
expr_fn::{and, bitwise_and, bitwise_or, or},
Expr, Like, Operator,
};

pub static POWS_OF_TEN: [i128; 38] = [
Contributor

I guess this lookup table is used for performance? We can do some measurements to check if it's useful.

Contributor Author

Let me run some tests. The table may also have been introduced for clarity.

i128 and i256 pow are not hardware-backed (and i256 additionally needs non-trivial low/high-word logic), so it is probably better to precompute the lists via a const function.

@@ -1400,6 +1406,12 @@ impl ScalarValue {
DataType::Float16 => ScalarValue::Float16(Some(f16::from_f32(-1.0))),
DataType::Float32 => ScalarValue::Float32(Some(-1.0)),
DataType::Float64 => ScalarValue::Float64(Some(-1.0)),
DataType::Decimal128(precision, scale) => {
ScalarValue::Decimal128(Some(-1), *precision, *scale)
Member

Self::Decimal256(Some(r), rprecision, rscale),
) => {
if lprecision == rprecision && lscale == rscale {
// l.checked_sub(*r).and_then( |v| v.checked_abs() ).and_then(|v| v.to_usize() )
Member

remove

Self::Decimal128(Some(r), rprecision, rscale),
) => {
if lprecision == rprecision && lscale == rscale {
l.checked_sub(*r)?.abs().to_usize()
Member

to_usize returns None on overflow.
Shouldn't we return None when checked_sub overflows too?
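
One way to make every overflow path yield None is sketched below. This is a standalone illustration using only the standard library: `Dec128` and `distance` are hypothetical stand-ins, not the actual `ScalarValue` code, and returning None for mismatched precision or scale is an assumption based on the equality check in the diff above.

```rust
use std::convert::TryFrom;

/// Minimal stand-in for a decimal scalar: raw value plus precision and scale.
/// (Hypothetical type for illustration; not the DataFusion ScalarValue.)
#[derive(Clone, Copy)]
struct Dec128 {
    value: i128,
    precision: u8,
    scale: i8,
}

/// Absolute difference of the raw values, or None when the operands differ in
/// precision/scale or when any intermediate step overflows.
fn distance(l: Dec128, r: Dec128) -> Option<usize> {
    if l.precision != r.precision || l.scale != r.scale {
        return None;
    }
    // None if the subtraction overflows i128.
    let diff = l.value.checked_sub(r.value)?;
    // None for i128::MIN, whose absolute value is not representable.
    let abs = diff.checked_abs()?;
    // None if the result does not fit in usize.
    usize::try_from(abs).ok()
}
```

Chaining `checked_sub`, `checked_abs`, and a fallible conversion keeps all three failure modes on the same None path instead of panicking or wrapping.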

@@ -168,10 +133,17 @@ pub fn is_one(s: &Expr) -> bool {
Expr::Literal(ScalarValue::Float64(Some(v)), _) if *v == 1. => true,
Expr::Literal(ScalarValue::Decimal128(Some(v), _p, s), _) => {
*s >= 0
&& POWS_OF_TEN
Member

Why were powers of 10 precomputed?

Contributor Author

I think the initial idea was to mirror Arrow's approach: https://github.com/apache/arrow-rs/blob/123045cc766d42d1eb06ee8bb3f09e39ea995ddc/arrow-data/src/decimal.rs

i128::pow and i256::pow have logarithmic complexity in the exponent (the scale in our case), which is usually small. The precomputed array lookup runs in constant time.

My other idea, a const function that precomputes this array, works only for i128, since its methods are const; that is not the case for arrow-buffer's i256. So the const function cannot be written for i256 without resorting to from_parts manipulations.

const fn calculate_pows_of_ten_decimal128() -> [i128; DECIMAL128_MAX_PRECISION as usize] {
    let mut result = [0i128; DECIMAL128_MAX_PRECISION as usize];
    result[0] = 1;
    let mut i = 0;
    while i < (DECIMAL128_MAX_PRECISION - 1) as usize {
        result[i + 1] = result[i] * 10;
        i += 1;
    }
    result
}
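
For illustration, a const-built table and the lookup-based check discussed in this thread might be combined as in the sketch below. It is standalone and standard-library only; `TABLE_LEN`, `POWS_OF_TEN_I128`, and `is_one_at_scale` are hypothetical names, and the length 38 is assumed from the `[i128; 38]` table shown in the diff.

```rust
/// Number of table entries: 10^0 through 10^37 (assumed to mirror `[i128; 38]`).
const TABLE_LEN: usize = 38;

/// Build the powers-of-ten table at compile time.
const fn pows_of_ten() -> [i128; TABLE_LEN] {
    let mut table = [1i128; TABLE_LEN];
    let mut i = 1;
    while i < TABLE_LEN {
        table[i] = table[i - 1] * 10;
        i += 1;
    }
    table
}

/// Hypothetical name; evaluated once at compile time.
static POWS_OF_TEN_I128: [i128; TABLE_LEN] = pows_of_ten();

/// True when `value` is the raw representation of 1 at `scale`, i.e. 10^scale.
/// Negative scales and scales beyond the table are treated as "not one".
fn is_one_at_scale(value: i128, scale: i8) -> bool {
    scale >= 0 && POWS_OF_TEN_I128.get(scale as usize) == Some(&value)
}
```

Using `get` rather than indexing means an out-of-range scale yields `false` instead of a panic.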

Contributor

Maybe, since we don't have measurements one way or the other to justify this change, we should revert it and keep the original approach?

Other than this particular change, this PR looks good to me

Contributor Author

Rolled back to the original lookup map. The new calculation method is used only for Decimal256.

Contributor

I have checked, and the lookup table approach is faster; perhaps it would be better to implement such a table in Arrow instead.

fn bench_println(c: &mut Criterion) {
    c.bench_function("pow-lookup-table", |b| {
        b.iter(|| {
            let precision = 30;
            let max_scale = 25;

            for s in 1..max_scale {
                is_one(&lit(ScalarValue::Decimal128(
                    Some(i128::from(1)),
                    precision,
                    max_scale,
                )));
            }
        })
    });
    // Decimal256 doesn't have a pre-computed power table
    c.bench_function("pow-with-calculation", |b| {
        b.iter(|| {
            let precision = 30;
            let max_scale = 25;

            for s in 1..max_scale {
                is_one(&lit(ScalarValue::Decimal256(
                    Some(i256::from(1)),
                    precision,
                    max_scale,
                )));
            }
        })
    });
}
pow-lookup-table        time:   [159.16 ns 161.86 ns 166.03 ns]
                        change: [-0.9307% +0.0846% +1.1886%] (p = 0.90 > 0.05)
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

pow-with-calculation    time:   [673.14 ns 674.23 ns 675.36 ns]
                        change: [-0.3838% -0.1634% +0.0709%] (p = 0.18 > 0.05)
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild

- Allow to construct one and ten with different scales
- Add tests for new_one, new_ten
- Add test for distance
Contributor

@alamb alamb left a comment

Thanks @theirix and @findepi and @2010YOUY01

@@ -1382,6 +1383,38 @@ impl ScalarValue {
DataType::Float16 => ScalarValue::Float16(Some(f16::from_f32(1.0))),
DataType::Float32 => ScalarValue::Float32(Some(1.0)),
DataType::Float64 => ScalarValue::Float64(Some(1.0)),
DataType::Decimal128(precision, scale) => {
if let Err(err) = validate_decimal_precision_and_scale::<Decimal128Type>(
Contributor

What is the reason to add new InternalError wrappers here?

As in why not just

Suggested change
- if let Err(err) = validate_decimal_precision_and_scale::<Decimal128Type>(
+ validate_decimal_precision_and_scale::<Decimal128Type>(*precision, *scale)?;

Contributor Author

Agreed, there is no need for it; updated. I forgot about the automatic conversion from ArrowError.
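
The mechanism behind the suggestion is that `?` converts the error through a `From` impl, so no manual wrapping is needed. The standalone sketch below shows the pattern with stand-in types: `LowLevelError`, `HighLevelError`, `validate`, and `one_raw` are hypothetical and only play the roles of the Arrow and DataFusion types. The validation rules are deliberately simplified and are not the real Arrow rules.

```rust
/// Stand-in for a lower-level library error (hypothetical).
#[derive(Debug)]
struct LowLevelError(String);

/// Stand-in for the caller's error type (hypothetical).
#[derive(Debug)]
enum HighLevelError {
    LowLevel(LowLevelError),
}

/// This impl is what lets `?` convert the error automatically.
impl From<LowLevelError> for HighLevelError {
    fn from(e: LowLevelError) -> Self {
        HighLevelError::LowLevel(e)
    }
}

/// Simplified validation (assumption): precision in 1..=38, 0 <= scale <= precision.
fn validate(precision: u8, scale: i8) -> Result<(), LowLevelError> {
    if precision == 0 || precision > 38 {
        return Err(LowLevelError(format!("invalid precision {}", precision)));
    }
    if scale < 0 || scale as u8 > precision {
        return Err(LowLevelError(format!("invalid scale {}", scale)));
    }
    Ok(())
}

/// Raw value of 1 at the given scale, with validation errors propagated by `?`.
fn one_raw(precision: u8, scale: i8) -> Result<i128, HighLevelError> {
    validate(precision, scale)?; // LowLevelError -> HighLevelError via From
    Ok(10_i128.pow(scale as u32)) // safe: scale is within 0..=38 here
}
```

Because the conversion lives in one `From` impl, each call site stays a single line ending in `?`.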

@alamb alamb changed the title from "feat: enhance support for Decimal128 and Decimal256" to "feat: Add ScalarValue::{new_one,new_zero,new_ten,distance} support for Decimal128 and Decimal256" Jul 24, 2025
Contributor

@alamb alamb left a comment

Thanks @theirix - this makes sense to me

cc @berkaysynnada as you have been working on this recently as well I think

@2010YOUY01 2010YOUY01 merged commit 4b9a468 into apache:main Jul 27, 2025
27 checks passed
adriangb pushed a commit to pydantic/datafusion that referenced this pull request Jul 28, 2025
…for `Decimal128` and `Decimal256` (apache#16831)

* Add missing ScalarValue impls for large decimals

Add methods distance, new_zero, new_one, new_ten for Decimal128,
Decimal256

* Support expr simplication for Decimal256

* Replace lookup table with i128::pow

* Support different scales for Decimal constructors

- Allow to construct one and ten with different scales
- Add tests for new_one, new_ten
- Add test for distance

* Revert "Replace lookup table with i128::pow"

This reverts commit ba23e8c.

* Use Arrow error directly
Labels
common (Related to common crate), optimizer (Optimizer rules)
Development

Successfully merging this pull request may close these issues.

Enhance support for types in ScalarValue
4 participants