feat: Add `ScalarValue::{new_one,new_zero,new_ten,distance}` support for `Decimal128` and `Decimal256` #16831

theirix · 2025-07-20T19:46:00Z

Which issue does this PR close?

Closes Enhance support for types in ScalarValue #16832

Rationale for this change

Enhancing support for ScalarValue::Decimal128 and ScalarValue::Decimal256

What changes are included in this PR?

Add support for Decimal128 and Decimal256 to ScalarValue in functions: new_one, new_zero, new_ten, distance
Support the simplification optimiser rules is_zero and is_one for Decimal256

Are these changes tested?

Unit tests for optimiser utils

Are there any user-facing changes?

No

Add methods distance, new_zero, new_one, new_ten for Decimal128, Decimal256

2010YOUY01 · 2025-07-21T04:16:28Z

datafusion/common/src/scalar/mod.rs

@@ -1382,6 +1382,12 @@ impl ScalarValue {
            DataType::Float16 => ScalarValue::Float16(Some(f16::from_f32(1.0))),
            DataType::Float32 => ScalarValue::Float32(Some(1.0)),
            DataType::Float64 => ScalarValue::Float64(Some(1.0)),
+            DataType::Decimal128(precision, scale) => {
+                ScalarValue::Decimal128(Some(1), *precision, *scale)


If we create a new_one() for type Decimal128(3,3), the result in the natural scale will be 0.001:

datafusion/datafusion/sql/src/expr/value.rs

Line 467 in 3869857

("0.001", ScalarValue::Decimal128(Some(1), 3, 3)),

I think this function is supposed to construct 1 in the natural scale? So in this example it should be converted to Decimal128(Some(1000), 3, 3)?

Added support for scale, input verification and some tests. It should match Arrow's decimal semantics now.
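
For illustration, the scale handling discussed above can be sketched with plain integers. This is a minimal sketch, not the code merged in this PR: `decimal128_one_unscaled` is a hypothetical helper, it assumes a non-negative scale, and it does not check whether the result fits the declared precision.

```rust
/// Hypothetical helper (not part of DataFusion): returns the unscaled
/// integer that represents the logical value 1 at the given decimal scale.
/// Assumption: negative scales are rejected here for simplicity.
fn decimal128_one_unscaled(scale: i8) -> Option<i128> {
    if scale < 0 {
        return None;
    }
    // 1 at scale `s` is stored as 10^s; checked_pow returns None on overflow.
    10i128.checked_pow(scale as u32)
}
```

With scale 3 the helper returns 1000, which reads as 1.000 once the scale is applied, whereas a stored value of 1 reads as 0.001.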

2010YOUY01 · 2025-07-21T04:17:29Z

datafusion/optimizer/src/simplify_expressions/utils.rs

 use datafusion_common::{internal_err, Result, ScalarValue};
 use datafusion_expr::{
    expr::{Between, BinaryExpr, InList},
    expr_fn::{and, bitwise_and, bitwise_or, or},
    Expr, Like, Operator,
 };

-pub static POWS_OF_TEN: [i128; 38] = [


I guess this lookup table is used for performance? We can do some measurements to check if it's useful.

Let me conduct some tests. It could also have been introduced for clarity.

i128 and i256 pow are not hardware-backed (with i256 introducing non-trivial low/high logic), so it's probably better to precompute lists via a const function.

findepi · 2025-07-22T06:47:58Z

datafusion/common/src/scalar/mod.rs

@@ -1400,6 +1406,12 @@ impl ScalarValue {
            DataType::Float16 => ScalarValue::Float16(Some(f16::from_f32(-1.0))),
            DataType::Float32 => ScalarValue::Float32(Some(-1.0)),
            DataType::Float64 => ScalarValue::Float64(Some(-1.0)),
+            DataType::Decimal128(precision, scale) => {
+                ScalarValue::Decimal128(Some(-1), *precision, *scale)


same as https://github.com/apache/datafusion/pull/16831/files#r2218150407

findepi · 2025-07-22T06:49:46Z

datafusion/common/src/scalar/mod.rs

+                Self::Decimal256(Some(r), rprecision, rscale),
+            ) => {
+                if lprecision == rprecision && lscale == rscale {
+                    // l.checked_sub(*r).and_then( |v| v.checked_abs() ).and_then(|v| v.to_usize() )


datafusion/common/src/scalar/mod.rs

findepi · 2025-07-22T06:51:31Z

datafusion/common/src/scalar/mod.rs

+                Self::Decimal128(Some(r), rprecision, rscale),
+            ) => {
+                if lprecision == rprecision && lscale == rscale {
+                    l.checked_sub(*r)?.abs().to_usize()


to_usize returns None on overflow.
Shouldn't we return None when checked_sub overflows too?
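
The overflow behaviour asked about here can be sketched with the standard library alone. This is an assumption-laden sketch rather than the PR's code: `decimal128_distance` is a hypothetical free function, it takes the two unscaled values directly (precision and scale are assumed to match already), and it uses `usize::try_from` in place of the `to_usize` call shown in the diff.

```rust
/// Hypothetical sketch: absolute difference of two unscaled Decimal128
/// values, returning None whenever any step would overflow.
fn decimal128_distance(l: i128, r: i128) -> Option<usize> {
    // None if l - r does not fit in i128.
    let diff = l.checked_sub(r)?;
    // None if diff == i128::MIN, whose absolute value is not representable.
    let abs = diff.checked_abs()?;
    // None if the magnitude does not fit in usize.
    usize::try_from(abs).ok()
}
```

Every fallible step maps to `None`, so callers see a single "no distance available" outcome regardless of which step overflowed.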

findepi · 2025-07-22T06:52:39Z

datafusion/optimizer/src/simplify_expressions/utils.rs

@@ -168,10 +133,17 @@ pub fn is_one(s: &Expr) -> bool {
        Expr::Literal(ScalarValue::Float64(Some(v)), _) if *v == 1. => true,
        Expr::Literal(ScalarValue::Decimal128(Some(v), _p, s), _) => {
            *s >= 0
-                && POWS_OF_TEN


Why were powers of 10 precomputed?

I think the initial idea is to mirror Arrow's approach https://github.com/apache/arrow-rs/blob/123045cc766d42d1eb06ee8bb3f09e39ea995ddc/arrow-data/src/decimal.rs

i128::pow and i256::pow have logarithmic complexity depending on the argument (scale in our case), which is usually low. The precomputed array lookup is surely done in constant time.
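
To make the complexity remark concrete, here is a sketch of exponentiation by squaring, the usual technique behind integer `pow`. It is written for illustration only: `pow_by_squaring` is a hypothetical function, and nothing here claims to reproduce the exact standard-library or arrow-buffer implementation.

```rust
/// Hypothetical sketch of exponentiation by squaring with overflow checks.
/// The loop runs once per bit of `exp`, so the number of multiplications
/// grows logarithmically with the exponent.
fn pow_by_squaring(mut base: i128, mut exp: u32) -> Option<i128> {
    let mut acc: i128 = 1;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc.checked_mul(base)?;
        }
        exp >>= 1;
        // Square the base only if another bit remains, so the final
        // (unused) squaring cannot cause a spurious overflow.
        if exp > 0 {
            base = base.checked_mul(base)?;
        }
    }
    Some(acc)
}
```

For a scale of 25 the loop runs five times, compared with a single indexed load for a precomputed table.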

My other idea, a const function to precalculate this array, works only for i128 since its methods are const, which is not the case for arrow-buffer's i256. So the const function cannot be written for i256 without tinkering with from_parts manipulations.

const fn calculate_pows_of_ten_decimal128() -> [i128; DECIMAL128_MAX_PRECISION as usize] {
    let mut result = [0i128; DECIMAL128_MAX_PRECISION as usize];
    result[0] = 1;
    let mut i = 0;
    while i < (DECIMAL128_MAX_PRECISION - 1) as usize {
        result[i + 1] = result[i] * 10;
        i += 1;
    }
    result
}
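
A table built this way could back an `is_one`-style check roughly as follows. This is a self-contained sketch under stated assumptions: `MAX_PRECISION` stands in for Arrow's `DECIMAL128_MAX_PRECISION` (38), and `is_decimal128_one` is a hypothetical function operating on the raw value and scale rather than on an `Expr`.

```rust
// Stand-in for Arrow's DECIMAL128_MAX_PRECISION, assumed to be 38.
const MAX_PRECISION: usize = 38;

/// Builds [10^0, 10^1, ..., 10^37] at compile time.
const fn pows_of_ten() -> [i128; MAX_PRECISION] {
    let mut result = [1i128; MAX_PRECISION];
    let mut i = 1;
    while i < MAX_PRECISION {
        result[i] = result[i - 1] * 10;
        i += 1;
    }
    result
}

const POWS_OF_TEN: [i128; MAX_PRECISION] = pows_of_ten();

/// Hypothetical check: does `value` at `scale` represent the logical value 1?
/// Negative or out-of-range scales yield false instead of panicking.
fn is_decimal128_one(value: i128, scale: i8) -> bool {
    scale >= 0 && POWS_OF_TEN.get(scale as usize) == Some(&value)
}
```

Using `get` instead of direct indexing keeps the check total: a scale beyond the table simply fails the comparison.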

Maybe, since we don't have measurements one way or the other to justify this change, we revert this change and keep the original approach?

Other than this particular change, this PR looks good to me

Rolled back to the original lookup map. The new calculation method is used only for Decimal256.

I have checked that the lookup-table approach is faster; perhaps it would be better to implement such a table in Arrow instead.

fn bench_println(c: &mut Criterion) {
    c.bench_function("pow-lookup-table", |b| {
        b.iter(|| {
            let precision = 30;
            let max_scale = 25;
            for s in 1..max_scale {
                is_one(&lit(ScalarValue::Decimal128(
                    Some(i128::from(1)),
                    precision,
                    max_scale,
                )));
            }
        })
    });
    // Decimal256 doesn't have a pre-computed power table
    c.bench_function("pow-with-calculation", |b| {
        b.iter(|| {
            let precision = 30;
            let max_scale = 25;
            for s in 1..max_scale {
                is_one(&lit(ScalarValue::Decimal256(
                    Some(i256::from(1)),
                    precision,
                    max_scale,
                )));
            }
        })
    });
}

pow-lookup-table        time:   [159.16 ns 161.86 ns 166.03 ns]
                        change: [-0.9307% +0.0846% +1.1886%] (p = 0.90 > 0.05)
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe
pow-with-calculation    time:   [673.14 ns 674.23 ns 675.36 ns]
                        change: [-0.3838% -0.1634% +0.0709%] (p = 0.18 > 0.05)
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild

- Allow to construct one and ten with different scales
- Add tests for new_one, new_ten
- Add test for distance

alamb

Thanks @theirix and @findepi and @2010YOUY01

alamb · 2025-07-24T13:08:29Z

datafusion/common/src/scalar/mod.rs

@@ -1382,6 +1383,38 @@ impl ScalarValue {
            DataType::Float16 => ScalarValue::Float16(Some(f16::from_f32(1.0))),
            DataType::Float32 => ScalarValue::Float32(Some(1.0)),
            DataType::Float64 => ScalarValue::Float64(Some(1.0)),
+            DataType::Decimal128(precision, scale) => {
+                if let Err(err) = validate_decimal_precision_and_scale::<Decimal128Type>(


What is the reason to add new InternalError wrappers here?

As in why not just

Suggested change

if let Err(err) = validate_decimal_precision_and_scale::<Decimal128Type>(

validate_decimal_precision_and_scale::<Decimal128Type>(*precision, *scale)?;

Agree, no need for it, updated. Forgot about the auto-conversion from ArrowError.
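
The auto-conversion mentioned here is the standard `From`-based behaviour of the `?` operator. The sketch below uses stand-in types so that it is self-contained: `LowLevelError` plays the role of ArrowError, `HighLevelError` the role of DataFusionError, and `validate` is a simplified, hypothetical validator whose rules are assumptions, not Arrow's exact checks.

```rust
/// Stand-in for ArrowError (hypothetical, for illustration only).
#[derive(Debug, PartialEq)]
enum LowLevelError {
    InvalidArgument(String),
}

/// Stand-in for DataFusionError (hypothetical, for illustration only).
#[derive(Debug, PartialEq)]
enum HighLevelError {
    LowLevel(LowLevelError),
}

// This impl is what lets `?` convert the error without a manual wrapper.
impl From<LowLevelError> for HighLevelError {
    fn from(err: LowLevelError) -> Self {
        HighLevelError::LowLevel(err)
    }
}

/// Simplified validator. Assumed rules: 1 <= precision <= 38 and
/// 0 <= scale <= precision.
fn validate(precision: u8, scale: i8) -> Result<(), LowLevelError> {
    if precision == 0 || precision > 38 {
        return Err(LowLevelError::InvalidArgument(format!(
            "invalid precision {precision}"
        )));
    }
    if scale < 0 || scale as u8 > precision {
        return Err(LowLevelError::InvalidArgument(format!(
            "invalid scale {scale}"
        )));
    }
    Ok(())
}

/// `?` calls `From::from` on the error, so no explicit `if let Err(..)` is needed.
fn new_one_unscaled(precision: u8, scale: i8) -> Result<i128, HighLevelError> {
    validate(precision, scale)?;
    // Safe after validation: 0 <= scale <= 38, and 10^38 fits in i128.
    Ok(10i128.pow(scale as u32))
}
```

The single `validate(precision, scale)?;` line replaces the whole `if let Err(err) = ...` wrapper, which is the shape of the suggested change.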

alamb · 2025-07-24T13:11:04Z

datafusion/optimizer/src/simplify_expressions/utils.rs

@@ -168,10 +133,17 @@ pub fn is_one(s: &Expr) -> bool {
        Expr::Literal(ScalarValue::Float64(Some(v)), _) if *v == 1. => true,
        Expr::Literal(ScalarValue::Decimal128(Some(v), _p, s), _) => {
            *s >= 0
-                && POWS_OF_TEN


Maybe, since we don't have measurements one way or the other to justify this change, we revert this change and keep the original approach?

Other than this particular change, this PR looks good to me

This reverts commit ba23e8c.

alamb

Thanks @theirix - this makes sense to me

cc @berkaysynnada as you have been working on this recently as well I think

…for `Decimal128` and `Decimal256` (apache#16831)

* Add missing ScalarValue impls for large decimals

  Add methods distance, new_zero, new_one, new_ten for Decimal128, Decimal256

* Support expr simplication for Decimal256
* Replace lookup table with i128::pow
* Support different scales for Decimal constructors

  - Allow to construct one and ten with different scales
  - Add tests for new_one, new_ten
  - Add test for distance

* Revert "Replace lookup table with i128::pow"

  This reverts commit ba23e8c.

* Use Arrow error directly

theirix added 2 commits July 20, 2025 17:04

Add missing ScalarValue impls for large decimals

459f76d

Add methods distance, new_zero, new_one, new_ten for Decimal128, Decimal256

Support expr simplication for Decimal256

bf3a35b

github-actions bot added optimizer Optimizer rules common Related to common crate labels Jul 20, 2025

Replace lookup table with i128::pow

ba23e8c

theirix marked this pull request as ready for review July 20, 2025 20:23

2010YOUY01 reviewed Jul 21, 2025

findepi reviewed Jul 22, 2025

Support different scales for Decimal constructors

15eb1b0

- Allow to construct one and ten with different scales
- Add tests for new_one, new_ten
- Add test for distance

alamb reviewed Jul 24, 2025

alamb changed the title ~~feat: enhance support for Decimal128 and Decimal256~~ feat: Add ScalarValue::{new_one,new_zero,new_ten,distance} support for Decimal128 and Decimal256 Jul 24, 2025

theirix added 2 commits July 24, 2025 17:47

Revert "Replace lookup table with i128::pow"

06551a7

This reverts commit ba23e8c.

Use Arrow error directly

c536f50

alamb approved these changes Jul 24, 2025

2010YOUY01 merged commit 4b9a468 into apache:main Jul 27, 2025
27 checks passed

theirix mentioned this pull request Aug 3, 2025

feat: Support log for Decimal128 and Decimal256 #17023

Open
