feat: Add ScalarValue::{new_one,new_zero,new_ten,distance} support for Decimal128 and Decimal256 #16831
Conversation
Add methods distance, new_zero, new_one, new_ten for Decimal128, Decimal256
datafusion/common/src/scalar/mod.rs
Outdated
@@ -1382,6 +1382,12 @@ impl ScalarValue {
            DataType::Float16 => ScalarValue::Float16(Some(f16::from_f32(1.0))),
            DataType::Float32 => ScalarValue::Float32(Some(1.0)),
            DataType::Float64 => ScalarValue::Float64(Some(1.0)),
            DataType::Decimal128(precision, scale) => {
                ScalarValue::Decimal128(Some(1), *precision, *scale)
If we create a new_one() for type Decimal128(3,3), the result in the natural scale will be 0.001:
datafusion/datafusion/sql/src/expr/value.rs
Line 467 in 3869857
("0.001", ScalarValue::Decimal128(Some(1), 3, 3)),
I think this function is supposed to construct 1 in the natural scale? So in this example it should be converted to Decimal128(Some(1000), 3, 3)?
Added support for scale, input verification and some tests. It should match Arrow's decimal semantics now.
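A minimal sketch of the scaling rule discussed above, not the code merged in the PR: the unscaled integer that represents 1 at a given scale is 10^scale. The helper name `decimal128_one` and its precision check are assumptions made for illustration; the PR itself relies on Arrow's validation.

```rust
/// Hypothetical helper (not part of DataFusion): returns the unscaled `i128`
/// that represents the value 1 for a decimal with the given precision and scale.
fn decimal128_one(precision: u8, scale: i8) -> Option<i128> {
    // This sketch only covers non-negative scales.
    if scale < 0 {
        return None;
    }
    let scale = scale as u32;
    // 10^scale has `scale + 1` digits, so it must fit within `precision` digits.
    if scale + 1 > precision as u32 {
        return None;
    }
    // `checked_pow` returns `None` instead of overflowing.
    10i128.checked_pow(scale)
}
```

Under these assumptions, `decimal128_one(5, 3)` yields `Some(1000)`, that is 1.000, while a type such as Decimal128(3,3) is rejected because its largest value is 0.999.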
use datafusion_common::{internal_err, Result, ScalarValue};
use datafusion_expr::{
    expr::{Between, BinaryExpr, InList},
    expr_fn::{and, bitwise_and, bitwise_or, or},
    Expr, Like, Operator,
};

pub static POWS_OF_TEN: [i128; 38] = [
I guess this lookup table is used for performance? We can do some measurements to check if it's useful.
Let me conduct some tests. It could also have been introduced for clarity.
i128 and i256 pow are not hardware-backed (with i256 introducing non-trivial low/high logic), so it's probably better to precompute lists via a const function.
datafusion/common/src/scalar/mod.rs
Outdated
@@ -1400,6 +1406,12 @@ impl ScalarValue {
            DataType::Float16 => ScalarValue::Float16(Some(f16::from_f32(-1.0))),
            DataType::Float32 => ScalarValue::Float32(Some(-1.0)),
            DataType::Float64 => ScalarValue::Float64(Some(-1.0)),
            DataType::Decimal128(precision, scale) => {
                ScalarValue::Decimal128(Some(-1), *precision, *scale)
datafusion/common/src/scalar/mod.rs
Outdated
                Self::Decimal256(Some(r), rprecision, rscale),
            ) => {
                if lprecision == rprecision && lscale == rscale {
                    // l.checked_sub(*r).and_then( |v| v.checked_abs() ).and_then(|v| v.to_usize() )
remove
datafusion/common/src/scalar/mod.rs
Outdated
                Self::Decimal128(Some(r), rprecision, rscale),
            ) => {
                if lprecision == rprecision && lscale == rscale {
                    l.checked_sub(*r)?.abs().to_usize()
to_usize returns None on overflow.
Shouldn't we return None when checked_sub overflows too?
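One way to make every overflow path return None is sketched below, as an illustration rather than the merged implementation. The helper name `decimal128_distance` is hypothetical, and `usize::try_from` from the standard library stands in for the `to_usize` call in the diff so that the example has no external dependencies.

```rust
/// Hypothetical helper: absolute distance between two unscaled decimal values
/// that share the same precision and scale. Returns `None` on any overflow.
fn decimal128_distance(l: i128, r: i128) -> Option<usize> {
    // `checked_sub` yields `None` if the subtraction overflows `i128`.
    let diff = l.checked_sub(r)?;
    // `checked_abs` yields `None` for `i128::MIN`, whose absolute value does not fit.
    let abs = diff.checked_abs()?;
    // The conversion fails (giving `None`) when the distance exceeds `usize::MAX`.
    usize::try_from(abs).ok()
}
```

The precision and scale comparison from the diff is left out here; the sketch assumes the caller has already confirmed that both operands use the same decimal type.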
@@ -168,10 +133,17 @@ pub fn is_one(s: &Expr) -> bool {
        Expr::Literal(ScalarValue::Float64(Some(v)), _) if *v == 1. => true,
        Expr::Literal(ScalarValue::Decimal128(Some(v), _p, s), _) => {
            *s >= 0
                && POWS_OF_TEN
Why were powers of 10 precomputed?
I think the initial idea is to mirror Arrow's approach: https://github.com/apache/arrow-rs/blob/123045cc766d42d1eb06ee8bb3f09e39ea995ddc/arrow-data/src/decimal.rs
i128::pow and i256::pow have logarithmic complexity in the exponent (the scale in our case), which is usually low. The precomputed array lookup is done in constant time.
My other idea, a const function to precalculate this array, works only for i128 since its methods are const, which is not the case for arrow-buffer's i256. So the const function cannot be written for i256 without tinkering with from_parts manipulations.
const fn calculate_pows_of_ten_decimal128() -> [i128; DECIMAL128_MAX_PRECISION as usize] {
    let mut result = [0i128; DECIMAL128_MAX_PRECISION as usize];
    result[0] = 1;
    let mut i = 0;
    while i < (DECIMAL128_MAX_PRECISION - 1) as usize {
        result[i + 1] = result[i] * 10;
        i += 1;
    }
    result
}
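For illustration, a self-contained sketch of how a table built by such a const function could back the is_one check on Decimal128 literals. The names `MAX_PRECISION`, `pows_of_ten`, `POWS` and `decimal128_is_one` are assumptions; the value 38 is taken from the `[i128; 38]` table size shown in the diff.

```rust
/// Assumed table size, matching the `[i128; 38]` declaration in the diff.
const MAX_PRECISION: usize = 38;

/// Builds the powers of ten 10^0 ..= 10^37 at compile time.
const fn pows_of_ten() -> [i128; MAX_PRECISION] {
    let mut result = [1i128; MAX_PRECISION];
    let mut i = 1;
    while i < MAX_PRECISION {
        result[i] = result[i - 1] * 10;
        i += 1;
    }
    result
}

/// Hypothetical stand-in for the `POWS_OF_TEN` static.
static POWS: [i128; MAX_PRECISION] = pows_of_ten();

/// Hypothetical check: an unscaled value represents 1 when it equals 10^scale.
fn decimal128_is_one(value: i128, scale: i8) -> bool {
    // Negative scales and scales beyond the table never represent exactly 1 here.
    scale >= 0 && POWS.get(scale as usize) == Some(&value)
}
```

The lookup replaces a call to `i128::pow` with a bounds-checked array access, which is the trade-off debated in this thread.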
Maybe, since we don't have measurements one way or the other to justify this change, we revert it and keep the original approach?
Other than this particular change, this PR looks good to me.
Rolled back to the original lookup map. The new calculation method is used only for Decimal256.
I have checked that the lookup table approach is faster; perhaps it's better to implement such a table in Arrow instead.
fn bench_println(c: &mut Criterion) {
    c.bench_function("pow-lookup-table", |b| {
        b.iter(|| {
            let precision = 30;
            let max_scale = 25;
            for s in 1..max_scale {
                is_one(&lit(ScalarValue::Decimal128(
                    Some(i128::from(1)),
                    precision,
                    max_scale,
                )));
            }
        })
    });
    // Decimal256 doesn't have a pre-computed power table
    c.bench_function("pow-with-calculation", |b| {
        b.iter(|| {
            let precision = 30;
            let max_scale = 25;
            for s in 1..max_scale {
                is_one(&lit(ScalarValue::Decimal256(
                    Some(i256::from(1)),
                    precision,
                    max_scale,
                )));
            }
        })
    });
}
pow-lookup-table time: [159.16 ns 161.86 ns 166.03 ns]
change: [-0.9307% +0.0846% +1.1886%] (p = 0.90 > 0.05)
No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
5 (5.00%) low mild
2 (2.00%) high mild
3 (3.00%) high severe
pow-with-calculation time: [673.14 ns 674.23 ns 675.36 ns]
change: [-0.3838% -0.1634% +0.0709%] (p = 0.18 > 0.05)
No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low mild
3 (3.00%) high mild
- Allow to construct one and ten with different scales
- Add tests for new_one, new_ten
- Add test for distance
Thanks @theirix and @findepi and @2010YOUY01
datafusion/common/src/scalar/mod.rs
Outdated
@@ -1382,6 +1383,38 @@ impl ScalarValue {
            DataType::Float16 => ScalarValue::Float16(Some(f16::from_f32(1.0))),
            DataType::Float32 => ScalarValue::Float32(Some(1.0)),
            DataType::Float64 => ScalarValue::Float64(Some(1.0)),
            DataType::Decimal128(precision, scale) => {
                if let Err(err) = validate_decimal_precision_and_scale::<Decimal128Type>(
What is the reason to add new InternalError wrappers here?
As in, why not just
- if let Err(err) = validate_decimal_precision_and_scale::<Decimal128Type>(
+ validate_decimal_precision_and_scale::<Decimal128Type>(*precision, *scale)?;
Agree, no need for it, updated. Forgot about the auto-conversion from ArrowError.
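The auto-conversion mentioned here is the standard behaviour of the `?` operator, which calls `From::from` on the error before returning it. The sketch below uses hypothetical stand-in types (`ArrowLikeError`, `EngineError`) and a simplified `validate` function rather than the real Arrow and DataFusion items, purely to show why no explicit wrapper is needed.

```rust
/// Hypothetical stand-in for the lower-level (Arrow-style) error.
#[derive(Debug, PartialEq)]
enum ArrowLikeError {
    InvalidArgument(String),
}

/// Hypothetical stand-in for the higher-level (engine) error.
#[derive(Debug, PartialEq)]
enum EngineError {
    Arrow(ArrowLikeError),
}

/// This impl is what lets `?` convert the error automatically.
impl From<ArrowLikeError> for EngineError {
    fn from(e: ArrowLikeError) -> Self {
        EngineError::Arrow(e)
    }
}

/// Simplified stand-in check; the real validation rules live in Arrow.
fn validate(precision: u8, scale: i8) -> Result<(), ArrowLikeError> {
    if precision == 0 || precision > 38 || scale < 0 || scale as u8 >= precision {
        return Err(ArrowLikeError::InvalidArgument(format!(
            "unsupported precision {precision} / scale {scale}"
        )));
    }
    Ok(())
}

/// Returns the unscaled value representing 1, propagating validation errors.
fn new_one(precision: u8, scale: i8) -> Result<i128, EngineError> {
    // No manual wrapping: `?` applies `From<ArrowLikeError> for EngineError`.
    validate(precision, scale)?;
    Ok(10i128.pow(scale as u32))
}
```

With this shape, the call site shrinks to a single line ending in `?`, which is what the review suggestion proposes.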
This reverts commit ba23e8c.
Thanks @theirix - this makes sense to me
cc @berkaysynnada as you have been working on this recently as well I think
…for `Decimal128` and `Decimal256` (apache#16831)
* Add missing ScalarValue impls for large decimals
  Add methods distance, new_zero, new_one, new_ten for Decimal128, Decimal256
* Support expr simplication for Decimal256
* Replace lookup table with i128::pow
* Support different scales for Decimal constructors
  - Allow to construct one and ten with different scales
  - Add tests for new_one, new_ten
  - Add test for distance
* Revert "Replace lookup table with i128::pow"
  This reverts commit ba23e8c.
* Use Arrow error directly
Which issue does this PR close?
Rationale for this change
Enhancing support for ScalarValue::Decimal128 and ScalarValue::Decimal256
What changes are included in this PR?
- Decimal128 and Decimal256 to ScalarValue in functions: new_one, new_zero, new_ten, distance
- Decimal256: is_zero and is_one
Are these changes tested?
Unit tests for optimiser utils
Are there any user-facing changes?
No