-
Notifications
You must be signed in to change notification settings - Fork 1.6k
feat: Support log for Decimal128 and Decimal256 #17023
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: theirix <theirix@gmail.com>
Numeric(1), | ||
Numeric(2), | ||
Exact(vec![DataType::Float32]), | ||
Exact(vec![DataType::Float64]), | ||
Exact(vec![DataType::Float32, DataType::Float32]), | ||
Exact(vec![DataType::Float64, DataType::Float64]), | ||
Exact(vec![ | ||
DataType::Int64, | ||
DataType::Decimal128(DECIMAL128_MAX_PRECISION, 0), | ||
]), | ||
Exact(vec![ | ||
DataType::Float32, | ||
DataType::Decimal128(DECIMAL128_MAX_PRECISION, 0), | ||
]), | ||
Exact(vec![ | ||
DataType::Float64, | ||
DataType::Decimal128(DECIMAL128_MAX_PRECISION, 0), | ||
]), | ||
Exact(vec![ | ||
DataType::Int64, | ||
DataType::Decimal256(DECIMAL256_MAX_PRECISION, 0), | ||
]), | ||
Exact(vec![ | ||
DataType::Float32, | ||
DataType::Decimal256(DECIMAL256_MAX_PRECISION, 0), | ||
]), | ||
Exact(vec![ | ||
DataType::Float64, | ||
DataType::Decimal256(DECIMAL256_MAX_PRECISION, 0), | ||
]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the Exact(...)
signatures superseded by the Numeric(1)
and Numeric(2)
? Considering integers, floats and decimals are considered numeric types?
if !base.is_finite() || base.trunc() != base || (base as u32) < 2 { | ||
Err(ArrowError::ComputeError(format!( | ||
"Log cannot use non-integer or small base {base}" | ||
))) | ||
} else { | ||
let unscaled_value = decimal128_to_i128(value, scale)?; | ||
if unscaled_value > 0 { | ||
let log_value: u32 = unscaled_value.ilog(base as i128); | ||
Ok(log_value as f64) | ||
} else { | ||
// Reflect f64::log behaviour | ||
Ok(f64::NAN) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if !base.is_finite() || base.trunc() != base || (base as u32) < 2 { | |
Err(ArrowError::ComputeError(format!( | |
"Log cannot use non-integer or small base {base}" | |
))) | |
} else { | |
let unscaled_value = decimal128_to_i128(value, scale)?; | |
if unscaled_value > 0 { | |
let log_value: u32 = unscaled_value.ilog(base as i128); | |
Ok(log_value as f64) | |
} else { | |
// Reflect f64::log behaviour | |
Ok(f64::NAN) | |
} | |
} | |
if !base.is_finite() || base.trunc() != base { | |
return Err(ArrowError::ComputeError(format!( | |
"Log cannot use non-integer base: {base}" | |
))); | |
} | |
if (base as u32) < 2 { | |
return Err(ArrowError::ComputeError(format!( | |
"Log base must be greater than 1: {base}" | |
))); | |
} | |
let unscaled_value = decimal128_to_i128(value, scale)?; | |
if unscaled_value > 0 { | |
let log_value: u32 = unscaled_value.ilog(base as i128); | |
Ok(log_value as f64) | |
} else { | |
// Reflect f64::log behaviour | |
Ok(f64::NAN) | |
} |
Just to clarify the errors a bit
if !base.is_finite() || base.trunc() != base || (base as u32) < 2 { | ||
Err(ArrowError::ComputeError(format!( | ||
"Log cannot use non-integer or small base {base}" | ||
))) | ||
} else { | ||
match value.to_i128() { | ||
Some(short_value) => { | ||
// Calculate logarithm only for 128-bit decimals | ||
let unscaled_value = decimal128_to_i128(short_value, scale)?; | ||
if unscaled_value > 0 { | ||
let log_value: u32 = unscaled_value.ilog(base as i128); | ||
Ok(log_value as f64) | ||
} else { | ||
Ok(f64::NAN) | ||
} | ||
} | ||
None => Err(ArrowError::ComputeError(format!( | ||
"Log of a large Decimal256 is not supported: {value}" | ||
))), | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if !base.is_finite() || base.trunc() != base || (base as u32) < 2 { | |
Err(ArrowError::ComputeError(format!( | |
"Log cannot use non-integer or small base {base}" | |
))) | |
} else { | |
match value.to_i128() { | |
Some(short_value) => { | |
// Calculate logarithm only for 128-bit decimals | |
let unscaled_value = decimal128_to_i128(short_value, scale)?; | |
if unscaled_value > 0 { | |
let log_value: u32 = unscaled_value.ilog(base as i128); | |
Ok(log_value as f64) | |
} else { | |
Ok(f64::NAN) | |
} | |
} | |
None => Err(ArrowError::ComputeError(format!( | |
"Log of a large Decimal256 is not supported: {value}" | |
))), | |
} | |
} | |
match value.to_i128() { | |
Some(value) => log_decimal128(value, scale, base), | |
None => Err(ArrowError::NotYetImplemented(format!( | |
"Log of Decimal256 larger than Decimal128 is not yet supported: {value}" | |
))), | |
} |
Thoughts on reusing existing function above? Also changing error type as NotYetImplemented
seems more appropriate (though maybe should use the DataFusion version instead of Arrow version 🤔 )
&DataType::Float64 | ||
} else { | ||
args[0].data_type() | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this logic match the above return_type()
function?
Which issue does this PR close?
Rationale for this change
Add Decimal128 and Decimal256 support for
log
UDF.It's a most generic function, allowing for specifying a logarithm base, but by default it is
log10
, which makes it a good candidate for long decimals. Thecalculate_binary_math
helper could simplify a lot of UDFs (a subject of future PRs).Since decimals only support integer logarithms, the result is rounded and then converted back to a float, but it is still done on the
i128
/i256
level, not by rounding input parameters as before.Also, if numbers are parsed as floats, there is precision lost due to floating-point handling. So the majority of tests are targeting
parse_float_as_decimal=true
as in #14612. Otherwise, the log result could differ by one or two due to rounding – see regression SLT.Notably, we still miss math for 256-bits. Arrow's i256 type uses the [BigInt implementation], which could provide log10 at least, but we can extend decimal arithmetic in Arrow as well.
What changes are included in this PR?
ScalarValue::{new_one,new_zero,new_ten,distance}
support forDecimal128
andDecimal256
#16831 for zero scalePlease note, there are more specialised functions
log2
,log10
,ln
(), which are not handled byLogFunc
. They could be migrated to this UDF later, providing a base value explicitly.Are these changes tested?
Are there any user-facing changes?
No, except the precision of log calculation can increase for float inputs (since f64 is used). For decimals, results could become more accurate since no float downcasting is involved.