Skip to content

Conversation

theirix
Copy link
Contributor

@theirix theirix commented Aug 3, 2025

Which issue does this PR close?

Rationale for this change

Add Decimal128 and Decimal256 support for log UDF.
It's a most generic function, allowing for specifying a logarithm base, but by default it is log10, which makes it a good candidate for long decimals. The calculate_binary_math helper could simplify a lot of UDFs (a subject of future PRs).

Since decimals only support integer logarithms, the result is rounded and then converted back to a float, but it is still done on the i128/i256 level, not by rounding input parameters as before.

Also, if numbers are parsed as floats, there is precision lost due to floating-point handling. So the majority of tests are targeting parse_float_as_decimal=true as in #14612. Otherwise, the log result could differ by one or two due to rounding – see regression SLT.

Notably, we still miss math for 256-bits. Arrow's i256 type uses the [BigInt implementation], which could provide log10 at least, but we can extend decimal arithmetic in Arrow as well.

What changes are included in this PR?

Please note, there are more specialised functions log2, log10, ln (), which are not handled by LogFunc. They could be migrated to this UDF later, providing a base value explicitly.

Are these changes tested?

  • Unit tests
  • Regression SLT tests
  • Manual invocation of datafusion-cli
  • Manual comparison of results to other large decimal math implementations (Python, WolframAlpha)

Are there any user-facing changes?

No, except the precision of log calculation can increase for float inputs (since f64 is used). For decimals, results could become more accurate since no float downcasting is involved.

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) common Related to common crate functions Changes to functions implementation labels Aug 3, 2025
@theirix theirix marked this pull request as ready for review August 10, 2025 07:23
Comment on lines +70 to +99
Numeric(1),
Numeric(2),
Exact(vec![DataType::Float32]),
Exact(vec![DataType::Float64]),
Exact(vec![DataType::Float32, DataType::Float32]),
Exact(vec![DataType::Float64, DataType::Float64]),
Exact(vec![
DataType::Int64,
DataType::Decimal128(DECIMAL128_MAX_PRECISION, 0),
]),
Exact(vec![
DataType::Float32,
DataType::Decimal128(DECIMAL128_MAX_PRECISION, 0),
]),
Exact(vec![
DataType::Float64,
DataType::Decimal128(DECIMAL128_MAX_PRECISION, 0),
]),
Exact(vec![
DataType::Int64,
DataType::Decimal256(DECIMAL256_MAX_PRECISION, 0),
]),
Exact(vec![
DataType::Float32,
DataType::Decimal256(DECIMAL256_MAX_PRECISION, 0),
]),
Exact(vec![
DataType::Float64,
DataType::Decimal256(DECIMAL256_MAX_PRECISION, 0),
]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the Exact(...) signatures superseded by the Numeric(1) and Numeric(2)? Considering integers, floats and decimals are considered numeric types?

Comment on lines +110 to +123
if !base.is_finite() || base.trunc() != base || (base as u32) < 2 {
Err(ArrowError::ComputeError(format!(
"Log cannot use non-integer or small base {base}"
)))
} else {
let unscaled_value = decimal128_to_i128(value, scale)?;
if unscaled_value > 0 {
let log_value: u32 = unscaled_value.ilog(base as i128);
Ok(log_value as f64)
} else {
// Reflect f64::log behaviour
Ok(f64::NAN)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if !base.is_finite() || base.trunc() != base || (base as u32) < 2 {
Err(ArrowError::ComputeError(format!(
"Log cannot use non-integer or small base {base}"
)))
} else {
let unscaled_value = decimal128_to_i128(value, scale)?;
if unscaled_value > 0 {
let log_value: u32 = unscaled_value.ilog(base as i128);
Ok(log_value as f64)
} else {
// Reflect f64::log behaviour
Ok(f64::NAN)
}
}
if !base.is_finite() || base.trunc() != base {
return Err(ArrowError::ComputeError(format!(
"Log cannot use non-integer base: {base}"
)));
}
if (base as u32) < 2 {
return Err(ArrowError::ComputeError(format!(
"Log base must be greater than 1: {base}"
)));
}
let unscaled_value = decimal128_to_i128(value, scale)?;
if unscaled_value > 0 {
let log_value: u32 = unscaled_value.ilog(base as i128);
Ok(log_value as f64)
} else {
// Reflect f64::log behaviour
Ok(f64::NAN)
}

Just to clarify the errors a bit

Comment on lines +129 to +149
if !base.is_finite() || base.trunc() != base || (base as u32) < 2 {
Err(ArrowError::ComputeError(format!(
"Log cannot use non-integer or small base {base}"
)))
} else {
match value.to_i128() {
Some(short_value) => {
// Calculate logarithm only for 128-bit decimals
let unscaled_value = decimal128_to_i128(short_value, scale)?;
if unscaled_value > 0 {
let log_value: u32 = unscaled_value.ilog(base as i128);
Ok(log_value as f64)
} else {
Ok(f64::NAN)
}
}
None => Err(ArrowError::ComputeError(format!(
"Log of a large Decimal256 is not supported: {value}"
))),
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if !base.is_finite() || base.trunc() != base || (base as u32) < 2 {
Err(ArrowError::ComputeError(format!(
"Log cannot use non-integer or small base {base}"
)))
} else {
match value.to_i128() {
Some(short_value) => {
// Calculate logarithm only for 128-bit decimals
let unscaled_value = decimal128_to_i128(short_value, scale)?;
if unscaled_value > 0 {
let log_value: u32 = unscaled_value.ilog(base as i128);
Ok(log_value as f64)
} else {
Ok(f64::NAN)
}
}
None => Err(ArrowError::ComputeError(format!(
"Log of a large Decimal256 is not supported: {value}"
))),
}
}
match value.to_i128() {
Some(value) => log_decimal128(value, scale, base),
None => Err(ArrowError::NotYetImplemented(format!(
"Log of Decimal256 larger than Decimal128 is not yet supported: {value}"
))),
}

Thoughts on reusing existing function above? Also changing error type as NotYetImplemented seems more appropriate (though maybe should use the DataFusion version instead of Arrow version 🤔 )

Comment on lines +207 to +210
&DataType::Float64
} else {
args[0].data_type()
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this logic match the above return_type() function?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
common Related to common crate functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat: support decimal for math functions: log
2 participants