feat: Support log for Decimal128 and Decimal256 #17023

theirix · 2025-08-03T11:27:59Z

Which issue does this PR close?

Closes #17140 (feat: support decimal for math functions: log).

Rationale for this change

Add Decimal128 and Decimal256 support to the log UDF.
It is the most generic logarithm function: it accepts an explicit base and defaults to log10, which makes it a good candidate for long decimals. The calculate_binary_math helper could simplify many other UDFs (a subject for future PRs).

Since decimals only support integer logarithms, the result is rounded to an integer and then converted back to a float. The computation still happens at the i128/i256 level rather than by rounding the input parameters, as was done before.

Also, if numbers are parsed as floats, precision is lost due to floating-point handling, so the majority of tests set parse_float_as_decimal=true, as in #14612. Otherwise, the log result could differ by one or two due to rounding; see the regression SLT.
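To illustrate why the integer-level calculation matters, here is a minimal sketch using only the standard library. The helper names `integer_log` and `max_decimal128` are hypothetical and not part of this PR; the value is assumed to be already unscaled and the base already validated to be at least 2.

```rust
/// Integer logarithm of an already-unscaled decimal value.
/// Hypothetical helper: returns NaN for non-positive inputs, as the
/// code under review does. The caller must ensure `base >= 2`,
/// because `i128::ilog` panics otherwise.
fn integer_log(value: i128, base: i128) -> f64 {
    if value > 0 {
        // `i128::ilog` returns the logarithm rounded down, as a u32.
        value.ilog(base) as f64
    } else {
        f64::NAN
    }
}

/// Largest value of a Decimal128 with precision 38 and scale 0:
/// thirty-eight nines.
fn max_decimal128() -> i128 {
    10i128.pow(38) - 1
}
```

With these helpers, `integer_log(max_decimal128(), 10)` is 37, while `integer_log(10i128.pow(38), 10)` is 38. Converting either input to f64 first yields the same double, so a float-based logarithm cannot tell the two apart, which is one way the result could end up off by one.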

Notably, 256-bit math is still missing. Arrow's i256 type uses the BigInt implementation, which could provide at least log10, but decimal arithmetic could be extended in Arrow as well.

What changes are included in this PR?

Support for decimals
A generic function for binary functors simplifying math UDF development
Additional unit and SLT tests
A minor follow-up fix for zero scale to #16831 (feat: Add ScalarValue::{new_one,new_zero,new_ten,distance} support for Decimal128 and Decimal256)
Updated some tests to reflect default float64 calculation
[chore] Allow env logging in functions crate

Please note that the more specialised functions log2, log10 and ln are not handled by LogFunc. They could be migrated to this UDF later by providing the base value explicitly.
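As a rough illustration of what "providing a base value explicitly" could look like, the sketch below expresses the specialised logarithms through one generic function on plain f64 values. All function names here are hypothetical and do not correspond to DataFusion APIs; only `f64::log` from the standard library is used.

```rust
use std::f64::consts::E;

/// Generic logarithm with an explicit base (hypothetical helper).
fn log_with_base(base: f64, x: f64) -> f64 {
    x.log(base)
}

/// The specialised variants reduce to the generic one with a fixed base.
fn log2_via_log(x: f64) -> f64 {
    log_with_base(2.0, x)
}

fn log10_via_log(x: f64) -> f64 {
    log_with_base(10.0, x)
}

fn ln_via_log(x: f64) -> f64 {
    log_with_base(E, x)
}
```

The results may differ from the dedicated `f64::log2`, `f64::log10` and `f64::ln` methods in the last bits, so any comparison should use a small tolerance.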

Are these changes tested?

Unit tests
Regression SLT tests
Manual invocation of datafusion-cli
Manual comparison of results to other large decimal math implementations (Python, WolframAlpha)

Are there any user-facing changes?

No, except the precision of log calculation can increase for float inputs (since f64 is used). For decimals, results could become more accurate since no float downcasting is involved.

Signed-off-by: theirix <theirix@gmail.com>

Jefffrey · 2025-09-02T08:04:39Z

datafusion/functions/src/math/log.rs

+                    Numeric(1),
+                    Numeric(2),
+                    Exact(vec![DataType::Float32]),
+                    Exact(vec![DataType::Float64]),
+                    Exact(vec![DataType::Float32, DataType::Float32]),
+                    Exact(vec![DataType::Float64, DataType::Float64]),
+                    Exact(vec![
+                        DataType::Int64,
+                        DataType::Decimal128(DECIMAL128_MAX_PRECISION, 0),
+                    ]),
+                    Exact(vec![
+                        DataType::Float32,
+                        DataType::Decimal128(DECIMAL128_MAX_PRECISION, 0),
+                    ]),
+                    Exact(vec![
+                        DataType::Float64,
+                        DataType::Decimal128(DECIMAL128_MAX_PRECISION, 0),
+                    ]),
+                    Exact(vec![
+                        DataType::Int64,
+                        DataType::Decimal256(DECIMAL256_MAX_PRECISION, 0),
+                    ]),
+                    Exact(vec![
+                        DataType::Float32,
+                        DataType::Decimal256(DECIMAL256_MAX_PRECISION, 0),
+                    ]),
+                    Exact(vec![
+                        DataType::Float64,
+                        DataType::Decimal256(DECIMAL256_MAX_PRECISION, 0),
+                    ]),


Are the Exact(...) signatures superseded by Numeric(1) and Numeric(2), given that integers, floats and decimals are all considered numeric types?

Jefffrey · 2025-09-02T08:13:55Z

datafusion/functions/src/math/log.rs

+    if !base.is_finite() || base.trunc() != base || (base as u32) < 2 {
+        Err(ArrowError::ComputeError(format!(
+            "Log cannot use non-integer or small base {base}"
+        )))
+    } else {
+        let unscaled_value = decimal128_to_i128(value, scale)?;
+        if unscaled_value > 0 {
+            let log_value: u32 = unscaled_value.ilog(base as i128);
+            Ok(log_value as f64)
+        } else {
+            // Reflect f64::log behaviour
+            Ok(f64::NAN)
+        }
+    }


Suggested change

-    if !base.is_finite() || base.trunc() != base || (base as u32) < 2 {
-        Err(ArrowError::ComputeError(format!(
-            "Log cannot use non-integer or small base {base}"
-        )))
-    } else {
-        let unscaled_value = decimal128_to_i128(value, scale)?;
-        if unscaled_value > 0 {
-            let log_value: u32 = unscaled_value.ilog(base as i128);
-            Ok(log_value as f64)
-        } else {
-            // Reflect f64::log behaviour
-            Ok(f64::NAN)
-        }
-    }
+    if !base.is_finite() || base.trunc() != base {
+        return Err(ArrowError::ComputeError(format!(
+            "Log cannot use non-integer base: {base}"
+        )));
+    }
+    if (base as u32) < 2 {
+        return Err(ArrowError::ComputeError(format!(
+            "Log base must be greater than 1: {base}"
+        )));
+    }
+    let unscaled_value = decimal128_to_i128(value, scale)?;
+    if unscaled_value > 0 {
+        let log_value: u32 = unscaled_value.ilog(base as i128);
+        Ok(log_value as f64)
+    } else {
+        // Reflect f64::log behaviour
+        Ok(f64::NAN)
+    }

Just to clarify the errors a bit
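For readers who want to try the split validation outside the crate, here is a minimal standalone sketch. It assumes plain `String` errors in place of `ArrowError` and an already-unscaled `i128` input; `validate_log_base` and `log_unscaled` are hypothetical names, not functions from this PR.

```rust
/// Validate a logarithm base given as f64 and convert it to i128.
/// Hypothetical helper with String errors instead of ArrowError.
fn validate_log_base(base: f64) -> Result<i128, String> {
    // NaN and infinities are not finite; fractional bases fail the trunc check.
    if !base.is_finite() || base.trunc() != base {
        return Err(format!("Log cannot use non-integer base: {base}"));
    }
    // Float-to-int `as` casts saturate, so negative bases become 0 here.
    if (base as u32) < 2 {
        return Err(format!("Log base must be greater than 1: {base}"));
    }
    Ok(base as i128)
}

/// Integer logarithm of an already-unscaled value, returned as f64.
fn log_unscaled(value: i128, base: f64) -> Result<f64, String> {
    let base = validate_log_base(base)?;
    if value > 0 {
        Ok(value.ilog(base) as f64)
    } else {
        // Non-positive inputs yield NaN, as in the code under review.
        Ok(f64::NAN)
    }
}
```

For example, `log_unscaled(1000, 10.0)` returns `Ok(3.0)`, a base of 2.5 is rejected with the non-integer message, and a base of 1.0 is rejected with the "greater than 1" message.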

Jefffrey · 2025-09-02T08:17:44Z

datafusion/functions/src/math/log.rs

+    if !base.is_finite() || base.trunc() != base || (base as u32) < 2 {
+        Err(ArrowError::ComputeError(format!(
+            "Log cannot use non-integer or small base {base}"
+        )))
+    } else {
+        match value.to_i128() {
+            Some(short_value) => {
+                // Calculate logarithm only for 128-bit decimals
+                let unscaled_value = decimal128_to_i128(short_value, scale)?;
+                if unscaled_value > 0 {
+                    let log_value: u32 = unscaled_value.ilog(base as i128);
+                    Ok(log_value as f64)
+                } else {
+                    Ok(f64::NAN)
+                }
+            }
+            None => Err(ArrowError::ComputeError(format!(
+                "Log of a large Decimal256 is not supported: {value}"
+            ))),
+        }
+    }


Suggested change

-    if !base.is_finite() || base.trunc() != base || (base as u32) < 2 {
-        Err(ArrowError::ComputeError(format!(
-            "Log cannot use non-integer or small base {base}"
-        )))
-    } else {
-        match value.to_i128() {
-            Some(short_value) => {
-                // Calculate logarithm only for 128-bit decimals
-                let unscaled_value = decimal128_to_i128(short_value, scale)?;
-                if unscaled_value > 0 {
-                    let log_value: u32 = unscaled_value.ilog(base as i128);
-                    Ok(log_value as f64)
-                } else {
-                    Ok(f64::NAN)
-                }
-            }
-            None => Err(ArrowError::ComputeError(format!(
-                "Log of a large Decimal256 is not supported: {value}"
-            ))),
-        }
-    }
+    match value.to_i128() {
+        Some(value) => log_decimal128(value, scale, base),
+        None => Err(ArrowError::NotYetImplemented(format!(
+            "Log of Decimal256 larger than Decimal128 is not yet supported: {value}"
+        ))),
+    }

Thoughts on reusing the existing function above? Changing the error type to NotYetImplemented also seems more appropriate (though maybe it should use the DataFusion version instead of the Arrow version 🤔)

Jefffrey · 2025-09-02T08:19:35Z

datafusion/functions/src/math/log.rs

+                &DataType::Float64
+            } else {
+                args[0].data_type()
+            };


Should this logic match the above return_type() function?

theirix added 4 commits August 3, 2025 11:27

a597e3a Enable env_logger for datafusion-functions crate
6a71c64 Fixup ScalarValue decimal constructors
62fcf93 Support decimals in log UDF
74e8270 Add sqllogic test for log on Decimals

github-actions bot added the sqllogictest (SQL Logic Tests (.slt)), common (Related to common crate) and functions (Changes to functions implementation) labels Aug 3, 2025

theirix added 7 commits August 3, 2025 13:31

e693f12 Fix test for scalar new_ten
9dec4cf Loosen requirements on a return type
ae26d5c Remove extra logging
e296b8a Improve handling scale and mix of base/value types
Signed-off-by: theirix <theirix@gmail.com>
d6c463f Format
b034768 Merge branch 'main' into log-decimal
c2559ba Adjust ScalarFunctionArgs construction

theirix marked this pull request as ready for review August 10, 2025 07:23

Jefffrey reviewed Sep 2, 2025
