Skip to content

perf: add specializations for mapreduce min/max #48

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

pfackeldey
Copy link

This PR add specializations for mapreduce(maximum, max, V) and mapreduce(minimum, min, V) to directly dispatch to maximum(V.data) and minimum(V.data) and thus avoiding allocating intermediates.

Copy link

codecov bot commented Jul 31, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.21%. Comparing base (021083f) to head (2dd33ca).

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #48      +/-   ##
==========================================
+ Coverage   96.20%   96.21%   +0.01%     
==========================================
  Files           8        8              
  Lines         527      529       +2     
==========================================
+ Hits          507      509       +2     
  Misses         20       20              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@oschulz
Copy link
Collaborator

oschulz commented Jul 31, 2025

Thanks @pfackeldey ! Could you add this specialization for ArrayOfSimilarArrays too?

Also,, should we add specializations for Base.mapreduce(::typeof(sum), ::typeof(+), ...)?

@Moelf
Copy link
Contributor

Moelf commented Jul 31, 2025

ArrayOfSimilarArrays

it would be useful to have a table at the top of documentation to show how the types are related to each other.

Is this type disjoint from ArrayOfArrays?

Also,, should we add specializations for Base.mapreduce(::typeof(sum), ::typeof(+), ...)?

yeah seems useful...

@oschulz
Copy link
Collaborator

oschulz commented Jul 31, 2025

it would be useful to have a table at the top of documentation to show how the types are related to each other.

Yes, good idea!

Is this type disjoint from ArrayOfArrays

We don't have a type ArrayOfArrays in the package, but ArrayOfSimilarArrays and VectorOfArrays are disjoint.

pfackeldey and others added 2 commits July 31, 2025 09:36
Comment on lines +265 to +266
Base.mapreduce(::typeof(maximum), ::typeof(max), A::ArrayOfSimilarArrays; kw...) = maximum(flatview(A); kw...)
Base.mapreduce(::typeof(minimum), ::typeof(min), A::ArrayOfSimilarArrays; kw...) = minimum(flatview(A); kw...)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to forward the kwargs to minimum here? The kwargs of mapreduce and minimum are not compatible, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah technically they don't completely overlap, however, the problem is we need init because other wise there's no way to make reducing over empty arrays work

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would you in this case rather curry minimum to a minimum(..., init=0) and put this in mapreduce? and that hopefully still works because the typeof(minimum) would still trigger on the curried function?

I don't know how this would look like in Julia, but in Python it would be something like the following pseudo-code:

minimum_with_init = partial(minimum, init=0)

mapreduce(minimum_with_init, min, A)

I'm not sure how multiple dispatch works with curried functions, and not sure if this is a common solution/paradigm in Julia.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be best to declare and handle the kwargs explicitly.

As for the minimum with init, I doubt many users would write their code like that ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be best to declare and handle the kwargs explicitly.

there's no good default for init=, so if user doesn't provide one, we also don't

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in ffe6030

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On generic arrays the default implementation of mapreduce is

mapreduce(f, op, A::AbstractArrayOrBroadcasted; dims=:, init=_InitialValue()) =
_mapreduce_dim(f, op, init, A, dims)

but on GPU arrays it is

Base.mapreduce(f, op, A::AnyGPUArray, As::AbstractArrayOrBroadcasted...; dims=:, init=nothing)

So handling of the init default differs, and _InitialValue is not public. it would be nice of there was an official "default init" construct, but there isn't, so we have to get a bit creative.

Also, we can't just pass a dims kwarg that is not : through to the aggregation function on the flattened collection, it would result in incorrect results in many cases.

I suggest we do something like this (untested!):

using Base: add_sum, mul_prod

for (f, op) in [
    (:sum, :add_sum), (:product, :mul_prod), (:maximum, :max), (:minimum, :min)
]
    for AoA in [:AbstractArrayOfSimilarArrays, :AbstractVectorOfArrays]
        @eval begin
            Base.mapreduce(::typeof($f), ::typeof($op), A::$AoA; dims = :, init = nothing)
                = _aoa_associative_mapreduce($f, $op, $AoA, init)
        end
    end
end

_aoa_associative_mapreduce(f, ::Any, A, ::Colon, ::Nothing) = f(flatview(A))
_aoa_associative_mapreduce(f, ::Any, A, ::Colon, init) = f(flatview(A); init = init)

function _aoa_associative_mapreduce(f, op, A, dims, init) =
    # Wrap `f` to dispatch to generic mapreduce implementation for `dims!=:`:
    wrapped_f = x -> f(x)
    return maprecduce_impl(x -> f(x), op, A, dims = dims, init = init)
end

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oschulz I think we're gonna give up making PR now 😂

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, we have to do it somewhat cleanly, esp. the dims handling.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's no dims to handle if we're just specializing VectorOfX, because vector is 1D

let's do that first I guess

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants