Skip to content

Performance: combinations allocates a new array on every iteration #199

@lntricate1

Description

@lntricate1

Related: #127. Compare the performance:

function test_fast(n::Int, t::Int)
  x = 0
  for I in Combinatorics.Combinations(n, t)
    x += sum(I)
  end
  return x
end

function test_slow(n::Int, t::Int)
  x = 0
  for I in combinations(1:n, t)
    x += sum(I)
  end
  return x
end

@time test_fast(100, 5) # 1.497366 seconds (2 allocations: 96 bytes)
@time test_slow(100, 5) # 9.049514 seconds (150.58 M allocations: 6.731 GiB, 22.53% gc time)

As it is right now, this is not possible to fix. Reusing the same vector on every iteration will break collect and this is by design: see discussion at JuliaLang/julia#59314. Therefore, removing the allocations would require switching to Tuples or an immutable struct instead.

As mentioned in #127, using something immutable would also allocate if the length of the elements is not known at compile time. However, seeing as the alternative is to allocate a new vector on every iteration anyway, I think the switch is worth it. It would also bring Combinations more in line with CartesianIndices in both implementation and usage.

Another consideration is that the single-argument version of combinations returns multiple different lengths. To avoid allocations there, the objects returned to the user would need to always be the same type but represent different length combinations. The best I can come up with at the moment is a wrapper around a Tuple that stores the "length" separately and implements all the AbstractArray functions, but I'm sure there's better alternatives to that. It's also worth discussing whether the two-argument combinations should return that same type or just use a Tuple.

Finally, there's many other iterators in Combinatorics that are subject to the same allocation issue: MultiExponents, CoolLexCombinations, multiset_combinations, permutations, etc. If it's decided that Combinations should switch to use something immutable, the other iterators should do the same.

If a decision is reached on whether to switch from vectors to something immutable, I'd be happy to write the implementations.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions