[parquet] reduce the time spent in CachedArrayReader #9060

@alamb

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

I am profiling ClickBench query 26 with predicate pushdown enabled as part of

samply record -- /Users/andrewlamb/Software/datafusion2/target/profiling/datafusion-cli   -f q.sql  > /dev/null  2>&1
SELECT "SearchPhrase" FROM hits WHERE "SearchPhrase" <> '' ORDER BY "EventTime", "SearchPhrase" LIMIT 10;

While looking at the profile, I noticed that 3% of the time is spent concatenating arrays in the cached array reader.

I believe the call is here:

_ => Ok(arrow_select::concat::concat(

Describe the solution you'd like
I would like to make this faster.

Describe alternatives you've considered
I think we can use the BatchCoalescer for this task and potentially save at least one copy.
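
To make the "save at least one copy" idea concrete, here is a toy model using only the Rust standard library. It does not use the arrow-rs API and is not the actual CachedArrayReader logic. The `Chunk` type, both function names, and the `keep` mask are hypothetical, and the sketch assumes the consumer only needs a selected subset of the cached rows. It counts how many values each approach copies.

```rust
/// Toy stand-in for a cached column chunk (hypothetical, not an arrow-rs type).
type Chunk = Vec<u64>;

/// Approach 1: concatenate every cached chunk, then keep the selected rows.
/// Returns the selected values and the number of values copied.
/// `keep` is assumed to have one entry per row across all chunks.
fn concat_then_select(chunks: &[Chunk], keep: &[bool]) -> (Vec<u64>, usize) {
    let mut copied = 0;
    // First copy: every row goes into one contiguous buffer.
    let mut all = Vec::with_capacity(chunks.iter().map(Vec::len).sum());
    for c in chunks {
        all.extend_from_slice(c);
        copied += c.len();
    }
    // Second copy: only the selected rows go into the output.
    let mut out = Vec::new();
    for (v, k) in all.iter().zip(keep) {
        if *k {
            out.push(*v);
            copied += 1;
        }
    }
    (out, copied)
}

/// Approach 2 (coalescer style): copy selected rows from each chunk
/// straight into the output buffer, skipping the intermediate concatenation.
/// Panics if `keep` is shorter than the total number of rows.
fn coalesce_selected(chunks: &[Chunk], keep: &[bool]) -> (Vec<u64>, usize) {
    let mut out = Vec::with_capacity(keep.iter().filter(|k| **k).count());
    let mut copied = 0;
    let mut offset = 0;
    for c in chunks {
        // Slice of the mask that corresponds to this chunk's rows.
        let mask = &keep[offset..offset + c.len()];
        for (v, k) in c.iter().zip(mask) {
            if *k {
                out.push(*v);
                copied += 1;
            }
        }
        offset += c.len();
    }
    (out, copied)
}
```

Both functions return the same values. In this model the second one copies each selected value once, while the first copies every row once and then each selected row a second time. Whether the real reader sees a comparable saving would need to be confirmed by profiling.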
