
Add Spark accelerators #517

@Iskander14yo

Motivation

There are several Spark plugins and extensions that claim to offer significantly better performance than vanilla Spark for analytical workloads, for example Apache Comet (based on DataFusion), Blaze (also based on DataFusion), and Apache Gluten (mainly backed by Velox or ClickHouse). Since ClickBench already allows GPU-based engines (#410), NVIDIA RAPIDS could also be considered.

I believe these are worth evaluating for a few reasons:

  • Independent benchmark: These accelerators are typically benchmarked only on their own TPC-H/TPC-DS variants. Including them in ClickBench would offer a more independent comparison. This is valuable to both users and developers.
  • Distributed vs. single-node overhead: Some of these tools are built on engines that also run standalone on a single node (e.g., DataFusion). Comparing the two would show how much overhead comes from different factors, such as Spark’s architecture versus the quality of the native implementation, which is useful in itself.
  • Spark user insights: For Spark users, this would show how much they can gain (or lose) by adopting such plugins, which are mostly easy to install on an existing Spark deployment, and how the results compare with other distributed engines and with Spark itself.

Request

Would you be open to including these in ClickBench? As far as I can tell, no similar issue has been opened yet.
If so, I’d be glad to take on this task and submit the necessary PRs incrementally (likely one per engine).
