Motivation
Several Spark plugins and extensions claim significantly better performance than vanilla Spark for analytical workloads, for example Apache Comet (based on DataFusion), Blaze (also DataFusion), and Apache Gluten (mainly Velox and ClickHouse backends). Since ClickBench already allows GPU-based engines (#410), NVIDIA RAPIDS could also be considered.
I believe these are worth evaluating for a few reasons:
- Independent benchmark: These accelerators are typically benchmarked only on their own TPC-H/TPC-DS variants. Including them in ClickBench would offer a more independent comparison. This is valuable to both users and developers.
- Distributed vs. single-node overhead: Some of these tools have single-node counterparts (e.g., DataFusion). Measuring how much overhead comes from each factor (such as Spark’s architecture or the quality of the integration) is useful in itself.
- Spark user insights: For Spark users, this shows how much they can gain (or lose) by adopting such plugins, which are mostly easy to install on an existing Spark deployment, and how the result compares with other distributed engines and with vanilla Spark.
Request
Would you be open to including these in ClickBench? As far as I can tell, no similar issue has been created yet.
If yes, I’d be glad to take on this task and submit the necessary PRs incrementally (likely one per engine).