Skip to content

Conversation

@andygrove
Copy link
Member

@andygrove andygrove commented Oct 24, 2025

Which issue does this PR close?

N/A

Rationale for this change

The main motivation is to change the default so that the explain plan is verbose by default rather than a list of fallback reasons.

The config is change from a boolean to a string representing the format so that other formats can be added in the future.

Format 'fallback':

Comet is not compatible with Spark for case conversion in locale-specific cases. Set spark.comet.caseConversion.enabled=true to enable it anyway.

Format 'verbose':

Project [COMET: Comet is not compatible with Spark for case conversion in locale-specific cases. Set spark.comet.caseConversion.enabled=true to enable it anyway.]
+- CometColumnarToRow
   +- CometScan [native_iceberg_compat] parquet spark_catalog.default.names

Comet accelerated 1 out of 2 eligible operators (50%). Final plan contains 1 transitions between Spark and Comet.

What changes are included in this PR?

  • Change config from boolean to string
  • Show verbose extended explain info by default

How are these changes tested?

@codecov-commenter
Copy link

codecov-commenter commented Oct 24, 2025

Codecov Report

❌ Patch coverage is 82.35294% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.20%. Comparing base (f09f8af) to head (f3de01f).
⚠️ Report is 644 commits behind head on main.

Files with missing lines Patch % Lines
...n/scala/org/apache/comet/ExtendedExplainInfo.scala 66.66% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2644      +/-   ##
============================================
+ Coverage     56.12%   59.20%   +3.08%     
- Complexity      976     1448     +472     
============================================
  Files           119      147      +28     
  Lines         11743    13755    +2012     
  Branches       2251     2367     +116     
============================================
+ Hits           6591     8144    +1553     
- Misses         4012     4386     +374     
- Partials       1140     1225      +85     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@parthchandra
Copy link
Contributor

This was done so that we could have different outputs for different use cases. The less verbose use case was intended to allow an analytic system to query fallback reasons, so the output was simple and easily parsable in Spark SQL. The more verbose use case was intended for human consumption via the Spark UI. (Comet extended info can be displayed in the Spark UI's SQL tab, since Spark 4.0)

@andygrove
Copy link
Member Author

This was done so that we could have different outputs for different use cases. The less verbose use case was intended to allow an analytic system to query fallback reasons, so the output was simple and easily parsable in Spark SQL. The more verbose use case was intended for human consumption via the Spark UI. (Comet extended info can be displayed in the Spark UI's SQL tab, since Spark 4.0)

@parthchandra Are you ok with the change to remove the option, or do you think we need to keep this functionality?

@parthchandra
Copy link
Contributor

This was done so that we could have different outputs for different use cases. The less verbose use case was intended to allow an analytic system to query fallback reasons, so the output was simple and easily parsable in Spark SQL. The more verbose use case was intended for human consumption via the Spark UI. (Comet extended info can be displayed in the Spark UI's SQL tab, since Spark 4.0)

@parthchandra Are you ok with the change to remove the option, or do you think we need to keep this functionality?

I feel we must have the verbose output available at all times so that users can get that in the Spark UI and explain plan. Whether it is behind a config option or not is up to us.
The option to get an output that is easier to parse in an automated system is up to us to decide collectively. Any analysis of the verbose output will involve custom parsing (regexps in the simplest cases) that will be broken every time someone adds a new piece of information. I would prefer we keep this option too, but you can change my mind :)
Perhaps we can add a unit test to verify that the output can be parsed can catch breaking changes to the output.

@andygrove andygrove changed the title chore: Remove COMET_EXPLAIN_VERBOSE_ENABLED config chore: Rename COMET_EXPLAIN_VERBOSE_ENABLED to COMET_EXTENDED_EXPLAIN_FORMAT and change default Oct 27, 2025
@andygrove
Copy link
Member Author

This was done so that we could have different outputs for different use cases. The less verbose use case was intended to allow an analytic system to query fallback reasons, so the output was simple and easily parsable in Spark SQL. The more verbose use case was intended for human consumption via the Spark UI. (Comet extended info can be displayed in the Spark UI's SQL tab, since Spark 4.0)

@parthchandra Are you ok with the change to remove the option, or do you think we need to keep this functionality?

I feel we must have the verbose output available at all times so that users can get that in the Spark UI and explain plan. Whether it is behind a config option or not is up to us. The option to get an output that is easier to parse in an automated system is up to us to decide collectively. Any analysis of the verbose output will involve custom parsing (regexps in the simplest cases) that will be broken every time someone adds a new piece of information. I would prefer we keep this option too, but you can change my mind :) Perhaps we can add a unit test to verify that the output can be parsed can catch breaking changes to the output.

ok, I largely reverted the changes and changes the config from an enabled boolean to a string that allows a specific format to be chosen, defaulting to verbose. This is more flexible for adding more formats in the future.

andygrove and others added 4 commits October 28, 2025 13:57
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
@andygrove
Copy link
Member Author

Thanks for the suggestions @martin-g!

@andygrove andygrove marked this pull request as draft October 28, 2025 20:37
@andygrove andygrove marked this pull request as ready for review October 28, 2025 21:12
@andygrove andygrove marked this pull request as draft October 30, 2025 17:04
@andygrove
Copy link
Member Author

Moving to draft until #2669 is merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants