Grammar for plan constraints #20
base: main
Changes from 1 commit
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,77 @@ | ||
# **RFC007 for Presto** | ||
|
||
## Grammar for Plan Constraints | ||
Comment: I would call this feature Plan Hints rather than Plan Constraints. The optimizer may not follow the hints (though we should issue a warning if we don't).
Comment: Sure, will rename.
||
|
||
Proposers | ||
|
||
* @aaneja | ||
* @ClarenceThreepwood | ||
|
||
## Related Issues | ||
|
||
* [Overview doc](https://prestodb.io/wp-content/uploads/Search-Space-Improvements-Plan-Constraints.pdf) on Plan Constraints as a tool to control search space | ||
* PrestoDB blog post: [Elevating Presto Query Optimization](https://prestodb.io/blog/2024/03/21/elevating-presto-query-optimization/) | ||
|
||
## Summary | ||
|
||
This document proposes a grammar for specifying plan constraints. | ||
|
||
## Background | ||
|
||
Plan constraints can be used to lock down critical aspects of an execution plan, such as access method, join method, and join order. | ||
|
||
|
||
## Proposed Implementation | ||
|
||
We propose a grammar for specifying independent plan constraints, which take the form of a SQL comment block that would build an object graph of the constraints. Multiple constraints can be specified in a single place. The grammar is open for extension as we develop more mechanisms to lock down plans. | ||
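
As a rough illustration of the "SQL comment block" idea, the sketch below pulls the text of any `/*! ... */` block out of a query string. The helper name `extract_plan_constraints` and the placement of the block before the `SELECT` are assumptions made for this example only; the RFC does not say where in the query the block must appear or how Presto would locate it.

```python
import re
from typing import List

# Assumption: a plan constraint block is any comment that opens with `/*!`.
PLAN_CONSTRAINT_BLOCK = re.compile(r"/\*!(.*?)\*/", re.DOTALL)


def extract_plan_constraints(sql: str) -> List[str]:
    """Return the stripped body of every `/*! ... */` block found in `sql`."""
    return [m.group(1).strip() for m in PLAN_CONSTRAINT_BLOCK.finditer(sql)]


# Hypothetical query; the table names are placeholders.
sql = "/*! join (a (b c)) */ SELECT * FROM a, b, c WHERE a.x = b.x AND b.y = c.y"
print(extract_plan_constraints(sql))
```

Ordinary comments such as `/* note */` are ignored because they lack the `!` marker.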
|
||
In this first cut, users can build constraints to control: | ||
- Join orders and distributions for INNER JOINs | ||
- Cardinality (row counts) for base relations and join sub-plans | ||
|
||
### Grammar | ||
|
||
``` | ||
planConstraintString : /*! planConstraint [, ...] */ | ||
|
||
planConstraint : joinConstraint | ||
| cardinalityConstraint | ||
|
||
joinConstraint : joinType (joinNode) [distributionType] | ||
|
||
|
||
cardinalityConstraint : CARD (joinNode cardinality) | ||
|
||
distributionType : [P] | ||
| [R] | ||
|
||
joinType : JOIN (defaults to inner join) | ||
| IJ | ||
| LOJ | ||
| ROJ | ||
|
||
|
||
cardinality : integer constant (positive) | ||
|
||
joinNode : (relationName relationName [, ...]) | ||
|
||
| joinConstraint | ||
|
||
| (relationName joinNode) | ||
|
||
| (joinNode relationName) | ||
|
||
| (joinNode joinNode [, ...]) | ||
``` | ||
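
To make the "object graph" concrete, here is a minimal recursive-descent sketch that parses the `JOIN` form of a join constraint into nested nodes. It is one simplified reading of the grammar, not the proposed implementation: the `JoinNode` class and function names are hypothetical, only the `JOIN` keyword is handled, a `[P]`/`[R]` marker is attached to the parenthesized group it follows, and characters the tokenizer does not recognize are silently skipped.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

# Tokens: distribution markers, parentheses, and identifiers.
TOKEN = re.compile(r"\[[PR]\]|[()]|[A-Za-z_][A-Za-z_0-9]*")


@dataclass
class JoinNode:
    """Hypothetical graph node: children are relation names or nested joins."""
    children: List[Union[str, "JoinNode"]]
    distribution: Optional[str] = None  # "P", "R", or None (optimizer decides)


def parse_join_constraint(text: str) -> JoinNode:
    tokens = TOKEN.findall(text)
    if not tokens or tokens[0].lower() != "join":
        raise ValueError("this sketch only handles the JOIN keyword")
    node, pos = _parse_group(tokens, 1)
    if pos != len(tokens):
        raise ValueError(f"unexpected trailing token {tokens[pos]!r}")
    return node


def _parse_group(tokens: List[str], pos: int) -> Tuple[JoinNode, int]:
    if pos >= len(tokens) or tokens[pos] != "(":
        raise ValueError("expected '('")
    pos += 1
    children: List[Union[str, JoinNode]] = []
    while pos < len(tokens) and tokens[pos] != ")":
        if tokens[pos] == "(":
            child, pos = _parse_group(tokens, pos)
            children.append(child)
        elif tokens[pos] in ("[P]", "[R]"):
            raise ValueError("a distribution marker must follow a ')'")
        else:
            children.append(tokens[pos])  # a relation name
            pos += 1
    if pos >= len(tokens):
        raise ValueError("missing ')'")
    pos += 1  # consume ')'
    distribution = None
    if pos < len(tokens) and tokens[pos] in ("[P]", "[R]"):
        distribution = tokens[pos][1]
        pos += 1
    # Collapse a pure wrapper such as "( (a b) [P] )" into its single child.
    if len(children) == 1 and isinstance(children[0], JoinNode) and distribution is None:
        return children[0], pos
    return JoinNode(children, distribution), pos


print(parse_join_constraint("join (a (b c))"))
print(parse_join_constraint("join (((a c) [R] b) [P])"))
```

Under this reading, the second call yields an outer node with distribution `P` whose first child is the `(a c)` node with distribution `R`.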
|
||
|
||
### Examples of constraints | ||
|
||
|
||
1. Inner Join constraints - | ||
Comment: Why can't we just use replicated or partitioned syntax? Imo the brackets and
Comment: Can you elaborate on what this would look like for the cited example?
Comment: Like Dremio or Databricks. The current way is also fine, but the suggestion is to use explicit names for broadcast/partition etc., since the goal of this is to allow users to explicitly set the types.
Comment: I'm not opposed to this; my only gripe is that the plan constraint string can get quite verbose, e.g. for a 4-table join order -
Comment: Do we need this fine-grained kind of control? I like the simplicity of just specifying the partitioning of the table, and every other example I found seems to take that approach. You can specify to use syntactic join order if you want to control the join ordering in a more complex way. I find the existing syntax complex and hard to use.
Comment: DB2 supports 'join requests' similar to these join hints. It has variants that specify the join method too.
Comment: Got it. I see the appeal of specifying a partial join order for auto-generated queries, but I would also like it to be easy to specify join hints for the common case where people just want to mark some table for broadcast or similar. I wonder if there's an alternative approach we can use to achieve both of these goals.
Comment: We could add Spark-style independent hints - These will be complementary to the join-order syntax. So the below join hints are equivalent but provide flexibility to the user - Will think of more examples and incorporate them.
||
1. `join (a (b c))` - Join the relations `a`, `b`, `c` as a right-deep tree (denoted by the brackets). Use the regular rules for determining join distribution | ||
2. `join (((a c) [R] b) [P])` - In addition to the join order, use a REPLICATED `[R]` join for sub-plan `(a c)` and a PARTITIONED `[P]` join for `(a c) b` | ||
3. If an inner join condition does not exist between nodes, a CrossJoin is automatically inferred | ||
2. Cardinality constraints - | ||
|
||
1. `card (c 10)` - Set the output row count estimate of `c` to `10` | ||
Comment: I would rather not expose this kind of control. Better to specify what you want to have happen with this table, and otherwise leave it be. Seems pretty risky to encode cardinality estimates in the query text.
Comment: Can you elaborate on the risk? I see this
Comment: The risk I see is that if you are specifying a cardinality of x, you are probably doing it because you want the optimizer to do some particular thing about it. But you don't really know what the optimizer will do with that information; it could do one thing for a while, and then in a new release there's an optimizer change and it does something else. Because you aren't directly controlling what happens when you specify the cardinality, it's hard to say how it might affect the query, and it could be hard to debug if the performance degrades (vs. if you e.g. specify a broadcast join and your data gets bigger, it's very clear what happens). It can also get out of sync with the data (there is always a risk with hand tuning that the optimization will no longer be relevant or will perform worse as the data changes, but specifying a specific cardinality estimate can have more varied and unknown effects).
Comment: I agree that this is a powerful knob to give to the users, one that we can document as such (with caveats). To aid debugging, we can add Additional safeguards like warnings & metrics can be incorporated too if the stats estimates differ widely from the actual cardinality observed at runtime.
||
2. `card ((c o) 10)` - When considering a join node of shape `(c o)` set the output row count estimate to `10` | ||
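
The two cardinality forms above can be read as "a target (a relation or a join shape) followed by a positive row count". The sketch below splits a constraint string along those lines. The function name and the decision to return the target as an unparsed string are assumptions for illustration; the RFC does not define how the target is represented internally.

```python
import re
from typing import Tuple

# Matches: card ( <target> <positive integer> ), keyword case-insensitive.
CARD = re.compile(r"^\s*card\s*\(\s*(.+?)\s+(\d+)\s*\)\s*$", re.IGNORECASE)


def parse_cardinality_constraint(text: str) -> Tuple[str, int]:
    """Return (target, row_count) for strings like `card (c 10)`."""
    match = CARD.match(text)
    if match is None:
        raise ValueError(f"not a cardinality constraint: {text!r}")
    target, cardinality = match.group(1), int(match.group(2))
    if cardinality <= 0:
        # The grammar requires a positive integer constant.
        raise ValueError("cardinality must be a positive integer")
    return target, cardinality


print(parse_cardinality_constraint("card (c 10)"))
print(parse_cardinality_constraint("card ((c o) 10)"))
```

A fuller implementation would presumably parse the target with the same join-node rules as the join constraints rather than keeping it as text.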
|
||
### Other points of note | ||
- Relation names loosely resolve to WITH query aliases (CTE definitions) and table names. A detailed description of the name resolution is out of scope for this RFC and will be covered in the implementation PR description. | ||
|