From cac050ae1694c8b57e210754787e8469f5773f77 Mon Sep 17 00:00:00 2001
From: Anant Aneja <1797669+aaneja@users.noreply.github.com>
Date: Fri, 26 Jul 2024 20:23:54 +0530
Subject: [PATCH 1/4] First draft of plan constraints grammar

---
 RFC-0007-plan-constraints-grammar.md | 77 ++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)
 create mode 100644 RFC-0007-plan-constraints-grammar.md

diff --git a/RFC-0007-plan-constraints-grammar.md b/RFC-0007-plan-constraints-grammar.md
new file mode 100644
index 00000000..15432281
--- /dev/null
+++ b/RFC-0007-plan-constraints-grammar.md
@@ -0,0 +1,77 @@
+# **RFC007 for Presto**
+
+## Grammar for Plan Constraints
+
+Proposers
+
+* @aaneja
+* @ClarenceThreepwood
+
+## Related Issues
+
+* [Overview doc](https://prestodb.io/wp-content/uploads/Search-Space-Improvements-Plan-Constraints.pdf) on Plan Constraints as a tool to control search space
+* PrestoDB blog on - [Elevating Presto Query Optimization](https://prestodb.io/blog/2024/03/21/elevating-presto-query-optimization/)
+
+## Summary
+
+This document proposes a grammar for specifying plan constraints
+
+## Background
+
+Plan constraints can be used to lock down critical aspects of an execution plan, such as access method, join method, and join order.
+
+
+## Proposed Implementation
+
+We propose a grammar for specifying independent plan constraints, which take the form of a SQL comment block that would build an object graph of the constraints. Multiple constraints can be specified in a single place. The grammar is open for extension as we develop more mechanisms to lock down plans.
+
+In this first cut, users can build constraints to control
+- Join orders and distributions for INNER JOIN's
+- Cardinality (row counts) for base relations and join sub-plans
+
+### Grammar
+
+```
+planConstraintString : /*! planConstraint [, ...] */
+
+planConstraint : joinConstraint
+| cardinalityConstraint
+
+joinConstraint : joinType (joinNode) [distributionType]
+
+cardinalityConstraint : CARD (joinNode cardinality)
+
+distributionType : [P]
+                 | [R]
+
+joinType : JOIN (defaults to inner join)
+| IJ
+| LOJ
+| ROJ
+
+cardinality : integer constant (positive)
+
+joinNode : (relationName relationName [, ...])
+
+| joinConstraint
+
+| (relationName joinNode)
+
+| (joinNode relationName)
+
+| (joinNode joinNode [, ...])
+```
+
+
+### Examples of constraints
+
+1. Inner Join constraints -
+   1. `join (a (b c))` - Join the relations a,b,c as a right deep tree (denoted by the brackets). Use regular rules for determining join distribution
+   2. `join (((a c) [R] b) [P])` - In addition to the join order, use a REPLICATED `[R]` join for sub-plan `(a c)` and PARTITIONED `[P]` for `(a c) b`
+   3. If an inner join condition does not exist between nodes a CrossJoin is automatically inferred
+2. Cardinality constraints -
+   1. `card (c 10)` - Set the output row count estimate of `c` to `10`
+   2. `card ((c o) 10)` - When considering a join node of shape `(c o)` set the output row count estimate to `10`
+
+### Other points of note
+- Relation names loosely resolve to WITH query aliases (CTE definitions) and table names.A detailed description of the name resolution is out of scope of this RFC (this will be covered in the implementation PR description)

From 5ea27d8eb1a1591f78d8f4bc51b3ac63a6063805 Mon Sep 17 00:00:00 2001
From: Anant Aneja <1797669+aaneja@users.noreply.github.com>
Date: Wed, 31 Jul 2024 16:17:41 +0530
Subject: [PATCH 2/4] - Modify grammar to be more precise - Add more examples

---
 RFC-0007-plan-constraints-grammar.md | 106 +++++++++++++++++++--------
 1 file changed, 76 insertions(+), 30 deletions(-)

diff --git a/RFC-0007-plan-constraints-grammar.md b/RFC-0007-plan-constraints-grammar.md
index 15432281..469b62c5 100644
--- a/RFC-0007-plan-constraints-grammar.md
+++ b/RFC-0007-plan-constraints-grammar.md
@@ -29,37 +29,52 @@ In this first cut, users can build constraints to control
 - Join orders and distributions for INNER JOIN's
 - Cardinality (row counts) for base relations and join sub-plans
 
-### Grammar
+### Grammar (ANTLR 4)
 
 ```
-planConstraintString : /*! planConstraint [, ...] */
+grammar planConstraints;
+
+// Lexer rules
+NUMBER : [0-9]+ ; // Matches one or more digits
+LOJ : 'LOJ' ;
+ROJ : 'ROJ' ;
+IJ : 'IJ' ;
+LPAREN : '(' ;
+RPAREN : ')' ;
+WS : [ \t\r\n]+ -> skip ; // Skip whitespace
+IDENTIFIER : ~[()]+ ; // Matches one or more characters that are not '(' or ')'
+JOIN_DIST_PARTITIONED : '[P]' ;
+JOIN_DIST_REPLICATED : '[R]' ;
+CARD : 'CARD' ;
+JOIN : 'JOIN' ;
+CONSTRAINTS_START_MARKER :'/*!';
+CONSTRAINTS_END_MARKER :'*/';
+
+relationName
+    : relationName (joinType=LOJ | joinType=ROJ | joinType=IJ)? relationName (attribute=JOIN_DIST_PARTITIONED | attribute=JOIN_DIST_REPLICATED)? # withJoinType
+    | LPAREN relationName RPAREN # Grouping
+    | IDENTIFIER # standaloneRelation
+    ;
+
+joinType
+    : LOJ
+    | ROJ
+    | IJ
+    ;
+
+cardinalityConstraint : CARD LPAREN relationName NUMBER RPAREN;
+
+joinConstraint : JOIN LPAREN relationName RPAREN;
+
+planConstraint
+    :joinConstraint
+    | cardinalityConstraint;
+
+planConstraintString : CONSTRAINTS_START_MARKER planConstraint (WS planConstraint)* CONSTRAINTS_END_MARKER;
+
+// Start rule
+start: planConstraintString EOF;
 
-planConstraint : joinConstraint
-| cardinalityConstraint
-
-joinConstraint : joinType (joinNode) [distributionType]
-
-cardinalityConstraint : CARD (joinNode cardinality)
-
-distributionType : [P]
-                 | [R]
-
-joinType : JOIN (defaults to inner join)
-| IJ
-| LOJ
-| ROJ
-
-cardinality : integer constant (positive)
-
-joinNode : (relationName relationName [, ...])
-
-| joinConstraint
-
-| (relationName joinNode)
-
-| (joinNode relationName)
-
-| (joinNode joinNode [, ...])
 ```
 
 
@@ -71,7 +86,38 @@ joinNode : (relationName relationName [, ...])
    3. If an inner join condition does not exist between nodes a CrossJoin is automatically inferred
 2. Cardinality constraints -
    1. `card (c 10)` - Set the output row count estimate of `c` to `10`
-   2. `card ((c o) 10)` - When considering a join node of shape `(c o)` set the output row count estimate to `10`
+   2. `card ((c o) 10)` - When/If considering a join node of shape `(c o)` set the output row count estimate to `10`
+3. Both type of constraints -
+   `join (c o) card ((c o) 10)` - Force a join-sub-graph of nodes `c InnerJoin o`. For this join node, set the output row count estimate to `10`
+
+#### Full SQL examples of queries with constraints
+
+Force the join of `c` and `cte` which is otherwise ignored, see [19354](https://github.com/prestodb/presto/issues/19354)
+```
+/*! join ((c cte) n) */
+--
+with cte as (
+    select min(orderkey) as min
+    from orders
+)
+select count(*)
+from customer c,
+     nation n,
+     cte
+where c.custkey = cte.min
+  and n.nationkey = c.nationkey
+```
+
+Force the inner join of `l` and `o`, which is otherwise ignored, see [19894](https://github.com/prestodb/presto/issues/19894)
+```
+/*! join (s (l o)) */
+select 1
+from supplier s,
+     lineitem l,
+     orders o
+where l.orderkey = o.orderkey
+```
+
 
 ### Other points of note
-- Relation names loosely resolve to WITH query aliases (CTE definitions) and table names.A detailed description of the name resolution is out of scope of this RFC (this will be covered in the implementation PR description)
+- Relation names loosely resolve to WITH query aliases (CTE definitions), table names and aliases. A detailed description of the name resolution is out of scope of this RFC (this will be covered in the implementation PR description)

From 3fcc6118641b15477ae82bc2b069b926f86cfe23 Mon Sep 17 00:00:00 2001
From: Anant Aneja <1797669+aaneja@users.noreply.github.com>
Date: Tue, 20 Aug 2024 14:01:19 +0530
Subject: [PATCH 3/4] Fix to latest grammar

---
 RFC-0007-plan-constraints-grammar.md | 75 +++++++++++++++++++++-------
 1 file changed, 56 insertions(+), 19 deletions(-)

diff --git a/RFC-0007-plan-constraints-grammar.md b/RFC-0007-plan-constraints-grammar.md
index 469b62c5..7b68fb9f 100644
--- a/RFC-0007-plan-constraints-grammar.md
+++ b/RFC-0007-plan-constraints-grammar.md
@@ -32,17 +32,16 @@ In this first cut, users can build constraints to control
 ### Grammar (ANTLR 4)
 
 ```
-grammar planConstraints;
+grammar PlanConstraints;
 
 // Lexer rules
-NUMBER : [0-9]+ ; // Matches one or more digits
+WS : [ \t\r\n]+ -> skip ;
+NUMBER : [0-9]+ ;
 LOJ : 'LOJ' ;
 ROJ : 'ROJ' ;
 IJ : 'IJ' ;
 LPAREN : '(' ;
 RPAREN : ')' ;
-WS : [ \t\r\n]+ -> skip ; // Skip whitespace
-IDENTIFIER : ~[()]+ ; // Matches one or more characters that are not '(' or ')'
 JOIN_DIST_PARTITIONED : '[P]' ;
 JOIN_DIST_REPLICATED : '[R]' ;
 CARD : 'CARD' ;
@@ -50,10 +49,34 @@ JOIN : 'JOIN' ;
 CONSTRAINTS_START_MARKER :'/*!';
 CONSTRAINTS_END_MARKER :'*/';
 
-relationName
-    : relationName (joinType=LOJ | joinType=ROJ | joinType=IJ)? relationName (attribute=JOIN_DIST_PARTITIONED | attribute=JOIN_DIST_REPLICATED)? # withJoinType
-    | LPAREN relationName RPAREN # Grouping
-    | IDENTIFIER # standaloneRelation
+fragment DIGIT
+    : [0-9]
+    ;
+
+fragment LETTER
+    : [A-Z]
+    | [a-z]
+    ;
+
+IDENTIFIER
+    : (LETTER | '_') (LETTER | DIGIT | '_' | '@' | ':')*
+    ;
+
+identifier
+    : IDENTIFIER
+    ;
+
+standAloneRelation
+    : identifier
+    ;
+
+groupedRelation
+    : LPAREN joinedRelation RPAREN
+    ;
+
+joinTypeOrDefault
+    : joinType
+    | { "IJ" } // Default value
     ;
 
 joinType
@@ -62,18 +85,32 @@ joinType
     | IJ
     ;
 
-cardinalityConstraint : CARD LPAREN relationName NUMBER RPAREN;
+joinAttribute
+    : JOIN_DIST_PARTITIONED
+    | JOIN_DIST_REPLICATED
+    ;
+
+joinedRelation
+    : standAloneRelation joinTypeOrDefault standAloneRelation (joinAttribute)? #ss
+    | standAloneRelation joinTypeOrDefault groupedRelation (joinAttribute)? #sg
+    | groupedRelation joinTypeOrDefault standAloneRelation (joinAttribute)? #gs
+    | groupedRelation joinTypeOrDefault groupedRelation (joinAttribute)? #gg
+    ;
+
+
+cardinalityConstraint
+    : CARD LPAREN joinedRelation NUMBER RPAREN
+    | CARD LPAREN standAloneRelation NUMBER RPAREN
+    ;
 
-joinConstraint : JOIN LPAREN relationName RPAREN;
+joinConstraint : JOIN LPAREN joinedRelation RPAREN;
 
-planConstraint
-    :joinConstraint
+planConstraint
+    : joinConstraint
     | cardinalityConstraint;
 
-planConstraintString : CONSTRAINTS_START_MARKER planConstraint (WS planConstraint)* CONSTRAINTS_END_MARKER;
+planConstraintString : CONSTRAINTS_START_MARKER (planConstraint)* CONSTRAINTS_END_MARKER;
 
-// Start rule
-start: planConstraintString EOF;
 ```
 
 
@@ -82,13 +119,13 @@ start: planConstraintString EOF;
 
 1. Inner Join constraints -
    1. `join (a (b c))` - Join the relations a,b,c as a right deep tree (denoted by the brackets). Use regular rules for determining join distribution
-   2. `join (((a c) [R] b) [P])` - In addition to the join order, use a REPLICATED `[R]` join for sub-plan `(a c)` and PARTITIONED `[P]` for `(a c) b`
+   2. `join ((a c [R]) b [P])` - In addition to the join order, use a REPLICATED `[R]` join for sub-plan `(a c)` and PARTITIONED `[P]` for `(a c) b`
    3. If an inner join condition does not exist between nodes a CrossJoin is automatically inferred
 2. Cardinality constraints -
    1. `card (c 10)` - Set the output row count estimate of `c` to `10`
-   2. `card ((c o) 10)` - When/If considering a join node of shape `(c o)` set the output row count estimate to `10`
+   2. `card (c o 10)` - When/If considering a join node of shape `(c o)` set the output row count estimate to `10`
 3. Both type of constraints -
-   `join (c o) card ((c o) 10)` - Force a join-sub-graph of nodes `c InnerJoin o`. For this join node, set the output row count estimate to `10`
+   `join (c o) card (c o 10)` - Force a join-sub-graph of nodes `c InnerJoin o`. For this join node, set the output row count estimate to `10`
 
 #### Full SQL examples of queries with constraints
 
@@ -120,4 +157,4 @@ where l.orderkey = o.orderkey
 
 
 ### Other points of note
-- Relation names loosely resolve to WITH query aliases (CTE definitions), table names and aliases. A detailed description of the name resolution is out of scope of this RFC (this will be covered in the implementation PR description)
+- Relation names loosely resolve to WITH query aliases (CTE definitions), table names and aliases. A detailed description of the name resolution is out of scope of this RFC (this will be covered in the description of the implementation PR)

From ff71b40f6e1634b73c7552604c9ae40272a7f9b1 Mon Sep 17 00:00:00 2001
From: Anant Aneja <1797669+aaneja@users.noreply.github.com>
Date: Mon, 26 Aug 2024 11:32:52 +0530
Subject: [PATCH 4/4] Add details about how other DB vendors support plan constraints

---
 RFC-0007-plan-constraints-grammar.md | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/RFC-0007-plan-constraints-grammar.md b/RFC-0007-plan-constraints-grammar.md
index 7b68fb9f..cb012fe8 100644
--- a/RFC-0007-plan-constraints-grammar.md
+++ b/RFC-0007-plan-constraints-grammar.md
@@ -18,7 +18,15 @@ This document proposes a grammar for specifying plan constraints
 
 ## Background
 
-Plan constraints can be used to lock down critical aspects of an execution plan, such as access method, join method, and join order.
+Plan constraints can be used to lock down critical aspects of an execution plan, such as access method, join method, and join order. 
+
+Most commercially available cost-based optimizers support some form by which a user could express plan constraints that enable these two use-cases. Common terms used by DB vendors to describe this set of features include "plan constraints", "optimization guidelines" or "plan hints".
+
+There are two broad approaches as to how these plan constraints are stored and represented. Vendors such as [Oracle](https://docs.oracle.com/cd/B10500_01/server.920/a96533/hintsref.htm) & [Vertica](https://www.vertica.com/docs/10.0.x/HTML/Content/Authoring/SQLReferenceManual/LanguageElements/Hints/Hints.htm) provide a grammar/syntax where the plan constraint may be expressed inline with the sql query. The parser then picks up the constraint, validates it, and then passes it on to the query optimizer.
+
+Other systems such as [DB2](https://www.ibm.com/docs/en/db2/11.1?topic=guidelines-creating-statement-level-optimization) and [SqlServer](https://learn.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-query?view=sql-server-ver16), provide mechanisms to specify and store these optimization guidelines independent of a sql query, allow the database to map these stored plan constraints to an incoming query and then pass these constraints on to the optimizer as applicable.
+
+Let us call these two approaches – inline plan constraints vs independent plan constraints.
 
 
 ## Proposed Implementation
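The grammar as it stands after PATCH 3/4 can be exercised with a small hand-written recursive-descent parser. The Python below is a minimal sketch for illustration only, not the ANTLR-generated parser the RFC proposes; every name in it (`tokenize`, `Relation`, `Join`, `JoinConstraint`, `CardinalityConstraint`, `parse_plan_constraints`) is hypothetical. It assumes keywords are matched case-insensitively, since the examples write `join` and `card` in lowercase while the lexer rules spell them `'JOIN'` and `'CARD'`. It also assumes the constraint block is the first `/*! ... */` comment in the query text, and it treats an omitted join type as `IJ`, following `joinTypeOrDefault`.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

# Hypothetical lexer mirroring the ANTLR lexer rules. Keywords are matched
# case-insensitively (assumption) after a whole word is read, so "joins"
# stays an identifier, as under ANTLR's longest-match rule.
_TOKEN_RE = re.compile(
    r"(?P<ws>\s+)|(?P<start>/\*!)|(?P<end>\*/)|(?P<dist>\[[PR]\])"
    r"|(?P<lparen>\()|(?P<rparen>\))|(?P<number>[0-9]+)"
    r"|(?P<word>[A-Za-z_][A-Za-z0-9_@:]*)")
_KEYWORDS = {"LOJ", "ROJ", "IJ", "CARD", "JOIN"}

def tokenize(text: str) -> List[Tuple[str, str]]:
    tokens, pos = [], 0
    while pos < len(text):
        m = _TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind, value, pos = m.lastgroup, m.group(), m.end()
        if kind == "word" and value.upper() in _KEYWORDS:
            kind = value = value.upper()
        if kind != "ws":  # WS -> skip
            tokens.append((kind, value))
    return tokens

@dataclass
class Relation:                         # standAloneRelation
    name: str

@dataclass
class Join:                             # joinedRelation: exactly two operands
    left: "Node"
    right: "Node"
    join_type: str = "IJ"               # joinTypeOrDefault falls back to IJ
    distribution: Optional[str] = None  # "P", "R", or None (optimizer decides)

Node = Union[Relation, Join]

@dataclass
class JoinConstraint:
    tree: Join

@dataclass
class CardinalityConstraint:
    node: Node
    rows: int

class _Parser:
    def __init__(self, text: str):
        self.tokens, self.i = tokenize(text), 0

    def peek(self, offset: int = 0) -> Optional[str]:
        j = self.i + offset
        return self.tokens[j][0] if j < len(self.tokens) else None

    def expect(self, kind: Optional[str]) -> str:
        if self.peek() != kind:
            raise ValueError(f"expected {kind}, found {self.peek()}")
        self.i += 1
        return self.tokens[self.i - 1][1]

    def operand(self) -> Node:            # standAloneRelation | groupedRelation
        if self.peek() != "lparen":
            return Relation(self.expect("word"))
        self.expect("lparen")
        node = self.joined()
        self.expect("rparen")
        return node

    def joined(self) -> Join:             # operand joinTypeOrDefault operand joinAttribute?
        left = self.operand()
        join_type = self.expect(self.peek()) if self.peek() in ("LOJ", "ROJ", "IJ") else "IJ"
        right = self.operand()
        dist = self.expect("dist")[1] if self.peek() == "dist" else None
        return Join(left, right, join_type, dist)

    def constraint(self) -> Union[JoinConstraint, CardinalityConstraint]:
        keyword = self.expect(self.peek())    # "JOIN" or "CARD"
        self.expect("lparen")
        if keyword == "JOIN":
            result = JoinConstraint(self.joined())
        else:
            # One name followed by a number selects the standAloneRelation alternative.
            single = self.peek() == "word" and self.peek(1) == "number"
            node = Relation(self.expect("word")) if single else self.joined()
            result = CardinalityConstraint(node, int(self.expect("number")))
        self.expect("rparen")
        return result

    def parse(self) -> list:              # '/*!' planConstraint* '*/'
        self.expect("start")
        constraints = []
        while self.peek() in ("JOIN", "CARD"):
            constraints.append(self.constraint())
        self.expect("end")
        return constraints

def parse_plan_constraints(sql: str) -> list:
    """Parse the first /*! ... */ block in `sql`; return [] when there is none."""
    m = re.search(r"/\*!.*?\*/", sql, re.DOTALL)
    return _Parser(m.group()).parse() if m else []
```

For example, `parse_plan_constraints("/*! join (s (l o)) */ select 1")` yields a single `JoinConstraint` whose tree is `s` joined with the grouped `(l o)`. One consequence the sketch makes visible: `joinedRelation` is strictly binary, so `join (a b c)` (allowed by the `[, ...]` form in PATCH 1/4) is rejected with a `ValueError`, and a three-way join has to be bracketed, for example `join ((a b) c)`. Likewise, the PATCH 2/4 spelling `card ((c o) 10)` matches neither `cardinalityConstraint` alternative and becomes `card (c o 10)`.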