Introduce notion of conditional equivalence sets in FDS for optimizing outer joins. #3331
+574
−482
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Fixes dolthub/dolt#9520
In Functional Dependency Analysis, equivalence sets are sets of columns which have been determined to always be equal to each other. During join planning, we walk the join tree, read the join filters, and use these filters to compute equivalence sets which can inform the analysis.
However, we currently only look at filters on inner joins, because filters on outer joins do not unconditionally imply equivalence.
For example, in the following join query:
It cannot be said that
table_one.oneandtable_two.twohave equal values in the output. Any of the following are valid rows in the final output:In order to record this filter and include it in FDS, we need to tweak the definition of equivalence sets slightly.
This PR adds conditional equivalence sets, which consist of two column sets: conditional columns and equivalent columns. A conditional equivalence set should be interpreted as: "IF at least one of the columns in conditional is not null, THEN all of the columns in equivalent are equal." This matches the behavior of left joins.
We could implement regular equivalence sets as conditional equivalence sets with an empty conditional, but this PR keeps them separate to avoid complicating existing logic.
It's worth noting that we deliberately don't check if the columns are non-null at the time that the equivalence set is created. This is deliberate, because when equivalence sets are inherited by parent nodes, this can change for outer joins, and when evaluating whether a join can be implemented as a lookup, we analyze the child node using filters and equivalence sets from the parent, but with the child's nullness information.
Thanks to Angela, who worked on the investigation with me, wrote the original version of this feature (#3288), and wrote the plan test for this PR.