@nicktobey
Contributor

@nicktobey nicktobey commented Dec 5, 2025

Fixes dolthub/dolt#9520

In Functional Dependency Analysis, equivalence sets are sets of columns which have been determined to always be equal to each other. During join planning, we walk the join tree, read the join filters, and use these filters to compute equivalence sets which can inform the analysis.

However, we currently only look at filters on inner joins, because filters on outer joins do not unconditionally imply equivalence.

For example, in the following join query:

SELECT * FROM table_one LEFT JOIN table_two ON table_one.one = table_two.two

It cannot be said that table_one.one and table_two.two have equal values in the output. Any of the following are valid rows in the final output:

| table_one.one | table_two.two |
| --- | --- |
| 1 | 1 |
| 1 | NULL |
| NULL | NULL |

In order to record this filter and include it in FDS, we need to tweak the definition of equivalence sets slightly.

This PR adds conditional equivalence sets, which consist of two column sets: conditional columns and equivalent columns. A conditional equivalence set should be interpreted as: "IF at least one of the columns in conditional is not null, THEN all of the columns in equivalent are equal." This matches the behavior of left joins.
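As a rough illustration of that interpretation (not the code in this PR), the sketch below models a conditional equivalence set and checks it against individual output rows. The names `Row`, `CondEquiv`, `HoldsFor`, and `val` are hypothetical, and it assumes that for the left join above the conditional column is the right-hand column `table_two.two`.

```go
package main

import "fmt"

// Row maps a column name to its value; a nil pointer stands for SQL NULL.
type Row map[string]*int

// CondEquiv is a hypothetical stand-in for a conditional equivalence set:
// IF at least one Conditional column is not null,
// THEN all Equivalent columns are equal.
type CondEquiv struct {
	Conditional []string
	Equivalent  []string
}

// HoldsFor reports whether a single row is consistent with the set.
func (c CondEquiv) HoldsFor(r Row) bool {
	active := false
	for _, col := range c.Conditional {
		if r[col] != nil {
			active = true
			break
		}
	}
	// The condition is not met, so the set promises nothing about this row.
	if !active || len(c.Equivalent) < 2 {
		return true
	}
	first := r[c.Equivalent[0]]
	for _, col := range c.Equivalent[1:] {
		v := r[col]
		// A NULL never compares equal to anything, matching SQL semantics.
		if first == nil || v == nil || *first != *v {
			return false
		}
	}
	return true
}

// val is a small helper that returns a pointer to an int literal.
func val(n int) *int { return &n }

func main() {
	ce := CondEquiv{
		Conditional: []string{"table_two.two"},
		Equivalent:  []string{"table_one.one", "table_two.two"},
	}
	rows := []Row{
		{"table_one.one": val(1), "table_two.two": val(1)},
		{"table_one.one": val(1), "table_two.two": nil},
		{"table_one.one": nil, "table_two.two": nil},
	}
	for _, r := range rows {
		fmt.Println(ce.HoldsFor(r))
	}
}
```

All three rows from the example above satisfy the set, whereas a row such as `(1, 2)` would not, which is the guarantee a left join on `table_one.one = table_two.two` provides.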

We could implement regular equivalence sets as conditional equivalence sets with an empty conditional, but this PR keeps them separate to avoid complicating existing logic.

Note that we deliberately don't check whether the columns are non-null at the time the equivalence set is created. When equivalence sets are inherited by parent nodes, the nullability of their columns can change under outer joins. In addition, when evaluating whether a join can be implemented as a lookup, we analyze the child node using filters and equivalence sets from the parent, but with the child's nullness information.
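To make the deferred check concrete, here is a minimal sketch in which the same conditional equivalence set is evaluated against two different sets of known non-null columns. It is not the PR's implementation: `ColSet`, `CondEquiv`, and `AppliesWith` are hypothetical names, and the assumption that `table_two.two` is non-null in the child (for example because it is a primary key) is made up for the example.

```go
package main

import "fmt"

// ColSet is a simple set of column names.
type ColSet map[string]bool

// CondEquiv is a hypothetical conditional equivalence set.
type CondEquiv struct {
	Conditional ColSet
	Equivalent  ColSet
}

// AppliesWith reports whether the equivalence can be relied on, given the
// columns known to be non-null at the node currently being analyzed.
// The check happens here, at use time, not when the set is built.
func (c CondEquiv) AppliesWith(notNull ColSet) bool {
	for col := range c.Conditional {
		if notNull[col] {
			return true
		}
	}
	return false
}

func main() {
	ce := CondEquiv{
		Conditional: ColSet{"table_two.two": true},
		Equivalent:  ColSet{"table_one.one": true, "table_two.two": true},
	}

	// Assumed nullness at the child: table_two.two is non-null in the table itself.
	childNotNull := ColSet{"table_two.two": true}
	// Assumed nullness at the left join's output: table_two.two may be NULL.
	parentNotNull := ColSet{}

	fmt.Println("child:", ce.AppliesWith(childNotNull))
	fmt.Println("parent:", ce.AppliesWith(parentNotNull))
}
```

Because the set stores only column names, it can be passed down from the parent unchanged and paired with whichever nullness information belongs to the node under analysis.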

Thanks to Angela, who worked on the investigation with me, wrote the original version of this feature (#3288), and wrote the plan test for this PR.

Member

@zachmu zachmu left a comment

I think this makes sense, although it's probably worth doing another pass to see where tests could be shored up a bit. The changes to plans I saw make sense, but it would be nice to see some additional tests of left joins that target this change.

@nicktobey
Contributor Author

As discussed offline, this PR already contains additional tests, and the changes to existing tests look correct.

@nicktobey nicktobey merged commit ca1655f into main Dec 9, 2025
9 checks passed
@nicktobey nicktobey deleted the nicktobey/outerEquivs branch December 9, 2025 02:02
Development

Successfully merging this pull request may close these issues.

PRIMARY KEY isn't always used in left joins

4 participants