Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 9 additions & 9 deletions datafusion/sqllogictest/test_files/join.slt.part
Original file line number Diff line number Diff line change
Expand Up @@ -853,47 +853,47 @@ physical_plan
03)----DataSourceExec: partitions=1, partition_sizes=[1]
04)----DataSourceExec: partitions=1, partition_sizes=[1]

query ITT
query ITT rowsort
SELECT e.emp_id, e.name, d.dept_name
FROM employees AS e
LEFT JOIN department AS d
ON (e.name = 'Alice' OR e.name = 'Bob');
----
1 Alice HR
1 Alice Engineering
1 Alice HR
1 Alice Sales
2 Bob HR
2 Bob Engineering
2 Bob HR
2 Bob Sales
3 Carol NULL

# neither RIGHT OUTER JOIN
query ITT
query ITT rowsort
SELECT e.emp_id, e.name, d.dept_name
FROM department AS d
RIGHT JOIN employees AS e
ON (e.name = 'Alice' OR e.name = 'Bob');
----
1 Alice HR
1 Alice Engineering
1 Alice HR
1 Alice Sales
2 Bob HR
2 Bob Engineering
2 Bob HR
2 Bob Sales
3 Carol NULL

# neither FULL OUTER JOIN
query ITT
query ITT rowsort
SELECT e.emp_id, e.name, d.dept_name
FROM department AS d
FULL JOIN employees AS e
ON (e.name = 'Alice' OR e.name = 'Bob');
----
1 Alice HR
1 Alice Engineering
1 Alice HR
1 Alice Sales
2 Bob HR
2 Bob Engineering
2 Bob HR
2 Bob Sales
3 Carol NULL

Expand Down
40 changes: 30 additions & 10 deletions datafusion/sqllogictest/test_files/joins.slt
Original file line number Diff line number Diff line change
Expand Up @@ -4164,23 +4164,43 @@ AS VALUES
(3, 3, true),
(3, 3, false);

query IIIIB
SELECT * FROM t0 FULL JOIN t1 ON t0.c1 = t1.c1 LIMIT 2;
query IIIIB rowsort
-- Note: using LIMIT value higher than cardinality before LIMIT to avoid query non-determinism
SELECT * FROM t0 FULL JOIN t1 ON t0.c1 = t1.c1 LIMIT 20;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those 3 tests are supposed to test the GlobalLimit operator, instead of TopK, using order by might change the test goal.
To ensure those tests are deterministic, I set the limit to a number larger than the result set.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using order by might change the test goal.

good point

set the limit to a number larger than the result set.

doesn't it also change the test goal though? for example, if the LIMIT n would pass n+3 rows, this test would still pass

I don't think SLT with static expected values is a way to test LIMIT n at all
we need to accept this fact.
i am ok with the change proposed here, ie LIMIT being more than actual row count.

nit: let's maybe add a code comment that LIMIT should be more row count. This is so that when input data is changed and this test becomes flaky again, we still know how to fix it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
SELECT * FROM t0 FULL JOIN t1 ON t0.c1 = t1.c1 LIMIT 20;
-- Note: using LIMIT value higher than cardinality before LIMIT to avoid query non-determinism
SELECT * FROM t0 FULL JOIN t1 ON t0.c1 = t1.c1 LIMIT 20;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review and suggested changes.

That's a great point. Yes, it still changes the test goal, but in my opinion, it's closer to the original test compared to TopK.

I think this case could be covered by a sqllogictest property check runner — one that detects queries with a global limit k and asserts that the final result size is <= k.
Implementing such a sqllogictest property check extension is on my to-do list, and I hope to find time to work on it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does sqllogictest have a reasonable way to add extensions? or do you mean modifying the upstream library?

at dbt we have some extensions to SLT, but they always feel like hacks at the code level (like special comments)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No AFAIK, one of the recent extension is also a bit hacky #16183

----
2 2 2 2 true
1 1 NULL NULL NULL
2 2 2 2 false

query IIIIB
SELECT * FROM t0 FULL JOIN t1 ON t0.c2 >= t1.c2 LIMIT 2;
----
2 2 2 2 true
3 3 2 2 true
3 3 3 3 false
3 3 3 3 true
4 4 NULL NULL NULL

query IIIIB
SELECT * FROM t0 FULL JOIN t1 ON t0.c1 = t1.c1 AND t0.c2 >= t1.c2 LIMIT 2;
query IIIIB rowsort
-- Note: using LIMIT value higher than cardinality before LIMIT to avoid query non-determinism
SELECT * FROM t0 FULL JOIN t1 ON t0.c2 >= t1.c2 LIMIT 20;
----
1 1 NULL NULL NULL
2 2 2 2 false
2 2 2 2 true
3 3 2 2 false
3 3 2 2 true
3 3 3 3 false
3 3 3 3 true
4 4 2 2 false
4 4 2 2 true
4 4 3 3 false
4 4 3 3 true

query IIIIB rowsort
-- Note: using LIMIT value higher than cardinality before LIMIT to avoid query non-determinism
SELECT * FROM t0 FULL JOIN t1 ON t0.c1 = t1.c1 AND t0.c2 >= t1.c2 LIMIT 20;
----
1 1 NULL NULL NULL
2 2 2 2 false
2 2 2 2 true
3 3 3 3 false
3 3 3 3 true
4 4 NULL NULL NULL

## Test !join.on.is_empty() && join.filter.is_none()
query TT
Expand Down