-
Notifications
You must be signed in to change notification settings - Fork 1.6k
test: Fix flaky join tests #16860
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: Fix flaky join tests #16860
Conversation
query IIIIB | ||
SELECT * FROM t0 FULL JOIN t1 ON t0.c1 = t1.c1 LIMIT 2; | ||
query IIIIB rowsort | ||
SELECT * FROM t0 FULL JOIN t1 ON t0.c1 = t1.c1 LIMIT 20; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those 3 tests are supposed to test the GlobalLimit
operator, instead of TopK, using order by
might change the test goal.
To ensure those tests are deterministic, I set the limit to a number larger than the result set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using
order by
might change the test goal.
good point
set the limit to a number larger than the result set.
doesn't it also change the test goal though? for example, if the LIMIT n
would pass n+3
rows, this test would still pass
I don't think SLT with static expected values is a way to test LIMIT n
at all
we need to accept this fact.
i am ok with the change proposed here, ie LIMIT being more than actual row count.
nit: let's maybe add a code comment that LIMIT should be more row count. This is so that when input data is changed and this test becomes flaky again, we still know how to fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SELECT * FROM t0 FULL JOIN t1 ON t0.c1 = t1.c1 LIMIT 20; | |
-- Note: using LIMIT value higher than cardinality before LIMIT to avoid query non-determinism | |
SELECT * FROM t0 FULL JOIN t1 ON t0.c1 = t1.c1 LIMIT 20; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review and suggested changes.
That's a great point. Yes, it still changes the test goal, but in my opinion, it's closer to the original test compared to TopK.
I think this case could be covered by a sqllogictest property check runner — one that detects queries with a global limit k and asserts that the final result size is <= k.
Implementing such a sqllogictest property check extension is on my to-do list, and I hope to find time to work on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does sqllogictest have a reasonable way to add extensions? or do you mean modifying the upstream library?
at dbt we have some extensions to SLT, but they always feel like hacks at the code level (like special comments)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No AFAIK, one of the recent extension is also a bit hacky #16183
query IIIIB | ||
SELECT * FROM t0 FULL JOIN t1 ON t0.c1 = t1.c1 LIMIT 2; | ||
query IIIIB rowsort | ||
SELECT * FROM t0 FULL JOIN t1 ON t0.c1 = t1.c1 LIMIT 20; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using
order by
might change the test goal.
good point
set the limit to a number larger than the result set.
doesn't it also change the test goal though? for example, if the LIMIT n
would pass n+3
rows, this test would still pass
I don't think SLT with static expected values is a way to test LIMIT n
at all
we need to accept this fact.
i am ok with the change proposed here, ie LIMIT being more than actual row count.
nit: let's maybe add a code comment that LIMIT should be more row count. This is so that when input data is changed and this test becomes flaky again, we still know how to fix it.
query IIIIB | ||
SELECT * FROM t0 FULL JOIN t1 ON t0.c1 = t1.c1 LIMIT 2; | ||
query IIIIB rowsort | ||
SELECT * FROM t0 FULL JOIN t1 ON t0.c1 = t1.c1 LIMIT 20; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SELECT * FROM t0 FULL JOIN t1 ON t0.c1 = t1.c1 LIMIT 20; | |
-- Note: using LIMIT value higher than cardinality before LIMIT to avoid query non-determinism | |
SELECT * FROM t0 FULL JOIN t1 ON t0.c1 = t1.c1 LIMIT 20; |
Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com>
Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com>
Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com>
* Fix flaky join tests * fix flaky join slts * Update datafusion/sqllogictest/test_files/joins.slt Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com> * Update datafusion/sqllogictest/test_files/joins.slt Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com> * Update datafusion/sqllogictest/test_files/joins.slt Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com> --------- Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com>
Which issue does this PR close?
Rationale for this change
What changes are included in this PR?
When I was working on a join related task, I found some sqlogictests failing due to different result sort orders.
However, those queries should not assume the returned order, so this PR enforce static order for them using
rowsort
option in sqllogictest framework.Are these changes tested?
Are there any user-facing changes?