Skip to content

Conversation

2010YOUY01
Copy link
Contributor

Which issue does this PR close?

  • Closes #.

Rationale for this change

What changes are included in this PR?

When I was working on a join related task, I found some sqlogictests failing due to different result sort orders.
However, those queries should not assume the returned order, so this PR enforce static order for them using rowsort option in sqllogictest framework.

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jul 23, 2025
query IIIIB
SELECT * FROM t0 FULL JOIN t1 ON t0.c1 = t1.c1 LIMIT 2;
query IIIIB rowsort
SELECT * FROM t0 FULL JOIN t1 ON t0.c1 = t1.c1 LIMIT 20;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those 3 tests are supposed to test the GlobalLimit operator, instead of TopK, using order by might change the test goal.
To ensure those tests are deterministic, I set the limit to a number larger than the result set.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using order by might change the test goal.

good point

set the limit to a number larger than the result set.

doesn't it also change the test goal though? for example, if the LIMIT n would pass n+3 rows, this test would still pass

I don't think SLT with static expected values is a way to test LIMIT n at all
we need to accept this fact.
i am ok with the change proposed here, ie LIMIT being more than actual row count.

nit: let's maybe add a code comment that LIMIT should be more row count. This is so that when input data is changed and this test becomes flaky again, we still know how to fix it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
SELECT * FROM t0 FULL JOIN t1 ON t0.c1 = t1.c1 LIMIT 20;
-- Note: using LIMIT value higher than cardinality before LIMIT to avoid query non-determinism
SELECT * FROM t0 FULL JOIN t1 ON t0.c1 = t1.c1 LIMIT 20;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review and suggested changes.

That's a great point. Yes, it still changes the test goal, but in my opinion, it's closer to the original test compared to TopK.

I think this case could be covered by a sqllogictest property check runner — one that detects queries with a global limit k and asserts that the final result size is <= k.
Implementing such a sqllogictest property check extension is on my to-do list, and I hope to find time to work on it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does sqllogictest have a reasonable way to add extensions? or do you mean modifying the upstream library?

at dbt we have some extensions to SLT, but they always feel like hacks at the code level (like special comments)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No AFAIK, one of the recent extension is also a bit hacky #16183

query IIIIB
SELECT * FROM t0 FULL JOIN t1 ON t0.c1 = t1.c1 LIMIT 2;
query IIIIB rowsort
SELECT * FROM t0 FULL JOIN t1 ON t0.c1 = t1.c1 LIMIT 20;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using order by might change the test goal.

good point

set the limit to a number larger than the result set.

doesn't it also change the test goal though? for example, if the LIMIT n would pass n+3 rows, this test would still pass

I don't think SLT with static expected values is a way to test LIMIT n at all
we need to accept this fact.
i am ok with the change proposed here, ie LIMIT being more than actual row count.

nit: let's maybe add a code comment that LIMIT should be more row count. This is so that when input data is changed and this test becomes flaky again, we still know how to fix it.

query IIIIB
SELECT * FROM t0 FULL JOIN t1 ON t0.c1 = t1.c1 LIMIT 2;
query IIIIB rowsort
SELECT * FROM t0 FULL JOIN t1 ON t0.c1 = t1.c1 LIMIT 20;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
SELECT * FROM t0 FULL JOIN t1 ON t0.c1 = t1.c1 LIMIT 20;
-- Note: using LIMIT value higher than cardinality before LIMIT to avoid query non-determinism
SELECT * FROM t0 FULL JOIN t1 ON t0.c1 = t1.c1 LIMIT 20;

2010YOUY01 and others added 3 commits July 23, 2025 21:28
Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com>
Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com>
Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com>
@2010YOUY01 2010YOUY01 merged commit 3a40625 into apache:main Jul 24, 2025
27 checks passed
adriangb pushed a commit to pydantic/datafusion that referenced this pull request Jul 28, 2025
* Fix flaky join tests

* fix flaky join slts

* Update datafusion/sqllogictest/test_files/joins.slt

Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com>

* Update datafusion/sqllogictest/test_files/joins.slt

Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com>

* Update datafusion/sqllogictest/test_files/joins.slt

Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com>

---------

Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants