-
Notifications
You must be signed in to change notification settings - Fork 746
Include SubPlans when checking that a query is still distributed. #8388
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #8388 +/- ##
==========================================
- Coverage 88.80% 88.78% -0.03%
==========================================
Files 287 287
Lines 63239 63258 +19
Branches 7928 7931 +3
==========================================
+ Hits 56158 56161 +3
- Misses 4750 4765 +15
- Partials 2331 2332 +1 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes issue #8313 where queries with Citus tables that were reduced to read_intermediate_result() calls after recursive planning of CTEs and subqueries were incorrectly flagged as not requiring distributed planning. The fix enhances the distributed planning check in distributed_planner() hook to inspect both the range table and the plan's subplan list for distributed relations.
- Enhanced distributed planning check to include subplan inspection for
read_intermediate_result()function calls - Made
IsReadIntermediateResultFunction()publicly accessible for reuse - Added comprehensive regression tests covering various scenarios with distributed subplans
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/backend/distributed/planner/distributed_planner.c | Added PlanContainsDistributedSubPlanRTE() function to check for distributed subplans and updated CheckPostPlanDistribution() to use it |
| src/backend/distributed/planner/multi_logical_planner.c | Changed IsReadIntermediateResultFunction() from static to public visibility |
| src/include/distributed/multi_logical_planner.h | Added declaration for the newly exported IsReadIntermediateResultFunction() |
| src/test/regress/sql/subquery_in_where.sql | Added test cases for queries with distributed subplans including minimal repro and variants with redundant WHERE clauses |
| src/test/regress/expected/subquery_in_where.out | Added expected output for the new test cases verifying correct results |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
3d57c82 to
81c19f1
Compare
| FROM | ||
| users_table); | ||
| users_table) | ||
| ORDER BY id; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to reviewer: this change is not relevant to the fix, but this regress failed flakyness tests because of differences in the order of the results of this query. The ORDER BY stabilizes that.
|
@colm-mchugh looks like this is also an related issue: #8312 |
Yes it does, and the PR also fixes #8312, from testing locally. Thanks for pointing out! |
81c19f1 to
117570c
Compare
DESCRIPTION: tighten distributed plan check to cover distributed subplans. The `distributed_planner()` hook checks, after calling `standard_planner()`, if the query still requires distributed planning - this was necessitated by issue #7782, #7783 where the citus tables in a query are optimized away by `standard_planner()`. However, issue #8313 exposed a case where we incorrectly flag a query as no long needing distributed planning; the query has the following format: ``` SELECT t0.a1 as c1, (SELECT xx FROM t1 LIMIT 1) as c2 FROM t0 WHERE true::bool ``` where the only citus table is `t1`. The scalar subquery is reduced to a read_intermediate_result() call after recursive planning of CTEs and subqueries: ``` SELECT t0.a1 as c1, (SELECT .. FROM read_intermediate_result() ... FROM t0 ``` with the consequence that function `CheckPostPlanDistribution()` does not see any citus tables in the query plan and marks the query as not distributed. The fix enhances `CheckPostPlanDistribution()` to look at the plan's subplan list for a read intermediate result call, and marks the query as needing distributed planning if one is found. The new check uses existing function `IsReadIntermediateResultFunction()` to detect the presence of a distributed subplan. This function is already used by `intermediate_result_pruning.c` to build a list of distributed subplan structs, so can be reused here to determine if a query needs distributed planning. Fixes issue: #8313
117570c to
e368b8c
Compare
Include SubPlans when checking that a query is still distributed.
DESCRIPTION: tighten distributed plan check to cover distributed subplans.
The
distributed_planner()hook needs to check, after callingstandard_planner(), if the query still requires distributed planning - this was necessitated by issue #7782, #7783 where the citus tables in a query are optimized away bystandard_planner(). However, issue #8313 exposed a case where we incorrectly flag a query as no long needing distributed planning; the query hasthe following format:
where the only citus table is
t1. The scalar subquery is reduced to aread_intermediate_result()call after recursive planning ofCTEs and subqueries:
with the consequence that function
CheckPostPlanDistribution()does not see any citus tables in the query plan and marks the query as not distributed. The fix enhancesCheckPostPlanDistribution()to additionally look at the plan's subplan list for a read intermediate result call, and mark the query as needing distributed planning if one is found. The new check uses existing functionIsReadIntermediateResultFunction()to detect the presence of a distributed subplan. This function is already used byintermediate_result_pruning.cto build a list of distributed subplan structs, so can be reused here to determine ifa query still needs distributed planning.
Fixes issue: #8313