Skip to content

Conversation

@colm-mchugh
Copy link
Contributor

@colm-mchugh colm-mchugh commented Dec 18, 2025

Include SubPlans when checking that a query is still distributed.

DESCRIPTION: tighten distributed plan check to cover distributed subplans.

The distributed_planner() hook needs to check, after calling standard_planner(), if the query still requires distributed planning - this was necessitated by issue #7782, #7783 where the citus tables in a query are optimized away by standard_planner(). However, issue #8313 exposed a case where we incorrectly flag a query as no long needing distributed planning; the query has
the following format:

SELECT t0.a1 as c1, 
       (SELECT xx FROM t1 LIMIT 1) as c2
FROM t0
WHERE true::bool

where the only citus table is t1. The scalar subquery is reduced to a read_intermediate_result() call after recursive planning of
CTEs and subqueries:

SELECT t0.a1 as c1, 
       (SELECT .. FROM read_intermediate_result() ... ) as c2
FROM t0

with the consequence that function CheckPostPlanDistribution() does not see any citus tables in the query plan and marks the query as not distributed. The fix enhances CheckPostPlanDistribution() to additionally look at the plan's subplan list for a read intermediate result call, and mark the query as needing distributed planning if one is found. The new check uses existing function IsReadIntermediateResultFunction() to detect the presence of a distributed subplan. This function is already used by intermediate_result_pruning.c to build a list of distributed subplan structs, so can be reused here to determine if
a query still needs distributed planning.

Fixes issue: #8313

@codecov
Copy link

codecov bot commented Dec 18, 2025

Codecov Report

❌ Patch coverage is 61.90476% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.78%. Comparing base (87cd100) to head (e368b8c).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8388      +/-   ##
==========================================
- Coverage   88.80%   88.78%   -0.03%     
==========================================
  Files         287      287              
  Lines       63239    63258      +19     
  Branches     7928     7931       +3     
==========================================
+ Hits        56158    56161       +3     
- Misses       4750     4765      +15     
- Partials     2331     2332       +1     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes issue #8313 where queries with Citus tables that were reduced to read_intermediate_result() calls after recursive planning of CTEs and subqueries were incorrectly flagged as not requiring distributed planning. The fix enhances the distributed planning check in distributed_planner() hook to inspect both the range table and the plan's subplan list for distributed relations.

  • Enhanced distributed planning check to include subplan inspection for read_intermediate_result() function calls
  • Made IsReadIntermediateResultFunction() publicly accessible for reuse
  • Added comprehensive regression tests covering various scenarios with distributed subplans

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/backend/distributed/planner/distributed_planner.c Added PlanContainsDistributedSubPlanRTE() function to check for distributed subplans and updated CheckPostPlanDistribution() to use it
src/backend/distributed/planner/multi_logical_planner.c Changed IsReadIntermediateResultFunction() from static to public visibility
src/include/distributed/multi_logical_planner.h Added declaration for the newly exported IsReadIntermediateResultFunction()
src/test/regress/sql/subquery_in_where.sql Added test cases for queries with distributed subplans including minimal repro and variants with redundant WHERE clauses
src/test/regress/expected/subquery_in_where.out Added expected output for the new test cases verifying correct results

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@colm-mchugh colm-mchugh marked this pull request as draft December 18, 2025 19:55
@colm-mchugh colm-mchugh marked this pull request as ready for review December 18, 2025 20:28
FROM
users_table);
users_table)
ORDER BY id;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewer: this change is not relevant to the fix, but this regress failed flakyness tests because of differences in the order of the results of this query. The ORDER BY stabilizes that.

@imranzaheer612
Copy link
Contributor

imranzaheer612 commented Dec 19, 2025

@colm-mchugh looks like this is also an related issue: #8312

@colm-mchugh
Copy link
Contributor Author

@colm-mchugh looks like this is also an related issue: #8312

Yes it does, and the PR also fixes #8312, from testing locally. Thanks for pointing out!

DESCRIPTION: tighten distributed plan check to cover distributed subplans.

The `distributed_planner()` hook checks, after calling `standard_planner()`,
if the query still requires distributed planning - this was necessitated by
issue #7782, #7783 where the citus tables in a query are optimized away by
`standard_planner()`.  However, issue #8313 exposed a case where we incorrectly
flag a query as no long needing distributed planning; the query has
the following format:
```
SELECT t0.a1 as c1,
       (SELECT xx FROM t1 LIMIT 1) as c2
FROM t0
WHERE true::bool
```
where the only citus table is `t1`. The scalar subquery is reduced
to a read_intermediate_result() call after recursive planning of
CTEs and subqueries:
```
SELECT t0.a1 as c1,
       (SELECT .. FROM read_intermediate_result() ...
FROM t0
```
with the consequence that function `CheckPostPlanDistribution()` does
not see any citus tables in the query plan and marks the query as not
distributed. The fix enhances `CheckPostPlanDistribution()` to look
at the plan's subplan list for a read intermediate result call, and
marks the query as needing distributed planning if one is found. The
new check uses existing function `IsReadIntermediateResultFunction()`
to detect the presence of a distributed subplan. This function is
already used by `intermediate_result_pruning.c` to build a list of
distributed subplan structs, so can be reused here to determine if
a query needs distributed planning.

Fixes issue: #8313
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

4 participants