SOLR-17319: CompoundQueryComponent scribbles continued with CompoundQueryComponentTest #3648

cpoerschke · 2025-09-10T13:29:42Z

continuation of the earlier #2597 scribbles

https://issues.apache.org/jira/browse/SOLR-17319

solr/core/src/java/org/apache/solr/handler/component/CompoundResponseBuilder.java

dsmiley

I freaking love the elegance / simplicity here. It re-uses / leverages the existing distributed search phases of the SearchComponent design (as I hinted in JIRA might be possible), while also making that design more flexible for this new use-case. And reuses TopDocs.rrf logic :-) I imagine this approach is doing query work in parallel without any extra code? That said, clearly this is a draft / POC. Perhaps there are serious flaws that aren't easily evident in a code review. Testing was minimal. I see no accommodations for faceting to ensure we don't wastefully compute the docSet.

@ercsonusharma Did you see this? Please give this a good review. I suspect the best solution may be a combination of both efforts. For example your (recently redone) tests, documentation, user experience (how to use it) are nice.

I'd be tempted to run both solutions with distributed tracing enabled to a local Zipkin/Jaeger to visualize what's going on. I used to have a shelved code snippet to instrument a test to do this.

solr/core/src/java/org/apache/solr/handler/component/CompoundQueryComponent.java

cpoerschke · 2025-09-18T20:20:52Z

> I imagine this approach is doing query work in parallel without any extra code?

Yes, that's my understanding too.

> I see no accommodations for faceting to ensure we don't wastefully compute the docSet.

Yeah, my understanding is that with this sort of approach there is conceptually no (obvious) support for faceting. Today, a Solr client would make two queries and then do the RRF locally, outside of Solr -- conceptually that is what the approach here does too. Because the query work is in parallel and the doc sets are (I think) not passed back to where the fusion happens, truly-correct faceting would not implicitly be available. ...

... having said that, currently the sub-queries are there and there are only sub-queries, so I imagine a fusion-aware faceting component might do an OR of the sub-queries and then send that for faceting purposes only i.e. with rows=0 to the shards.
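The "OR of the sub-queries, for faceting purposes only" idea could look roughly like the sketch below. It is plain Java with no Solr dependencies. The class and method names (`FacetOnlyQuerySketch`, `buildFacetOnlyParams`), the example queries, and the facet field are hypothetical and not part of this PR. Wrapping each sub-query in parentheses is also an assumption: it only works when all sub-queries use the same query parser and carry no local params.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the PR: builds the parameters of a
// facet-only shard request from the sub-queries of a fused search.
final class FacetOnlyQuerySketch {

  /** Combines the sub-queries with OR and asks for facets only (rows=0). */
  static Map<String, String> buildFacetOnlyParams(List<String> subQueries, String facetField) {
    if (subQueries.isEmpty()) {
      throw new IllegalArgumentException("at least one sub-query is required");
    }
    // Parenthesise each sub-query so its own operators do not leak into the OR.
    String combined =
        subQueries.stream().map(q -> "(" + q + ")").collect(Collectors.joining(" OR "));

    Map<String, String> params = new LinkedHashMap<>();
    params.put("q", combined);
    params.put("rows", "0"); // no documents needed, only the facet counts
    params.put("facet", "true");
    params.put("facet.field", facetField);
    return params;
  }

  public static void main(String[] args) {
    // Illustrative sub-queries and facet field only.
    System.out.println(buildFacetOnlyParams(List.of("text:bee", "text:forage"), "category"));
  }
}
```

Note that facet counts computed this way cover every document matching any sub-query, not just the documents that survive the fusion, so this is an approximation of "fusion-aware" faceting rather than an exact answer.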

ercsonusharma · 2025-09-19T04:22:31Z

solr/core/src/java/org/apache/solr/handler/component/CompoundQueryComponent.java

+      }
+    }
+  }
+


I don't see the process method here and wonder how the individual queries from responseBuilders are executed on the shard level from the SearchIndexer?

ercsonusharma · 2025-09-19T04:24:23Z

solr/core/src/java/org/apache/solr/handler/component/CompoundQueryComponent.java

+    final TopDocs[] hits = new TopDocs[crb.responseBuilders.size()];
+    for (int crb_idx = 0; crb_idx < crb.responseBuilders.size(); ++crb_idx) {
+
+      final SolrDocumentList sdl = crb.responseBuilders.get(crb_idx).getResponseDocs();


I am wondering where this is being populated?

ercsonusharma · 2025-09-19T04:29:34Z

solr/core/src/java/org/apache/solr/handler/component/CompoundQueryComponent.java

+    if (rb instanceof CompoundResponseBuilder crb) {
+      for (var rb_i : crb.responseBuilders) {
+        if (rb_i.isThisFromMe(sreq)) {
+          super.handleResponses(rb_i, sreq);


Are we using mergeIds on each of the crb and overriding the rb.resultIds?

ercsonusharma · 2025-09-19T04:35:54Z

Thanks for the draft @cpoerschke - appreciate it!
I did a quick review.

Below are my initial observations:
Honestly, I couldn't understand most of this part here, as this is a draft; however, I have added a few comments.
Creating a new stage in the distributed phase may create lots of complexity around other components. It makes sense when you are creating a new set of queries that have to be fired across the shards, but the ultimate goal of the stage introduced here is just to merge the results of all the shard responses, which can be done in the handleResponses method of QueryComponent without adding any additional stage. Instead, I think reusing the same stage of the distributed phase makes a lot of things easier to handle.

> And reuses TopDocs.rrf logic :-)

I already checked TopDocs.rrf, and it is meant for RRF on TopDocs, i.e. for Lucene specifically. I think it would be less effort to create something like ShardDoc.rrf than to convert the SolrDocumentList to TopDocs and back.
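For readers unfamiliar with reciprocal rank fusion, here is a minimal standalone sketch of the computation itself, working directly on ranked lists of document ids. It is neither Lucene's TopDocs.rrf nor code from either PR; `RrfSketch` and its `rrf` method are hypothetical names, and breaking ties by id is an assumption made only to keep the output deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of reciprocal rank fusion over per-query ranked id lists.
final class RrfSketch {

  /**
   * A document at 1-based rank r in a list contributes 1 / (k + r) to its fused score.
   * Documents found by several sub-queries accumulate one contribution per list.
   */
  static List<Map.Entry<String, Double>> rrf(List<List<String>> rankedLists, int k, int topN) {
    Map<String, Double> scores = new HashMap<>();
    for (List<String> ranked : rankedLists) {
      for (int i = 0; i < ranked.size(); i++) {
        scores.merge(ranked.get(i), 1.0 / (k + i + 1), Double::sum);
      }
    }
    List<Map.Entry<String, Double>> fused = new ArrayList<>(scores.entrySet());
    // Highest fused score first; ties broken by id (an assumption, for determinism).
    fused.sort(
        (a, b) -> {
          int byScore = Double.compare(b.getValue(), a.getValue());
          return byScore != 0 ? byScore : a.getKey().compareTo(b.getKey());
        });
    return new ArrayList<>(fused.subList(0, Math.min(topN, fused.size())));
  }
}
```

For example, fusing the lists [A, B, C] and [B, D] with k = 60 ranks B first, because it is the only document that receives a contribution from both lists.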

> I imagine this approach is doing query work in parallel without any extra code?

Query work is already happening in parallel by design of the distributed process. What needs to be parallelised is the merging of the docs per query after querying, which I see happening sequentially here as well.

I'm really doubtful at this point in time as to how the faceting & highlighting would work.

cpoerschke · 2025-09-19T14:51:21Z

solr/core/src/java/org/apache/solr/handler/component/HighlightComponent.java

@@ -212,27 +212,31 @@ public void handleResponses(ResponseBuilder rb, ShardRequest sreq) {}

  @Override
  public void finishStage(ResponseBuilder rb) {


'Hide whitespace' diff mode makes it easier to see that the changes to this method are small (and extremely subtle, should add comments really, later).

cpoerschke · 2025-09-19T17:20:37Z

Thanks @ercsonusharma for starting to take a look here!

Yes, there's relatively little code but I totally appreciate that it's not easy to understand.

> ... with distributed tracing enabled to a local Zipkin/Jaeger to visualize what's going on. ...

That's an interesting idea, thanks @dsmiley for sharing! @hossman's "Lifecycle of a Solr Search Request" talk slides and recording from quite-a-while-ago-now also spring to mind for me here.

Brief replies to some of your initial observations.

> ... Creating a new stage in the distributed phase may create lots of complexity around other components. ...

Yes, for each component it will need to be considered how it should behave in a fusion scenario and whether or not it should participate in the fusion stage, or just skip it, which is the implied default.

> ... Query work is already happening in parallel by design of the distributed process. ...

The distributed process provides parallelism on the shard level e.g. if we have 10 shards then all 10 will run in parallel. Within each shard however, as I understand the current #3418 code, the CombinedQueryComponent.process method will run one sub-query after the other, where the line 181 comment says "// TODO: to be parallelized" currently.

The parallelism here on #3648 is on the shard level and also on the sub-queries level:

- CombinedQueryComponent.distributedProcess will, for one sub-query after the other, add a request to the outgoing queue of requests,
- the search handler will send each added request to the 10 shards (so now we have 20 things happening in parallel!!),
- the shards will process each request sent to them (running exactly only one of the sub-queries),
- the 20 responses will be received and handled (with the subtlety of needing to make sure that the response is handled only by whoever added it).
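The fan-out described in these steps can be illustrated with a small standalone sketch. It uses only the Java standard library, and every name in it (`FanOutSketch`, `ShardResponse`, `queryShard`, `fanOut`) is hypothetical: it does not use Solr's ShardRequest or ResponseBuilder classes, and the owner index only loosely plays the role that the isThisFromMe check plays in the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: every sub-query is sent to every shard in parallel, and each
// response is handed back only to the sub-query that requested it.
final class FanOutSketch {

  /** A shard response tagged with the index of the sub-query that owns it. */
  static final class ShardResponse {
    final int ownerIdx;
    final String shard;
    final String payload;

    ShardResponse(int ownerIdx, String shard, String payload) {
      this.ownerIdx = ownerIdx;
      this.shard = shard;
      this.payload = payload;
    }
  }

  /** Stand-in for a shard executing exactly one sub-query. */
  static ShardResponse queryShard(int ownerIdx, String subQuery, String shard) {
    return new ShardResponse(ownerIdx, shard, subQuery + "@" + shard);
  }

  /** Returns, per sub-query, the responses of all shards for that sub-query. */
  static List<List<ShardResponse>> fanOut(List<String> subQueries, List<String> shards) {
    int inFlightCount = Math.max(1, subQueries.size() * shards.size());
    ExecutorService pool = Executors.newFixedThreadPool(inFlightCount);
    try {
      // Queue one request per (sub-query, shard) pair, e.g. 2 x 10 = 20 requests.
      List<CompletableFuture<ShardResponse>> inFlight = new ArrayList<>();
      for (int q = 0; q < subQueries.size(); q++) {
        final int ownerIdx = q;
        final String subQuery = subQueries.get(q);
        for (String shard : shards) {
          inFlight.add(
              CompletableFuture.supplyAsync(() -> queryShard(ownerIdx, subQuery, shard), pool));
        }
      }
      // Route each response to its owner only, never to the other sub-queries.
      List<List<ShardResponse>> perOwner = new ArrayList<>();
      for (int q = 0; q < subQueries.size(); q++) {
        perOwner.add(new ArrayList<>());
      }
      for (CompletableFuture<ShardResponse> future : inFlight) {
        ShardResponse response = future.join();
        perOwner.get(response.ownerIdx).add(response);
      }
      return perOwner;
    } finally {
      pool.shutdown();
    }
  }
}
```

With two sub-queries and ten shards, this produces twenty concurrent requests and two groups of ten responses, one group per sub-query, which is the shape the fusion step would then consume.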

> ... how the faceting & highlighting would work.

Answering only on the highlighting part at this time: I've advanced the proof-of-concept further so that the test case now covers highlighting. The highlighting test passes, and it may help in developing an understanding of how that extra fusion stage fits into the picture.

> ... with distributed tracing enabled to a local Zipkin/Jaeger to visualize what's going on. I used to have a shelved code snippet to instrument a test to do this.

@dsmiley - I don't suppose you'd be in a position to dig up or dust off that code snippet?

cpoerschke added 10 commits July 26, 2024 16:21

SOLR-17319: CompoundQueryComponent scribbles

91859d9

action CI feedback: add missing @Override

124e69d

action CI feedback: add missing @Override

3e83fe2

Merge branch 'apache:main' into SOLR-17319-compound-query-component

67864fc

Merge branch 'apache:main' into SOLR-17319-compound-query-component

c7c02cf

provide ResponseBuilder.(get,set)Stage accessors, deprecate direct access

9ad96d8

factor out protected ResponseBuilder.getDoneStage() (and change QueryComponent to use it)

48cafb9

factor out protected ResponseBuilder.getQParameterName() and change QueryComponent to use it

98a4f85

CompoundQueryComponent scribbles continued with CompoundQueryComponentTest

fb59cc3

defer part of the 'provide ResponseBuilder.(get,set)Stage accessors, deprecate direct access' changes for illustrative purposes

9dcfab1

github-actions bot added tests cat:search labels Sep 10, 2025

cpoerschke mentioned this pull request Sep 10, 2025

SOLR-17319: CompoundQueryComponent scribbles #2597

Draft

action precommit CI feedback

c6a0f81

cpoerschke commented Sep 10, 2025

solr/core/src/java/org/apache/solr/handler/component/CompoundResponseBuilder.java Outdated

cpoerschke added 11 commits September 10, 2025 14:52

action CI precommit feedback w.r.t. long-vs-int for numFound

d7ec06a

update stale CompoundResponseBuilder.java comment

11ac6a4

tidy: wrap comment

7f9a2bb

Merge branch 'apache:main' into SOLR-17319-compound-query-component-continued

3330603

try out TopDocs.rrf API

45f7739

use Local Params syntax to have different sort for the two queries

5ee1522

remove rrf=true requirement (CompoundSearchHandler implies it)

9749733

switch from rrf.q.N to rrf.N.q

183bc1f

generalise from ResponseBuilder.getQParameterName to ResponseBuilder.getParameterPrefix

7f113c5

require rrf.prefix.list parameter (instead of hard-coding rrf.1. and rrf.2. prefixes)

217343b

change second query from '-bee' to '+forage' and check text field in response

7d9b83d

cpoerschke mentioned this pull request Sep 16, 2025

NO JIRA: provide ResponseBuilder.(get,set)Stage accessors, deprecate direct access #3664

Open

dsmiley reviewed Sep 18, 2025

dsmiley mentioned this pull request Sep 18, 2025

SOLR-17319 : Combined Query Feature for Multi Query Execution #3418

Open

7 tasks

explore highlighting support (without changing the highlighting component)

278874a

cpoerschke commented Sep 18, 2025

solr/core/src/java/org/apache/solr/handler/component/CompoundQueryComponent.java Outdated

Merge branch 'apache:main' into SOLR-17319-compound-query-component-continued

3e9e547

ercsonusharma reviewed Sep 19, 2025

cpoerschke added 6 commits September 19, 2025 10:43

factor out protected ResponseBuilder.getPreDoneStage() (and change HighlightComponent to use it)

5772016

doFusion now sets CompoundResponseBuilder.resultIds (previously the get-fields finishStage did it i.e. now it happens after instead of before the fusion)

338ed25

results from the sub-queries can overlap

f0d0c8f

doFusion now correctly sets resultIds' ShardDoc.positionInResponse

dc3b348

factor out protected ResponseBuilder.getFinished() (and change HighlightComponent to use it)

dd4c52c

CompoundResponseBuilder now overrides getFinished; CompoundQueryComponent and other (small) tweaks to match, for deterministic highlighting when document found for both sub-queries;

af80bda

cpoerschke commented Sep 19, 2025

action CI feedback w.r.t. == vs. equals in test

ab9cde6
