Skip to content

Conversation

cpoerschke
Copy link
Contributor

continuation of the earlier #2597 scribbles

https://issues.apache.org/jira/browse/SOLR-17319

Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I freaking love the elegance / simplicity here. It re-uses / leverages the existing distributed search phases of the SearchComponent design (as I hinted in JIRA may be possible), and meanwhile making that design more flexible for this new use-case. And reuses TopDocs.rrf logic :-) I imagine this approach is doing query work in parallel without any extra code? That said, clearly this is a draft / POC. Perhaps there are serious flaws that aren't easily evident in a code review. Testing was minimal. I see no accommodations for faceting to ensure we don't wastefully compute the docSet.

@ercsonusharma Did you see this? Please give this a good review. I suspect the best solution may be a combination of both efforts. For example your (recently redone) tests, documentation, user experience (how to use it) are nice.

I'd be tempted to run both solutions with distributed tracing enabled to a local Zipkin/Jaeger to visualize what's going on. I used to have a shelved code snippet to instrument a test to do this.

@cpoerschke
Copy link
Contributor Author

I imagine this approach is doing query work in parallel without any extra code?

Yes, that's my understanding too.

I see no accommodations for faceting to ensure we don't wastefully compute the docSet.

Yeah, my understanding is that with this sort of approach there is conceptually no (obvious) support for faceting. Today, Solr's client would make two queries and then outside-of-Solr do the RRF locally -- conceptually that is what the approach here does too. Because the query work is in parallel and the doc sets are (I think) not passed back to where the fusion happens, then truly-correct faceting would not implicitly be available. ...

... having said that, currently the sub-queries are there and there are only sub-queries, so I imagine a fusion-aware faceting component might do an OR of the sub-queries and then send that for faceting purposes only i.e. with rows=0 to the shards.

}
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the process method here and wonder how the individual queries from responseBuilders are executed on the shard level from the SearchIndexer?

final TopDocs[] hits = new TopDocs[crb.responseBuilders.size()];
for (int crb_idx = 0; crb_idx < crb.responseBuilders.size(); ++crb_idx) {

final SolrDocumentList sdl = crb.responseBuilders.get(crb_idx).getResponseDocs();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering where is this being populated?

if (rb instanceof CompoundResponseBuilder crb) {
for (var rb_i : crb.responseBuilders) {
if (rb_i.isThisFromMe(sreq)) {
super.handleResponses(rb_i, sreq);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we using mergeIds on each of the crb and overriding the rb.resultIds?

@ercsonusharma
Copy link

ercsonusharma commented Sep 19, 2025

Thanks for the draft @cpoerschke - appreciate it!
I did a quick review.

Below are my initial observations:
Honestly, I couldn't understand most of this part here, as this is a draft; however, I have added a few comments.
Creating a new stage in the distributed phase may create lots of complexity around other components. It makes sense when you are creating a new set of queries that have to be fired across the shards, but the ultimate goal of the phase introduced here is just to merge the results of all the shardresponse which can be done in the handleResponse method of QueryComponent without adding any additional phase. Rather, I see reusing the same stage of the distributed phase makes a lot of things easier to handle.

And reuses TopDocs.rrf logic :

I already checked TopDocs.rrf, and it is meant for rrf on TopDocs, especially for Lucene. I find it less effort in creating something ShardDoc.rrf rather than converting the SolrDocumentList to TopcDocs and back.

I imagine this approach is doing query work in parallel without any extra code?

Query work is already happening in parallel by design of the distributed process. The things that need to be parallelised is merging the docs per query post querying, which I see sequentially here as well.

I'm really doubtful at this point in time as to how the faceting & highlighting would work.

@@ -212,27 +212,31 @@ public void handleResponses(ResponseBuilder rb, ShardRequest sreq) {}

@Override
public void finishStage(ResponseBuilder rb) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'Hide whitespace' diff mode makes it easier to see that the changes to this method are small (and extremely subtle, should add comments really, later).

@cpoerschke
Copy link
Contributor Author

Thanks @ercsonusharma for starting to take a look here!

Yes, there's relatively little code but I totally appreciate that it's not easy to understand.

... with distributed tracing enabled to a local Zipkin/Jaeger to visualize what's going on. ...

That's an interesting idea, thanks @dsmiley for sharing! @hossman's "Lifecycle of a Solr Search Request" talk slides and recording from quite-a-while-ago-now also spring to mind for me here.

Brief replies to some of your initial observations.

... Creating a new stage in the distributed phase may create lots of complexity around other components. ...

Yes, for each component it will need to be considered how it should behave in a fusion scenario and whether or not it should participate in the fusion stage, or just skip it, which is the implied default.

... Query work is already happening in parallel by design of the distributed process. ...

The distributed process provides parallelism on the shard level e.g. if we have 10 shards then all 10 will run in parallel. Within each shard however, as I understand the current #3418 code, the CombinedQueryComponent.process method will run one sub-query after the other, where the line 181 comment says "// TODO: to be parallelized" currently.

The parallelism here on #3648 is on the shard level and also on the sub-queries level:

  • CombinedQueryComponent.distributedProcess will for one sub-query after the other add a request to the outgoing queue of requests,
  • the search handler will send each added request to the 10 shards (so now we have 20 things happening in parallel!!),
  • the shards will process each request sent to them (running exactly only one of the sub-queries),
  • the 20 responses will be received and handled (with the subtlety of needing to make sure that the response is handled only by whoever added it).

... how the faceting & highlighting would work.

Answering only on the highlighting work at this time, I've advanced the proof-of-concept further so that the test case does cover highlighting. The highlighting test passes and it may help to develop an understanding on how that extra fusion stage fits into the picture.

... with distributed tracing enabled to a local Zipkin/Jaeger to visualize what's going on. I used to have a shelved code snippet to instrument a test to do this.

@dsmiley - I don't suppose you'd be in a position to dig up or dust off that code snippet?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants