Improve DiskBBQ filtered search through filtering centroids #132933

@benwtrent

Description

When doing a filtered search over DiskBBQ, we do the simple thing and explore more centroids until we capture the expected overall percentage of vectors.

However, this means we just explore more and more centroids, scoring more of them and visiting useless ones. While we don't actually do any vector ops for those, it's interesting to see how decoding docIDs, only to figure out there are no matches, becomes a strangely dominant cost.
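To make the cost concrete, here is a minimal Python sketch of the "keep exploring centroids" loop. It is not the actual DiskBBQ code: `explore_until_enough`, `postings`, and `accepted_docs` are hypothetical names, overspill assignments are ignored, and the numbers are made up for illustration.

```python
from typing import Dict, List, Set


def explore_until_enough(
    ranked_centroids: List[int],
    postings: Dict[int, List[int]],
    accepted_docs: Set[int],
    target_matches: int,
) -> Dict[str, int]:
    """Visit centroids in ranked order until enough filter-passing docs are seen."""
    visited = 0
    decoded = 0  # docIDs decoded, whether or not they pass the filter
    matched = 0  # docIDs that pass the filter (the only ones that would be scored)
    for centroid in ranked_centroids:
        if matched >= target_matches:
            break
        visited += 1
        for doc in postings[centroid]:
            decoded += 1
            if doc in accepted_docs:
                matched += 1
    return {"visited": visited, "decoded": decoded, "matched": matched}


# Toy data: 10 centroids with 100 docs each. The filter accepts only 4 docs,
# which live in centroids 2 and 9.
postings = {c: list(range(c * 100, (c + 1) * 100)) for c in range(10)}
accepted = {205, 250, 910, 990}
stats = explore_until_enough(list(range(10)), postings, accepted, target_matches=4)
print(stats)
```

In this toy setup every centroid is visited and 1000 docIDs are decoded to find 4 matches, so the work is almost entirely docID decoding rather than vector scoring.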

I think we can speed up highly filtered search by adding an additional mapping (though this may be expensive) from vectorOrd -> [centroid_primary, centroid_overspill]

When we detect very specific filters, such that the probability of hitting the vectors in a centroid becomes very low, we can do a first pass with that restricted filter to gather the matching centroids, and then only score those.
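A rough Python sketch of that two-pass idea, under stated assumptions: the mapping is modelled as a plain dict from vector ordinal to `(primary, overspill)`, `NO_OVERSPILL` is a hypothetical sentinel, and the `should_prefilter_centroids` heuristic (expected matches per centroid below a threshold) is only one possible way to detect a "very specific" filter, not something this issue prescribes.

```python
from typing import Dict, Iterable, List, Set, Tuple

NO_OVERSPILL = -1  # hypothetical sentinel: vector has no overspill centroid


def should_prefilter_centroids(
    num_accepted: int, num_centroids: int, threshold: float = 1.0
) -> bool:
    """Assumed heuristic: prefilter when a centroid is expected to hold < threshold matches."""
    return (num_accepted / num_centroids) < threshold


def centroids_for_filter(
    accepted_ords: Iterable[int],
    ord_to_centroids: Dict[int, Tuple[int, int]],
) -> Set[int]:
    """First pass: collect centroids that hold at least one accepted vector."""
    matching: Set[int] = set()
    for vector_ord in accepted_ords:
        primary, overspill = ord_to_centroids[vector_ord]
        matching.add(primary)
        if overspill != NO_OVERSPILL:
            matching.add(overspill)
    return matching


def restricted_visit_order(ranked_centroids: List[int], matching: Set[int]) -> List[int]:
    """Keep the query's centroid ranking, but drop centroids with no matching vectors."""
    return [c for c in ranked_centroids if c in matching]


# Toy mapping: vectorOrd -> (centroid_primary, centroid_overspill)
ord_to_centroids = {
    0: (0, 1),
    1: (0, NO_OVERSPILL),
    2: (3, 5),
    3: (7, NO_OVERSPILL),
    4: (3, 2),
}
accepted_ords = [2, 3]           # vectors that pass the filter
ranked = [5, 0, 7, 1, 3, 2]      # centroids ordered by closeness to the query

if should_prefilter_centroids(len(accepted_ords), num_centroids=8):
    matching = centroids_for_filter(accepted_ords, ord_to_centroids)
    order = restricted_visit_order(ranked, matching)
else:
    order = ranked
print(order)
```

Here only centroids 5, 7, and 3 are visited, in the query's original ranking order. The first pass costs one mapping lookup per accepted vector, which is why it only pays off when the filter is very restrictive.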

This should be optional, as I expect it to add overhead at index time and to index size, though I expect the index size not to be affected too much?

//cc @jimczi
