Skip to content

Conversation

@ChrisHegarty
Copy link
Contributor

@ChrisHegarty ChrisHegarty commented Nov 19, 2025

This commit reinstates, fixes, and tests the native int7u bulk dot product #138239

The issue with the original change is that size_t is 64 bit, while we pass a 32 bit int from java. This is also arguably an issue with the other native definitions, but doesn't cause an issue because of position in the declaration (last). We should fix these declarations, but as a separate PR.

closes #138302

@ChrisHegarty ChrisHegarty requested a review from a team as a code owner November 19, 2025 18:04
@ChrisHegarty ChrisHegarty added >test Issues or PRs that are addressing/adding tests :Search Relevance/Vectors Vector search Team:Search - Relevance The Search organization Search Relevance team labels Nov 19, 2025
@elasticsearchmachine elasticsearchmachine added Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.3.0 and removed Team:Search - Relevance The Search organization Search Relevance team labels Nov 19, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@benwtrent
Copy link
Member

I completely reverted the java portion of the PR. Sorry. We can add the java portion back, but just not use it until we figure out the segfault

Thank you for the tests!!!

@ChrisHegarty
Copy link
Contributor Author

ChrisHegarty commented Nov 19, 2025

I completely reverted the java portion of the PR. Sorry. We can add the java portion back, but just not use it until we figure out the segfault

Thank you for the tests!!!

Ah, got it. Thanks. I didn't realise that there was a problem with the original change that was reverted. I was noodling with an alternative implementation, which caused me to write the test. I think that we can squeeze a bit more perf out of this code, possibly another 15-20% on top of what you already got!

@benwtrent
Copy link
Member

Absolutely beautiful!

I am not sure the cause of the segfault, but I suspect alignment issues by casting the int to a float.

So maybe we need to adjust the function definition or we do the casting in such a way it doesn't suffer from alignment problems

@ChrisHegarty ChrisHegarty added >refactoring and removed >non-issue >test Issues or PRs that are addressing/adding tests labels Nov 20, 2025
@ChrisHegarty
Copy link
Contributor Author

ChrisHegarty commented Nov 20, 2025

For my own future reference, a few command to help build and test locally.

AArch64

$ cd libs/simdvec/native && ./gradlew buildSharedLibrary && cd -

$ cp libs/simdvec/native/build/libs/vec/shared/aarch64/libvec.dylib libs/native/libraries/build/platform/darwin-aarch64/libvec.dylib 

$ LOCAL_VEC_BINARY_OS=darwin ./gradlew :libs:native:test --tests "JDKVectorLibraryInt7uTests" -Dtests.iters=100
$ LOCAL_VEC_BINARY_OS=darwin ./gradlew :libs:simdvec:test -Dtests.iters=100

$ LOCAL_VEC_BINARY_OS=darwin ./gradlew -p benchmarks run --args "org.elasticsearch.benchmark.vector.Int7ScorerBenchmark.*Bulk"

x64

$ cd libs/simdvec/native && ./gradlew buildSharedLibrary && cd -

$ cp libs/simdvec/native/build/libs/vec/shared/amd64/libvec.so libs/native/libraries/build/platform/linux-x64/libvec.so 

$ LOCAL_VEC_BINARY_OS=amd64 ./gradlew :libs:native:test --tests "JDKVectorLibraryInt7uTests" -Dtests.iters=1000
$ LOCAL_VEC_BINARY_OS= amd64 ./gradlew :libs:simdvec:test -Dtests.iters=100

$  LOCAL_VEC_BINARY_OS=amd64 ./gradlew -p benchmarks run --args "org.elasticsearch.benchmark.vector.Int7ScorerBenchmark.*Bulk"

@ChrisHegarty ChrisHegarty changed the title Add a test for the native int7u bulk dot product Reinstate and test the native int7u bulk dot product Nov 20, 2025
@ChrisHegarty ChrisHegarty added the test-arm Pull Requests that should be tested against arm agents label Nov 20, 2025
@ChrisHegarty
Copy link
Contributor Author

Posting some benchmarking numbers.

Linux x64 - 11th Gen Intel(R) Core(TM) i5-11400 @ 2.60GHz

Benchmark                                       (dims)   Mode  Cnt   Score   Error   Units
Before
Int7ScorerBenchmark.scoreFromMemorySegmentBulk     384  thrpt    5  24.548 ± 0.120  ops/ms
Int7ScorerBenchmark.scoreFromMemorySegmentBulk     782  thrpt    5  21.737 ± 0.760  ops/ms
Int7ScorerBenchmark.scoreFromMemorySegmentBulk    1024  thrpt    5  22.471 ± 0.555  ops/ms
After
Int7ScorerBenchmark.scoreFromMemorySegmentBulk     384  thrpt    5  60.246 ± 0.464  ops/ms
Int7ScorerBenchmark.scoreFromMemorySegmentBulk     782  thrpt    5  35.816 ± 0.085  ops/ms
Int7ScorerBenchmark.scoreFromMemorySegmentBulk    1024  thrpt    5  41.445 ± 0.057  ops/ms

Mac M2

Benchmark                                       (dims)   Mode  Cnt   Score    Error   Units
Before
Int7ScorerBenchmark.scoreFromMemorySegmentBulk     384  thrpt    5  36.335 ± 0.192  ops/ms
Int7ScorerBenchmark.scoreFromMemorySegmentBulk     782  thrpt    5  20.941 ± 1.066  ops/ms
Int7ScorerBenchmark.scoreFromMemorySegmentBulk    1024  thrpt    5  19.750 ± 0.343  ops/ms
After
Int7ScorerBenchmark.scoreFromMemorySegmentBulk     384  thrpt    5  54.776 ± 13.235  ops/ms
Int7ScorerBenchmark.scoreFromMemorySegmentBulk     782  thrpt    5  25.895 ±  0.621  ops/ms
Int7ScorerBenchmark.scoreFromMemorySegmentBulk    1024  thrpt    5  24.141 ±  0.037  ops/ms

static void dotProduct7uBulk(MemorySegment a, MemorySegment b, int length, int count, MemorySegment result) {
Objects.checkFromIndexSize(0, length * count, (int) a.byteSize());
Objects.checkFromIndexSize(0, length, (int) b.byteSize());
Objects.checkFromIndexSize(0, count * Float.BYTES, (int) result.byteSize());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added these runtime checks here just before going native, similar to other functions, so that we have a handle on the values that we pass through. Passing invalid values could lead to crashes or worse, so we enforce these here.

Copy link
Contributor

@ldematte ldematte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, and thanks for the additional tests!
I had a feeling the segfault was due to a wrong type size, but I was chasing the wrong guy :) Nice find!

}
}

public void testInt7uBulk() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test reliably reproduces the previous segv issue.

@ChrisHegarty ChrisHegarty merged commit 0b3b48a into elastic:main Nov 20, 2025
40 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>refactoring :Search Relevance/Vectors Vector search Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch test-arm Pull Requests that should be tested against arm agents v9.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Segfault in new bulk query handling Diskbbq

4 participants