Separate Panama and Vector classes #15285

kaivalnp · 2025-10-02T23:36:38Z

Addresses #15284

- Refactor internal classes
- Add missing javadocs
- Remove unused functions

kaivalnp · 2025-10-06T13:42:44Z

VectorUtilBenchmark results:

main

Benchmark                                                       (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.binaryCosineScalar                            1024  thrpt   15   0.841 ± 0.001  ops/us
VectorUtilBenchmark.binaryCosineVector                            1024  thrpt   15   4.778 ± 0.012  ops/us
VectorUtilBenchmark.binaryDotProductScalar                        1024  thrpt   15   2.289 ± 0.012  ops/us
VectorUtilBenchmark.binaryDotProductUint8Scalar                   1024  thrpt   15   2.307 ± 0.010  ops/us
VectorUtilBenchmark.binaryDotProductUint8Vector                   1024  thrpt   15   8.040 ± 0.001  ops/us
VectorUtilBenchmark.binaryDotProductVector                        1024  thrpt   15   8.040 ± 0.001  ops/us
VectorUtilBenchmark.binaryHalfByteDotProductBothPackedScalar      1024  thrpt   15   2.368 ± 0.001  ops/us
VectorUtilBenchmark.binaryHalfByteDotProductBothPackedVector      1024  thrpt   15  11.652 ± 0.104  ops/us
VectorUtilBenchmark.binaryHalfByteDotProductScalar                1024  thrpt   15   2.378 ± 0.002  ops/us
VectorUtilBenchmark.binaryHalfByteDotProductSinglePackedScalar    1024  thrpt   15   2.446 ± 0.009  ops/us
VectorUtilBenchmark.binaryHalfByteDotProductSinglePackedVector    1024  thrpt   15   2.627 ± 0.013  ops/us
VectorUtilBenchmark.binaryHalfByteDotProductVector                1024  thrpt   15  20.677 ± 0.160  ops/us
VectorUtilBenchmark.binaryHalfByteSquareBothPackedScalar          1024  thrpt   15   1.642 ± 0.001  ops/us
VectorUtilBenchmark.binaryHalfByteSquareBothPackedVector          1024  thrpt   15  12.614 ± 0.010  ops/us
VectorUtilBenchmark.binaryHalfByteSquareScalar                    1024  thrpt   15   2.465 ± 0.006  ops/us
VectorUtilBenchmark.binaryHalfByteSquareSinglePackedScalar        1024  thrpt   15   2.022 ± 0.001  ops/us
VectorUtilBenchmark.binaryHalfByteSquareSinglePackedVector        1024  thrpt   15   2.590 ± 0.012  ops/us
VectorUtilBenchmark.binaryHalfByteSquareVector                    1024  thrpt   15  18.526 ± 0.012  ops/us
VectorUtilBenchmark.binarySquareScalar                            1024  thrpt   15   2.431 ± 0.007  ops/us
VectorUtilBenchmark.binarySquareUint8Scalar                       1024  thrpt   15   2.422 ± 0.025  ops/us
VectorUtilBenchmark.binarySquareUint8Vector                       1024  thrpt   15   6.709 ± 0.002  ops/us
VectorUtilBenchmark.binarySquareVector                            1024  thrpt   15   6.710 ± 0.001  ops/us
VectorUtilBenchmark.floatCosineScalar                             1024  thrpt   15   1.419 ± 0.001  ops/us
VectorUtilBenchmark.floatCosineVector                             1024  thrpt   75   8.913 ± 0.013  ops/us
VectorUtilBenchmark.floatDotProductScalar                         1024  thrpt   15   3.734 ± 0.004  ops/us
VectorUtilBenchmark.floatDotProductVector                         1024  thrpt   75  12.561 ± 0.346  ops/us
VectorUtilBenchmark.floatSquareScalar                             1024  thrpt   15   3.181 ± 0.013  ops/us
VectorUtilBenchmark.floatSquareVector                             1024  thrpt   75  12.370 ± 0.398  ops/us
VectorUtilBenchmark.l2Normalize                                   1024  thrpt   15   3.016 ± 0.002  ops/us
VectorUtilBenchmark.l2NormalizeVector                             1024  thrpt   75  12.349 ± 0.719  ops/us

This PR

Benchmark                                                       (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.binaryCosineScalar                            1024  thrpt   15   0.841 ± 0.001  ops/us
VectorUtilBenchmark.binaryCosineVector                            1024  thrpt   15   4.860 ± 0.007  ops/us
VectorUtilBenchmark.binaryDotProductScalar                        1024  thrpt   15   2.298 ± 0.014  ops/us
VectorUtilBenchmark.binaryDotProductUint8Scalar                   1024  thrpt   15   2.288 ± 0.024  ops/us
VectorUtilBenchmark.binaryDotProductUint8Vector                   1024  thrpt   15   8.040 ± 0.001  ops/us
VectorUtilBenchmark.binaryDotProductVector                        1024  thrpt   15   8.039 ± 0.001  ops/us
VectorUtilBenchmark.binaryHalfByteDotProductBothPackedScalar      1024  thrpt   15   2.376 ± 0.003  ops/us
VectorUtilBenchmark.binaryHalfByteDotProductBothPackedVector      1024  thrpt   15  11.498 ± 0.286  ops/us
VectorUtilBenchmark.binaryHalfByteDotProductScalar                1024  thrpt   15   2.376 ± 0.002  ops/us
VectorUtilBenchmark.binaryHalfByteDotProductSinglePackedScalar    1024  thrpt   15   2.449 ± 0.007  ops/us
VectorUtilBenchmark.binaryHalfByteDotProductSinglePackedVector    1024  thrpt   15   2.627 ± 0.009  ops/us
VectorUtilBenchmark.binaryHalfByteDotProductVector                1024  thrpt   15  20.785 ± 0.009  ops/us
VectorUtilBenchmark.binaryHalfByteSquareBothPackedScalar          1024  thrpt   15   1.696 ± 0.001  ops/us
VectorUtilBenchmark.binaryHalfByteSquareBothPackedVector          1024  thrpt   15  12.562 ± 0.023  ops/us
VectorUtilBenchmark.binaryHalfByteSquareScalar                    1024  thrpt   15   2.474 ± 0.010  ops/us
VectorUtilBenchmark.binaryHalfByteSquareSinglePackedScalar        1024  thrpt   15   2.021 ± 0.006  ops/us
VectorUtilBenchmark.binaryHalfByteSquareSinglePackedVector        1024  thrpt   15   2.609 ± 0.015  ops/us
VectorUtilBenchmark.binaryHalfByteSquareVector                    1024  thrpt   15  18.487 ± 0.075  ops/us
VectorUtilBenchmark.binarySquareScalar                            1024  thrpt   15   2.413 ± 0.021  ops/us
VectorUtilBenchmark.binarySquareUint8Scalar                       1024  thrpt   15   2.420 ± 0.017  ops/us
VectorUtilBenchmark.binarySquareUint8Vector                       1024  thrpt   15   6.709 ± 0.002  ops/us
VectorUtilBenchmark.binarySquareVector                            1024  thrpt   15   6.709 ± 0.002  ops/us
VectorUtilBenchmark.floatCosineScalar                             1024  thrpt   15   1.415 ± 0.002  ops/us
VectorUtilBenchmark.floatCosineVector                             1024  thrpt   75   8.646 ± 0.080  ops/us
VectorUtilBenchmark.floatDotProductScalar                         1024  thrpt   15   3.733 ± 0.003  ops/us
VectorUtilBenchmark.floatDotProductVector                         1024  thrpt   75  12.249 ± 0.046  ops/us
VectorUtilBenchmark.floatSquareScalar                             1024  thrpt   15   3.171 ± 0.008  ops/us
VectorUtilBenchmark.floatSquareVector                             1024  thrpt   75  12.483 ± 0.104  ops/us
VectorUtilBenchmark.l2Normalize                                   1024  thrpt   15   3.017 ± 0.002  ops/us
VectorUtilBenchmark.l2NormalizeVector                             1024  thrpt   75  12.207 ± 0.764  ops/us
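
One rough way to read the two tables above is to check whether the `Score ± Error` intervals of a baseline row and its candidate row overlap. The sketch below is only an illustration: the class and method names are hypothetical, and it assumes the `Error` column is the half-width of JMH's confidence interval (99.9% by default, as far as I know). Non-overlapping intervals are a hint to re-run the benchmark, not proof of a regression, since run-to-run noise on the same machine can exceed the reported error.

```java
// Hypothetical helper, not part of Lucene or JMH.
class JmhIntervalCheck {

  /**
   * Returns true if [a - aErr, a + aErr] and [b - bErr, b + bErr] share at least one point,
   * i.e. the distance between the two scores is no larger than the sum of the two errors.
   */
  static boolean intervalsOverlap(double a, double aErr, double b, double bErr) {
    return Math.abs(a - b) <= aErr + bErr;
  }

  public static void main(String[] args) {
    // Score and Error values copied from the tables above (main first, then this PR).
    System.out.println(
        "floatDotProductVector overlaps: " + intervalsOverlap(12.561, 0.346, 12.249, 0.046));
    System.out.println(
        "floatSquareVector overlaps:     " + intervalsOverlap(12.370, 0.398, 12.483, 0.104));
    System.out.println(
        "floatCosineVector overlaps:     " + intervalsOverlap(8.913, 0.013, 8.646, 0.080));
  }
}
```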

kaivalnp · 2025-10-06T15:11:24Z

Ran some luceneutil benchmarks on Cohere vectors, 768d for various vector similarities x quantization bits:

`dot_product`

main

recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.641        0.675   0.666        0.987  200000   100      50       32        250     1 bits     5101     10.74      18627.18           20.85             1          624.45       606.918       20.981       HNSW
 0.878        1.170   1.161        0.992  200000   100      50       32        250     4 bits     4662     12.20      16398.82           23.07             1          678.09       662.231       76.294       HNSW
 0.915        1.517   1.505        0.992  200000   100      50       32        250     7 bits     4605     12.58      15896.99           31.01             1          751.27       735.474      149.536       HNSW
 0.915        1.523   1.515        0.995  200000   100      50       32        250     8 bits     4570     11.64      17180.65           18.18             1          751.17       735.474      149.536       HNSW

This PR

recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.641        0.678   0.668        0.985  200000   100      50       32        250     1 bits     5064     10.83      18467.22           21.32             1          624.43       606.918       20.981       HNSW
 0.876        1.140   1.131        0.992  200000   100      50       32        250     4 bits     4660     11.67      17132.09           23.35             1          678.10       662.231       76.294       HNSW
 0.914        1.514   1.504        0.993  200000   100      50       32        250     7 bits     4575     12.34      16208.77           18.19             1          751.21       735.474      149.536       HNSW
 0.916        1.576   1.566        0.994  200000   100      50       32        250     8 bits     4580     12.32      16229.81           18.29             1          751.23       735.474      149.536       HNSW

`mip`

main

recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.640        0.754   0.745        0.988  200000   100      50       32        250     1 bits     5076     11.12      17987.23           20.55             1          624.43       606.918       20.981       HNSW
 0.877        1.174   1.165        0.992  200000   100      50       32        250     4 bits     4645     11.95      16737.80           24.10             1          678.11       662.231       76.294       HNSW
 0.912        1.566   1.557        0.994  200000   100      50       32        250     7 bits     4573     11.96      16723.81           18.21             1          751.21       735.474      149.536       HNSW
 0.916        1.509   1.500        0.994  200000   100      50       32        250     8 bits     4578     12.18      16416.32           18.29             1          751.19       735.474      149.536       HNSW

This PR

recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.641        0.709   0.700        0.987  200000   100      50       32        250     1 bits     5080     11.68      17120.36           20.85             1          624.44       606.918       20.981       HNSW
 0.877        1.191   1.182        0.992  200000   100      50       32        250     4 bits     4654     11.61      17232.47           22.12             1          678.11       662.231       76.294       HNSW
 0.914        1.527   1.518        0.994  200000   100      50       32        250     7 bits     4585     12.27      16306.56           18.17             1          751.22       735.474      149.536       HNSW
 0.915        1.541   1.532        0.994  200000   100      50       32        250     8 bits     4582     11.70      17091.10           18.30             1          751.22       735.474      149.536       HNSW

`euclidean`

main

recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.691        0.625   0.615        0.984  200000   100      50       32        250     1 bits     4723      9.64      20751.19           17.36             1          615.12       606.918       20.981       HNSW
 0.906        0.993   0.979        0.986  200000   100      50       32        250     4 bits     4413     10.70      18698.58           21.10             1          669.73       662.231       76.294       HNSW
 0.948        1.361   1.353        0.994  200000   100      50       32        250     7 bits     4389     12.22      16369.29           25.86             1          743.24       735.474      149.536       HNSW
 0.950        1.335   1.326        0.993  200000   100      50       32        250     8 bits     4387     11.31      17691.29           25.83             1          743.26       735.474      149.536       HNSW

This PR

recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.692        0.628   0.618        0.984  200000   100      50       32        250     1 bits     4741     10.19      19627.09           17.71             1          615.11       606.918       20.981       HNSW
 0.905        0.987   0.977        0.990  200000   100      50       32        250     4 bits     4416     10.46      19118.63           20.92             1          669.72       662.231       76.294       HNSW
 0.949        1.396   1.387        0.994  200000   100      50       32        250     7 bits     4395     12.06      16579.62           25.65             1          743.22       735.474      149.536       HNSW
 0.951        1.332   1.316        0.988  200000   100      50       32        250     8 bits     4382     12.03      16629.25           25.74             1          743.24       735.474      149.536       HNSW

`cosine`

main

recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.656        0.641   0.632        0.986  200000   100      50       32        250     1 bits     4996     10.17      19663.75           17.60             1          616.88       606.918       20.981       HNSW
 0.889        1.078   1.069        0.992  200000   100      50       32        250     4 bits     4603     10.64      18793.46           23.01             1          671.76       662.231       76.294       HNSW
 0.944        1.438   1.429        0.994  200000   100      50       32        250     7 bits     4537     12.14      16477.18           27.64             1          745.81       735.474      149.536       HNSW
 0.948        1.459   1.450        0.994  200000   100      50       32        250     8 bits     4524     11.83      16913.32           27.53             1          745.93       735.474      149.536       HNSW

This PR

recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.657        0.644   0.635        0.986  200000   100      50       32        250     1 bits     5006     10.30      19411.82           17.96             1          616.85       606.918       20.981       HNSW
 0.888        0.994   0.985        0.991  200000   100      50       32        250     4 bits     4565     11.39      17556.18           22.29             1          671.74       662.231       76.294       HNSW
 0.945        1.422   1.413        0.994  200000   100      50       32        250     7 bits     4522     11.72      17064.85           27.42             1          745.81       735.474      149.536       HNSW
 0.948        1.442   1.433        0.994  200000   100      50       32        250     8 bits     4514     11.94      16746.21           26.94             1          745.94       735.474      149.536       HNSW

Except for one outlier (dot_product, main, 7 bits, force_merge(s): 31.01 vs. 18.19 with this PR), most values are within ~5% of each other, and none of the rest differ by more than ~9%.

uschindler · 2025-10-12T00:59:26Z

...ore/src/java25/org/apache/lucene/internal/vectorization/VectorizedVectorizationProvider.java


 /** A vectorization provider that leverages the Panama Vector API. */
-final class PanamaVectorizationProvider extends VectorizationProvider {
+final class VectorizedVectorizationProvider extends VectorizationProvider {


Can we just keep the class name the same? The Panama name is correct here. Please don't change it.

Same for the other classes. Everything that uses incubating APIs should keep "Panama" in its name (it is called "Panama Vectorization" in the JEP).

kaivalnp · 2025-10-12

Sure, I don't have strong opinions on this. Changed back to the original Panama* names :)

uschindler · 2025-10-12T01:00:59Z

I am not able to do any close review here, so please don't merge this now.

mikemccand · 2025-10-13T14:39:09Z

> This PR

Maybe we could enhance Lucene's jmh infra so it can compare baseline/candidate runs somehow? It's hard for human eyes + brain to scan all those numbers and confirm there's no real difference... maybe open spinoff issue?

Edit: heh, and some comment about luceneutil's knnPerfTest.py? That tool has really flowered over time (and is now run in nightly benchmarks too) for testing all the many KNN options Lucene offers...

kaivalnp · 2025-10-13T17:25:34Z

> It's hard for human eyes + brain to scan all those numbers and confirm there's no real difference

Haha true :)
I fed the raw data to an LLM and asked it to report percentage differences:

Benchmark                                   Baseline (ops/μs)  Candidate (ops/μs)  % Difference
floatCosineVector                                       8.913               8.646        -3.00%
floatDotProductVector                                  12.561              12.249        -2.48%
binaryHalfByteDotProductBothPackedVector               11.652              11.498        -1.32%
l2NormalizeVector                                      12.349              12.207        -1.15%
binaryDotProductUint8Scalar                             2.307               2.288        -0.82%
binarySquareScalar                                      2.431               2.413        -0.74%
binaryHalfByteSquareBothPackedVector                   12.614              12.562        -0.41%
floatSquareScalar                                       3.181               3.171        -0.31%
floatCosineScalar                                       1.419               1.415        -0.28%
binaryHalfByteSquareVector                             18.526              18.487        -0.21%
binaryHalfByteDotProductScalar                          2.378               2.376        -0.08%
binarySquareUint8Scalar                                 2.422               2.420        -0.08%
binaryHalfByteSquareSinglePackedScalar                  2.022               2.021        -0.05%
floatDotProductScalar                                   3.734               3.733        -0.03%
binaryDotProductVector                                  8.040               8.039        -0.01%
binarySquareVector                                      6.710               6.709        -0.01%
binaryCosineScalar                                      0.841               0.841         0.00%
binaryDotProductUint8Vector                             8.040               8.040         0.00%
binaryHalfByteDotProductSinglePackedVector              2.627               2.627         0.00%
binarySquareUint8Vector                                 6.709               6.709         0.00%
l2Normalize                                             3.016               3.017         0.03%
binaryHalfByteDotProductSinglePackedScalar              2.446               2.449         0.12%
binaryHalfByteDotProductBothPackedScalar                2.368               2.376         0.34%
binaryHalfByteSquareScalar                              2.465               2.474         0.36%
binaryDotProductScalar                                  2.289               2.298         0.39%
binaryHalfByteDotProductVector                         20.677              20.785         0.52%
binaryHalfByteSquareSinglePackedVector                  2.590               2.609         0.73%
floatSquareVector                                      12.370              12.483         0.91%
binaryCosineVector                                      4.778               4.860         1.72%
binaryHalfByteSquareBothPackedScalar                    1.642               1.696         3.29%
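
The percentage differences above can also be recomputed deterministically from JMH's plain-text output. The following is a minimal sketch, not existing Lucene tooling: the class and method names are hypothetical, and the parser assumes the exact column layout of the tables posted earlier in this thread (one parameter column, then Mode, Cnt, Score, ±, Error, Units). The difference is computed as (candidate - baseline) / baseline.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene or JMH.
class JmhTextCompare {
  private static final String PLUS_MINUS = "\u00b1"; // the "±" separator JMH prints

  /** Parses a JMH text table into (short benchmark name -> score), keeping row order. */
  static Map<String, Double> parseScores(String table) {
    Map<String, Double> scores = new LinkedHashMap<>();
    for (String line : table.split("\\R")) {
      String[] cols = line.trim().split("\\s+");
      // Assumed columns: name, size, mode, count, score, "±", error, units.
      if (cols.length < 8 || !cols[5].equals(PLUS_MINUS)) {
        continue; // header row, blank line, or anything that is not a result row
      }
      String name = cols[0].substring(cols[0].lastIndexOf('.') + 1);
      scores.put(name, Double.parseDouble(cols[4]));
    }
    return scores;
  }

  /** Relative change of candidate versus baseline, in percent. */
  static double percentDiff(double baseline, double candidate) {
    return (candidate - baseline) / baseline * 100.0;
  }

  public static void main(String[] args) {
    // Two rows from each table above, just to exercise the parser.
    String baselineText =
        "Benchmark (size) Mode Cnt Score Error Units\n"
            + "VectorUtilBenchmark.floatCosineVector 1024 thrpt 75 8.913 " + PLUS_MINUS + " 0.013 ops/us\n"
            + "VectorUtilBenchmark.binaryCosineVector 1024 thrpt 15 4.778 " + PLUS_MINUS + " 0.012 ops/us\n";
    String candidateText =
        "Benchmark (size) Mode Cnt Score Error Units\n"
            + "VectorUtilBenchmark.floatCosineVector 1024 thrpt 75 8.646 " + PLUS_MINUS + " 0.080 ops/us\n"
            + "VectorUtilBenchmark.binaryCosineVector 1024 thrpt 15 4.860 " + PLUS_MINUS + " 0.007 ops/us\n";

    Map<String, Double> baseline = parseScores(baselineText);
    Map<String, Double> candidate = parseScores(candidateText);
    for (Map.Entry<String, Double> e : baseline.entrySet()) {
      Double c = candidate.get(e.getKey());
      if (c == null) {
        continue; // benchmark missing from the candidate run
      }
      System.out.printf(
          Locale.ROOT, "%-24s %8.3f %8.3f %+7.2f%%%n",
          e.getKey(), e.getValue(), c, percentDiff(e.getValue(), c));
    }
  }
}
```

For real runs the two strings would be read from saved JMH output files; the JSON output mentioned below is easier to parse robustly but needs a JSON library.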

Side note: I found this cool visualizer (https://jmh.morethan.io), which takes the JSON output of JMH (add `-rf json` to the command line) and can compare multiple runs too!

For example, I re-ran a subset of functions and recorded their output in https://gist.github.com/kaivalnp/0424bd84326aebdecd10f8144fb46c73
Now we can visualize the results at: https://jmh.morethan.io/?gist=0424bd84326aebdecd10f8144fb46c73

Also found this GH action that automatically runs and compares JMH output: https://github.com/benchmark-action/github-action-benchmark, might be interesting to add to Lucene!

Separate Panama and Vector classes

2374f75

github-actions bot added module:core/index module:core/codecs module:sandbox module:misc labels Oct 2, 2025

iter

fa04e7b

- Refactor internal classes
- Add missing javadocs
- Remove unused functions

Merge branch 'main' into vector-refactor

7a66af1

# Conflicts: # lucene/core/src/java25/org/apache/lucene/internal/vectorization/VectorizedVectorUtilSupport.java

kaivalnp marked this pull request as ready for review October 8, 2025 13:28

uschindler requested changes Oct 12, 2025

uschindler marked this pull request as draft October 12, 2025 01:03

Kaival Parikh added 2 commits October 12, 2025 01:31

Change names back to Panama*

63fe8dd

Merge branch 'main' into vector-refactor

38339cf
