Commit 82dbd7e
Improve upsert throughput by 3x (#334)
## Problem
Python SDK upsert throughput is low compared to other SDKs. For example,
I can achieve 880 vector upserts/sec with the Python SDK, compared to
3500 upserts/sec with the Java SDK.
Profiling the Python SDK performing these upserts shows a large
percentage of time in gRPC / protobuf serialisation / deserialisation.
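A profile like the one described above can be collected with the standard library's `cProfile`. The sketch below is only illustrative: `profile_call` and `fake_upsert_workload` are hypothetical names, and the workload is a stand-in for a real upsert loop, not the SDK code that was actually profiled.

```python
import cProfile
import io
import pstats


def profile_call(fn, *args, top=10, **kwargs):
    """Run fn under cProfile; return (result, text report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args, **kwargs)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # Cumulative time makes expensive call trees (e.g. serialisation) stand out.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


def fake_upsert_workload(n_batches=50, batch_size=500):
    # Hypothetical stand-in: a real run would call the SDK's upsert here.
    total = 0
    for _ in range(n_batches):
        batch = [float(i) for i in range(batch_size)]
        total += len(batch)
    return total


if __name__ == "__main__":
    count, report = profile_call(fake_upsert_workload)
    print(f"processed {count} items")
    print(report)
```

In a real run, the rows with the highest cumulative time would show whether gRPC or protobuf frames dominate.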
## Solution
Upgrade protobuf from v3 to v4. This adds a number of performance
improvements in parsing / serialization as documented at
https://protobuf.dev/news/2022-05-06/#python-updates
This increases upsert() throughput by roughly 3x (measured by upserting 1M
768-dimension vectors to a pod-based index in batches of 500):
* Before: 880 vectors/sec
* After: 2580 vectors/sec
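For anyone wanting to reproduce a measurement of this kind, a minimal timing harness might look like the following. This is a sketch under assumptions, not the script used for the numbers above: `measure_throughput`, `batched`, and `speedup` are hypothetical helpers, and `upsert_fn` stands for whatever callable sends one batch to the index.

```python
import time


def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def measure_throughput(upsert_fn, vectors, batch_size=500):
    """Send vectors in batches via upsert_fn and return vectors/sec."""
    start = time.perf_counter()
    sent = 0
    for batch in batched(vectors, batch_size):
        upsert_fn(batch)  # assumption: one call upserts one batch
        sent += len(batch)
    elapsed = time.perf_counter() - start
    return sent / elapsed if elapsed > 0 else float("inf")


def speedup(before, after):
    """Ratio of two throughput figures."""
    return after / before


if __name__ == "__main__":
    # Figures quoted in this commit message: 880 -> 2580 vectors/sec.
    print(f"speedup: {speedup(880, 2580):.2f}x")
```

With the figures quoted above, the ratio works out to about 2.93x, which is where the rounded "3x" comes from.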
As per the documentation, this results in an incompatible change with
the _generated_ Python code, so this depends on a related change to
pinecone-protos to change the version of protobuf used to generate the
Python code there.
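Because the generated code and the installed runtime have to agree, it can be useful to check which protobuf runtime an environment actually loads. The helper below is a hedged sketch: `protobuf_runtime_info` is a hypothetical name, and it relies on `google.protobuf.__version__` and the internal `api_implementation.Type()` call, which is not a stable public API.

```python
def protobuf_runtime_info():
    """Return version and backend of the installed protobuf, or None if absent."""
    try:
        import google.protobuf
        from google.protobuf.internal import api_implementation
    except ImportError:
        return None  # protobuf is not installed in this environment
    version = google.protobuf.__version__
    return {
        "version": version,
        "major": int(version.split(".")[0]),
        # Backend name as reported by the runtime (internal API, may change).
        "implementation": api_implementation.Type(),
    }


if __name__ == "__main__":
    print(protobuf_runtime_info())
```

The major version reported here should correspond to the protobuf version the generated Python code was built for.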
## Type of Change
- [x] None of the above: Performance improvement.
## Test Plan
Use existing regression tests.
1 parent e123da1 commit 82dbd7e
File tree
7 files changed, +866 -2076 lines changed
- .github/workflows
- pinecone
- core/grpc/protos
- grpc