Commit 0fdb60f

Author: Hendrik van Antwerpen (committed)
Update benchmark text
1 parent 215b41b commit 0fdb60f

File tree: 5 files changed, +119 -77 lines changed

crates/bpe/README.md

Lines changed: 64 additions & 22 deletions
@@ -183,40 +183,82 @@ On average it is about ~4x faster, since the short-cuts usually pay off.

 ## Benchmarks

-We compared our implementations with the tiktoken implementation on a MacBook Pro on a random input sequence:
-
-| Algorithm           | Runtime | correct BPE output |
-| ------------------- | ------- | ------------------ |
-| Greedy              |  100 µs | ✗ |
-| Minimal             |  300 µs | ✗ |
-| Backtracking        |  400 µs | ✓ |
-| Dynamic Programming | 1300 µs | ✓ |
-| TikToken            | 1500 µs | ✓ |
-| Heap                | 1900 µs | ✓ |
-
-As can be seen, our Backtracking implementation beats the TikToken Rust implementation by ~4x.
-And even the fully dynamic programming solution is faster, with a more consistent runtime.
-The tuned heap implementation is still quite competitive with TikToken (especially for smaller inputs).
-If the requirement of correct BPE output can be relaxed, then the Greedy approach or the minimal encoding approach are the clear winners.
+We ran several benchmarks to compare performance between different encoders and with the tiktoken library:

-### Counting results
+- The first measures encoding runtime for our different encoders and the tiktoken Rust implementation.
+  This shows a ~3.5x performance increase for our fastest correct encoder compared to the tiktoken library.

-Results for counting o200k tokens for random 10000 byte slices. The setup time of the interval encoder is comparable to backtracking. After setup, counting slices of the original data is approximately constant time.
+- The second measures incremental encoding runtime, where the text is built up byte-by-byte.
+  This mode is not available in tiktoken, which only supports counting/encoding a complete text.

-![counting runtime comparison](./benches/result/counting-o200k.svg)
+- The third measures interval counting runtime, where the token count for slices of an original text is determined.
+  After the initial tokenization of the text, token counting for slices is typically constant time.
+  This mode is not available in tiktoken, which only supports counting/encoding a complete text.
+
+All benchmarks were run on a MacBook Pro M1.
+
+### Encoding
+
+Encoding is computing the tokens for a given text.
+This benchmark uses several encoders:

-### Encoding results
+- The backtracking encoder uses a backtracking algorithm based on a string matching automaton.
+- The heap encoder uses a priority heap to implement the traditional BPE algorithm.
+- The table encoder uses a dynamic programming algorithm.

-Results for encoding o200k tokens for random 1000 bytes. The backtracking encoder consistently outperforms tiktoken by a constant factor.
+Two additional encoders are included that are faster but do not always give exact results:
+
+- The greedy encoder uses a left-to-right greedy algorithm.
+- The minimal encoder computes an encoding with the minimal number of tokens.
+
+The benchmark measured the runtime of encoding slices of lengths 10, 100, 1000, and 10000 from a random 20000 token original text using the o200k token set.
+(All encodings were computed from scratch for each slice.)
+
+The graph below shows encoding runtime vs slice length.
+All encoders show similar runtime increases with increasing slice length.
+The backtracking encoder, the fastest encoder that still returns correct results, shows a performance gain of approximately 3.5x compared to tiktoken.
+The fully dynamic programming solution and the heap implementation are still quite competitive with tiktoken (especially for smaller inputs).
+If the requirement of correct BPE output can be relaxed, then the greedy approach or the minimal encoding approach are the clear winners.

![encoding runtime comparison](./benches/result/encoding-o200k.svg)
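The backtracking encoder described above can be exercised directly. The following is a minimal timing sketch, assuming the `bpe` crate exposes `BytePairEncoding` under `byte_pair_encoding` and an `encode_via_backtracking` method; adjust the names to the encoder variant you want to measure:

```rust
use std::time::Instant;

use bpe::byte_pair_encoding::BytePairEncoding;

// Times one encoder on slices of increasing length, mirroring the slice
// lengths used in the benchmark. The method name below is an assumption;
// swap in the encoder variant under test.
fn time_encoder(bpe: &BytePairEncoding, input: &[u8]) {
    for len in [10, 100, 1000, 10000] {
        let slice = &input[..len.min(input.len())];
        let start = Instant::now();
        let tokens = bpe.encode_via_backtracking(slice);
        println!("{len:>5} bytes -> {:>5} tokens in {:?}", tokens.len(), start.elapsed());
    }
}
```

The criterion benchmarks in this commit measure the same operation, but with proper warm-up and statistical analysis.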
-### Incremental encoding results
+### Incremental encoding
+
+Incremental encoding tokenizes a text to which bytes are appended.
+This benchmark uses two encoders:

-Results for incrementally encoding o200k tokens by appending 10000 random bytes. The appending encoder is slower by a constant factor but overall has a similar performance curve as the backtracking encoder encoding all data at once.
+- The backtracking encoder, which retokenizes the text from scratch every time it changes.
+- The appending encoder, which supports incremental encoding when bytes are added.
+
+The benchmark measured the runtime of encoding slices of lengths 10, 100, 1000, and 10000 from a random 20000 token original text using the o200k token set.
+The backtracking encoder encoded the final text in one go.
+The appending encoder received the text bytes one by one.
+
+The graph below shows encoding runtime vs slice length.
+Runtime of both encoders grows similarly with slice length.
+The incremental encoder shows a constant factor overhead.
+Note that this is still a huge win for incremental use cases, which would otherwise require retokenization after each append, resulting in a quadratic slowdown.

![appending runtime comparison](./benches/result/appending-o200k.svg)
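The appending encoder's intended usage can be sketched as follows. `AppendableEncoder::new` and `extend` appear verbatim in the benchmark source in this commit; the module paths and the `token_count` accessor are assumptions:

```rust
use bpe::appendable_encoder::AppendableEncoder;
use bpe::byte_pair_encoding::BytePairEncoding;

// Feeds the text byte by byte; the encoder maintains its token count
// incrementally instead of retokenizing the whole prefix on every append.
// The `token_count` accessor is an assumption.
fn count_while_appending(bpe: &BytePairEncoding, input: &[u8]) -> usize {
    let mut enc = AppendableEncoder::new(bpe);
    enc.extend(input.iter().copied());
    enc.token_count()
}
```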
+### Interval counting
+
+Interval counting is counting the tokens for a slice of an original text.
+This benchmark uses two encoders:
+
+- The backtracking encoder encodes the slice from scratch.
+  This is similar to what one has to do with other libraries, like `tiktoken`.
+- The interval encoder encodes the original text once and reuses that encoding to count tokens for intervals of the original text.
+  The initial encoding time for the interval encoder is comparable to that of the backtracking encoder.
+
+The benchmark measured the runtime of counting o200k tokens on slices of lengths 10, 100, 1000, and 10000 from a random 20000 token original text.
+
+The graph below shows counting runtime vs slice length.
+The runtime of the backtracking encoder grows with the length of the slice.
+The interval encoder typically counts any interval in constant time.
+
+![counting runtime comparison](./benches/result/counting-o200k.svg)
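A corresponding usage sketch for the interval encoder. `IntervalEncoding::new(bpe, &input)` appears verbatim in the benchmark source in this commit; the module paths and the `count(range)` method name are assumptions:

```rust
use bpe::byte_pair_encoding::BytePairEncoding;
use bpe::interval_encoding::IntervalEncoding;

// Pays the tokenization cost once, then answers token-count queries for
// arbitrary byte ranges of the original text in (typically) constant time.
// The `count(range)` method name is an assumption.
fn count_intervals(bpe: &BytePairEncoding, input: &[u8]) {
    let encoding = IntervalEncoding::new(bpe, input);
    for (start, end) in [(0, 10), (100, 1100), (5000, 15000)] {
        let tokens = encoding.count(start..end);
        println!("bytes {start}..{end}: {tokens} tokens");
    }
}
```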
### Running the benchmarks

Run the benchmark as follows (requires [cargo-criterion](https://crates.io/crates/cargo-criterion) to be installed):

crates/bpe/benches/performance.rs

Lines changed: 5 additions & 5 deletions
@@ -28,8 +28,8 @@ static TOKENIZERS: LazyLock<[(&'static str, &'static BytePairEncoding, CoreBPE);
 
 fn counting_benchmark(c: &mut Criterion) {
     for (name, bpe, _) in TOKENIZERS.iter() {
-        let input = create_test_bytes(&bpe, 20000);
-        let fast = IntervalEncoding::new(&bpe, &input);
+        let input = create_test_bytes(bpe, 20000);
+        let fast = IntervalEncoding::new(bpe, &input);
 
         let mut group = c.benchmark_group(format!("counting-{name}"));
         group.plot_config(PlotConfiguration::default().summary_scale(AxisScale::Logarithmic));
@@ -60,7 +60,7 @@ fn counting_benchmark(c: &mut Criterion) {
 
 fn encoding_benchmark(c: &mut Criterion) {
     for (name, bpe, tiktoken) in TOKENIZERS.iter() {
-        let text = create_test_string(&bpe, 20000);
+        let text = create_test_string(bpe, 20000);
         let input = text.as_bytes();
 
         let mut group = c.benchmark_group(format!("encoding-{name}"));
@@ -126,7 +126,7 @@ fn encoding_benchmark(c: &mut Criterion) {
 
 fn appending_benchmark(c: &mut Criterion) {
     for (name, bpe, _) in TOKENIZERS.iter() {
-        let input = create_test_bytes(&bpe, 20000);
+        let input = create_test_bytes(bpe, 20000);
 
         let mut group = c.benchmark_group(format!("appending-{name}"));
         group.plot_config(PlotConfiguration::default().summary_scale(AxisScale::Logarithmic));
@@ -140,7 +140,7 @@ fn appending_benchmark(c: &mut Criterion) {
                         AppendableEncoder::new(bpe),
                     )
                 },
-                |(start, mut enc)| enc.extend(input[start..start + bytes].into_iter().copied()),
+                |(start, mut enc)| enc.extend(input[start..start + bytes].iter().copied()),
                 criterion::BatchSize::SmallInput,
             )
         });
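Both edits in this file are mechanical cleanups: `bpe` is already a reference, so `&bpe` was a needless borrow, and on a slice `into_iter()` yields references just like `iter()`, so `iter().copied()` states the intent directly. A self-contained illustration of the second point:

```rust
// On a slice, `into_iter()` and `iter()` both yield `&u8`, so `.copied()`
// is needed either way; `iter().copied()` is the idiomatic spelling.
fn checksum(input: &[u8], start: usize, len: usize) -> u32 {
    input[start..start + len].iter().copied().map(u32::from).sum()
}

fn main() {
    let data = vec![1u8; 64];
    assert_eq!(checksum(&data, 0, 8), 8);
    println!("ok");
}
```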
