Perf: Optimize vectorized append function #16876
base: main
Hi @alamb, could you trigger the benchmarks here? I can hardly believe the results I got locally. Thanks! Commands used locally: `cargo bench --bench aggregate_vectorized "/vectorized_append"`, then `critcmp --filter "/vectorized_append" optimize_vectorized_append main`.
Each benchmark group is `ByteViewGroupValueBuilder_vectorized_append/<case>/vectorized_append`. The ratio is relative to the faster of the two runs (1.00), and no throughput was reported (`?/sec`).

| Case | main ratio | main time | optimize_vectorized_append ratio | optimize_vectorized_append time |
|---|---|---|---|---|
| inlined_null_0.0_size_1000 | 6.83 | 5.8±0.09µs | 1.00 | 855.4±11.16ns |
| inlined_null_0.0_size_10000 | 5.20 | 54.6±1.44µs | 1.00 | 10.5±0.49µs |
| inlined_null_0.0_size_100000 | 4.13 | 555.2±16.42µs | 1.00 | 134.3±2.45µs |
| inlined_null_0.1_size_1000 | 1.87 | 13.0±0.17µs | 1.00 | 6.9±0.19µs |
| inlined_null_0.1_size_10000 | 1.79 | 135.8±25.78µs | 1.00 | 75.9±0.67µs |
| inlined_null_0.1_size_100000 | 1.61 | 1317.2±57.35µs | 1.00 | 815.9±4.39µs |
| inlined_null_0.5_size_1000 | 1.37 | 9.7±0.29µs | 1.00 | 7.1±0.31µs |
| inlined_null_0.5_size_10000 | 1.25 | 153.5±8.49µs | 1.00 | 122.8±1.29µs |
| inlined_null_0.5_size_100000 | 1.14 | 1626.3±163.43µs | 1.00 | 1430.9±52.95µs |
| mixed_null_0.0_size_1000 | 2.65 | 14.3±0.22µs | 1.00 | 5.4±0.17µs |
| mixed_null_0.0_size_10000 | 2.11 | 154.3±1.54µs | 1.00 | 73.0±0.83µs |
| mixed_null_0.0_size_100000 | 4.11 | 3.4±0.08ms | 1.00 | 825.8±19.57µs |
| mixed_null_0.1_size_1000 | 2.30 | 18.7±0.25µs | 1.00 | 8.1±0.16µs |
| mixed_null_0.1_size_10000 | 2.30 | 222.3±2.05µs | 1.00 | 96.4±1.83µs |
| mixed_null_0.1_size_100000 | 2.11 | 2.5±0.08ms | 1.00 | 1182.3±11.18µs |
| mixed_null_0.5_size_1000 | 2.09 | 12.9±0.21µs | 1.00 | 6.1±0.29µs |
| mixed_null_0.5_size_10000 | 1.93 | 202.2±6.66µs | 1.00 | 104.6±2.83µs |
| mixed_null_0.5_size_100000 | 1.54 | 2.2±0.05ms | 1.00 | 1436.8±8.23µs |
| random_null_0.0_size_1000 | 1.09 | 18.8±0.43µs | 1.00 | 17.3±0.89µs |
| random_null_0.0_size_10000 | 1.14 | 250.9±2.93µs | 1.00 | 220.5±5.58µs |
| random_null_0.0_size_100000 | 1.00 | 11.6±1.11ms | 1.00 | 11.6±1.02ms |
| random_null_0.1_size_1000 | 1.22 | 21.1±0.36µs | 1.00 | 17.3±0.27µs |
| random_null_0.1_size_10000 | 1.26 | 289.4±3.34µs | 1.00 | 230.1±5.77µs |
| random_null_0.1_size_100000 | 1.09 | 11.2±0.29ms | 1.00 | 10.2±0.90ms |
| random_null_0.5_size_1000 | 1.26 | 14.6±0.24µs | 1.00 | 11.5±0.12µs |
| random_null_0.5_size_10000 | 1.44 | 257.5±4.17µs | 1.00 | 178.8±11.30µs |
| random_null_0.5_size_100000 | 1.16 | 7.1±0.25ms | 1.00 | 6.2±0.18ms |
(I also queued up the microbenchmark.)
🤖: Benchmark completed
🤖: Benchmark completed
🤖: Benchmark completed
Thank you @alamb, the results are almost the same as on my local machine. However, ClickBench and TPC-H show no change.
Yeah, I was hoping to see some end-to-end performance improvements too. I'll try to look carefully at this PR over the next day or two.
Which issue does this PR close?
ByteViewGroupValueBuilder on batches with inlined views #16330
Rationale for this change
Optimize vectorized append function
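The linked issue is about batches whose views are all "inlined". For reference, here is a minimal standalone sketch of the Arrow view layout (StringView/BinaryView) that shows why such rows can be handled without reading any data buffer. Assumptions: views are modeled as plain `u128` values without the `arrow` crate, and `DecodedView` and `decode_view` are hypothetical names, not DataFusion or arrow-rs APIs.

```rust
/// Values of at most 12 bytes are stored entirely inside the 16-byte view.
const MAX_INLINE_LEN: u32 = 12;

/// Fields of a 16-byte view, following the Arrow view layout
/// (little-endian: 4-byte length, then either inline data or
/// a 4-byte prefix, 4-byte buffer index, and 4-byte offset).
#[derive(Debug, PartialEq)]
enum DecodedView {
    /// The whole value lives in bytes 4..4+len of the view itself.
    Inlined(Vec<u8>),
    /// The value lives in an external data buffer.
    OutOfLine { len: u32, prefix: [u8; 4], buffer_index: u32, offset: u32 },
}

/// Decode a raw view. Inlined views never touch a data buffer.
fn decode_view(view: u128) -> DecodedView {
    let bytes = view.to_le_bytes();
    let len = view as u32; // low 32 bits hold the length
    if len <= MAX_INLINE_LEN {
        DecodedView::Inlined(bytes[4..4 + len as usize].to_vec())
    } else {
        DecodedView::OutOfLine {
            len,
            prefix: [bytes[4], bytes[5], bytes[6], bytes[7]],
            buffer_index: (view >> 64) as u32,
            offset: (view >> 96) as u32,
        }
    }
}

fn main() {
    // Hand-build an inlined view for "hello": length 5, then the bytes, zero-padded.
    let mut bytes = [0u8; 16];
    bytes[..4].copy_from_slice(&5u32.to_le_bytes());
    bytes[4..9].copy_from_slice(b"hello");
    let view = u128::from_le_bytes(bytes);
    println!("{:?}", decode_view(view));
}
```

Because an inlined view is self-contained, a batch made only of inlined views can in principle be appended by copying the 16-byte views alone.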
What changes are included in this PR?
Optimize vectorized append function
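The description does not show the code itself. As an illustration only, the following hypothetical sketch shows one way a vectorized append over view-typed rows could take a fast path. When the selected rows have no nulls and every view is inlined, it copies the 16-byte views directly. Otherwise it falls back to copying each out-of-line value into the builder's own buffer. `ViewBuilder`, `make_view`, and the method signature are assumptions. They are not DataFusion's `ByteViewGroupValueBuilder` API, and this is not necessarily what the PR implements.

```rust
/// Values of at most 12 bytes are stored entirely inside the 16-byte view.
const MAX_INLINE_LEN: u32 = 12;

/// Build a view: inlined if short, else pointing at `buffers[buffer_index][offset..]`.
fn make_view(data: &[u8], buffer_index: u32, offset: u32) -> u128 {
    let len = data.len() as u32;
    let mut bytes = [0u8; 16];
    bytes[..4].copy_from_slice(&len.to_le_bytes());
    if len <= MAX_INLINE_LEN {
        bytes[4..4 + data.len()].copy_from_slice(data);
    } else {
        bytes[4..8].copy_from_slice(&data[..4]); // 4-byte prefix
        bytes[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        bytes[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    u128::from_le_bytes(bytes)
}

/// Hypothetical, simplified builder (not DataFusion's ByteViewGroupValueBuilder).
#[derive(Default)]
struct ViewBuilder {
    views: Vec<u128>,
    data: Vec<u8>,    // owned copies of out-of-line values (buffer index 0)
    valid: Vec<bool>, // validity of each appended row
}

impl ViewBuilder {
    /// Append the input rows listed in `rows`.
    fn vectorized_append(&mut self, views: &[u128], buffers: &[Vec<u8>], valid: Option<&[bool]>, rows: &[usize]) {
        self.views.reserve(rows.len());
        self.valid.reserve(rows.len());
        // Fast path: no null mask and every selected view inlined, so copy views verbatim.
        let all_inlined = rows.iter().all(|&r| (views[r] as u32) <= MAX_INLINE_LEN);
        if valid.is_none() && all_inlined {
            self.views.extend(rows.iter().map(|&r| views[r]));
            self.valid.extend(std::iter::repeat(true).take(rows.len()));
            return;
        }
        // General path: handle nulls and copy out-of-line bytes row by row.
        for &r in rows {
            if !valid.map_or(true, |v| v[r]) {
                self.views.push(0); // null slot holds a zero-length view
                self.valid.push(false);
                continue;
            }
            let view = views[r];
            let len = view as u32;
            if len <= MAX_INLINE_LEN {
                self.views.push(view);
            } else {
                let buffer_index = (view >> 64) as u32 as usize;
                let offset = (view >> 96) as u32 as usize;
                let new_offset = self.data.len() as u128;
                self.data.extend_from_slice(&buffers[buffer_index][offset..offset + len as usize]);
                // Keep length + prefix (low 64 bits); buffer index becomes 0, offset is rewritten.
                self.views.push((view & (u64::MAX as u128)) | (new_offset << 96));
            }
            self.valid.push(true);
        }
    }

    /// Read back row `i` (None if null).
    fn value(&self, i: usize) -> Option<Vec<u8>> {
        if !self.valid[i] {
            return None;
        }
        let view = self.views[i];
        let len = view as u32 as usize;
        if len <= MAX_INLINE_LEN as usize {
            Some(view.to_le_bytes()[4..4 + len].to_vec())
        } else {
            let offset = (view >> 96) as u32 as usize;
            Some(self.data[offset..offset + len].to_vec())
        }
    }
}

fn main() {
    let long = b"a string longer than twelve bytes";
    let buffers = vec![long.to_vec()];
    let views = vec![make_view(b"short", 0, 0), make_view(long, 0, 0), make_view(b"tiny", 0, 0)];
    let mut builder = ViewBuilder::default();
    // Rows 0 and 2: no nulls and all inlined, so the fast path runs and `data` stays empty.
    builder.vectorized_append(&views, &buffers, None, &[0, 2]);
    // Rows 1 and 0 with row 0 marked null: the general path copies row 1's bytes.
    let valid = [false, true, true];
    builder.vectorized_append(&views, &buffers, Some(&valid[..]), &[1, 0]);
    for i in 0..builder.views.len() {
        println!("{i}: {:?}", builder.value(i).map(|v| String::from_utf8(v).unwrap()));
    }
}
```

The fast path skips both the per-row branch and any buffer access. This is one plausible reason why cases with no nulls and all-inlined data could benefit most from such a change.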
Are these changes tested?
Are there any user-facing changes?