-
Notifications
You must be signed in to change notification settings - Fork 113
perf: reduce time_elapsed_opening overhead #5879
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: reduce time_elapsed_opening overhead #5879
Conversation
|
I think that for many modern setups, 1Mb and 64KB are pretty close in terms of latency, maybe the takeaway here is to expose that as a config for the DataFusion integration? |
Codecov Report❌ Patch coverage is
☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Seems like running benchmarks from forks is still painful, I'm working on fixing it so we can get some numbers here. |
|
FYI: 'Rust (semver checks)' was failing due to cargo-semver-checks enabling all features, which pulls in the est-harness feature. That feature exports stest_reuse #[template] #[export]-generated hashed macro_rules! names (e.g. search_sorted_conformance_), and semver-checks flags them as removed/renamed vs the baseline crate. This PR updates the semver job to use eature-group: default-features so the semver surface matches the supported public API and avoids those false positives. Because this touches .github/workflows, GitHub marks the latest CI run as action_required until a maintainer approves running workflows for this fork PR. |
|
we're working on the semver check elsewhere, it's not a required check anyway, more of a general indication |
|
Thanks! I'll turn the initial read size into a config as suggested. Really appreciate you handling the benchmark infra for forks—looking forward to your feedback tomorrow! |
.github/workflows/ci.yml
Outdated
| with: | ||
| # Avoid enabling test-only feature flags (e.g. `test-harness`) that export unstable | ||
| # procedural-macro-generated items and create false-positive semver diffs. | ||
| feature-group: default-features |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets remove this from this PR, we might refactor that whole macro or use an upstream fix
|
You can see the benchmark results in the run summary, seems to make more of a difference for DuckDB but I'll take that, I expect it to also make a difference for DataFusion or at least for that metric, I'll try it out soon. |
|
A small local test I ran showed very nice improvements to |
…egression test Signed-off-by: godnight10061 <godnight10061@users.noreply.github.com>
Signed-off-by: godnight10061 <godnight10061@users.noreply.github.com>
Signed-off-by: godnight10061 <godnight10061@users.noreply.github.com>
Signed-off-by: godnight10061 <godnight10061@users.noreply.github.com>
Signed-off-by: godnight10061 <godnight10061@users.noreply.github.com>
Signed-off-by: godnight10061 <godnight10061@users.noreply.github.com>
1dcf3ae to
4856cf1
Compare
AdamGS
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
|
There's a noticeable improvement in several of our benchmarks because of this which is nice! Though I did notice regressions in Clickbench Q6 and Q23 that someone should probably investigate? |
This PR tries to reduce
time_elapsed_openingfor Vortex scans (observed vs Parquet).Changes:
vortex-file: shrink default footer initial read from 1MiB toMAX_POSTSCRIPT_SIZE + EOF_SIZE(~64KiB) and add a regression test.vortex-scan: makeScanBuilder::into_stream()lazy (deferprepare()/ split registration until first poll) and add a unit test to ensure stream construction has no split-planning side effects.vortex-datafusion: expose the footer initial read size as a format option (footer_initial_read_size_bytes) and plumb it intoVortexOpenOptions::with_initial_read_size.Notes:
into_stream()construction.read_footerwill issue additional reads as before.Tests:
cargo +nightly fmt --all --checkcargo clippy -p vortex-datafusion --all-targets --all-features -- -D warningscargo test --locked -p vortex-file -p vortex-scan -p vortex-iocargo test --locked -p vortex-datafusionRelated: #4677