Add PGO documentation section to crate configuration #18959
Add section explaining Profile Guided Optimization can provide up to 25% performance improvements. Includes build process instructions and tips for effective PGO usage. References issue apache#9507.
Force-pushed from 3c3329b to ee7d0bb.
Thank you, this is great. I'm not familiar with this topic, so I'm adding a question that I think would be helpful to address: I remember some previous discussions where we decided not to use PGO for release binaries, but I can't recall the reason. I think @alamb knows it.
alamb left a comment:
Thank you for these instructions @jatinkumarsingh 🙏
I tried them out locally for datafusion-cli:
RUSTFLAGS="-C profile-generate=/tmp/pgo-data" cargo build --release --bin datafusion-cli
Then run ClickBench:
cd datafusion/benchmarks/data
ln -s hits_partitioned hits
../../target/release/datafusion-cli -f ../queries/clickbench/queries/*.sql
Then recompile:
RUSTFLAGS="-C profile-use=/tmp/pgo-data" cargo build --release --bin datafusion-cli
And it worked well.
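For reference, the three-stage process mentioned in the PR description could be sketched as a small script. This is an illustration under stated assumptions, not the text added by the PR: the `PGO_DIR` variable, the `merged.profdata` file name, and the helper names `pgo_rustflags`, `pgo_build`, and `main` are all hypothetical. The rustc book's PGO chapter describes merging the raw `.profraw` files with `llvm-profdata` before the final build, so the sketch includes that step and assumes `llvm-profdata` is on `PATH` (for example via the `llvm-tools-preview` rustup component).

```shell
#!/bin/sh
# Sketch of a three-stage PGO build for datafusion-cli.
# Assumptions (not from the PR): PGO_DIR, the merged.profdata file name,
# the helper names below, and llvm-profdata being available on PATH.
set -eu

PGO_DIR="${PGO_DIR:-/tmp/pgo-data}"
PROFDATA="${PGO_DIR}/merged.profdata"

# Print the RUSTFLAGS value for a given stage ("generate" or "use").
pgo_rustflags() {
  case "$1" in
    generate) echo "-C profile-generate=${PGO_DIR}" ;;
    use)      echo "-C profile-use=${PROFDATA}" ;;
    *)        echo "unknown stage: $1" >&2; return 1 ;;
  esac
}

# Build datafusion-cli in release mode with the flags for that stage.
pgo_build() {
  flags="$(pgo_rustflags "$1")"
  RUSTFLAGS="$flags" cargo build --release --bin datafusion-cli
}

# Run all three stages; the arguments are the workload command to profile.
main() {
  pgo_build generate                                 # 1. instrumented build
  "$@"                                               #    run the workload
  llvm-profdata merge -o "${PROFDATA}" "${PGO_DIR}"  # 2. merge raw profiles
  pgo_build use                                      # 3. optimized rebuild
}

# Nothing is built unless the first argument is "run".
if [ "${1:-}" = "run" ]; then
  shift
  main "$@"
fi
```

Saved under a hypothetical name such as `pgo-build.sh`, it would be invoked as `sh pgo-build.sh run <workload command>`, where the workload command is whatever representative queries you want to tune for (the ClickBench invocation above is one option). Keeping the flag construction in `pgo_rustflags` makes it easy to check the flags without triggering a build.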
Thanks again @jatinkumarsingh and @2010YOUY01
It's an interesting note! @alamb could you please give a bit more detail about this one? I am curious why PGO is not used for DataFusion's release binaries too. Thank you!
Well, one reason is that it is not clear what workload we would use. We could tune it to ClickBench, for example, but I am not sure how well that matches people's actual workloads. I think in general PGO is used when you have a very specific workload you want to tune for.
Yeah, that's what I was expecting. Thanks!
Add section explaining Profile Guided Optimization can provide up to 25% performance improvements. Includes three-stage build process instructions and tips for effective PGO usage. References issue #9507.
Which issue does this PR close?
Closes #9561
Rationale for this change
Adds documentation for Profile Guided Optimization (PGO) as requested. PGO can provide up to 25% performance improvements for DataFusion workloads, and users need clear guidance on how to use it.
What changes are included in this PR?
docs/source/user-guide/crate-configuration.md
Are these changes tested?
Yes. Documentation changes are validated by the CI workflow which builds the docs and checks for errors. The markdown syntax is valid and follows existing patterns.
Are there any user-facing changes?
Yes. This adds documentation that will be published on the DataFusion website under "Crate Configuration" > "Optimizing Builds". Users will find guidance on using PGO to improve performance.