Commit 3c3329b

Add PGO documentation section to crate configuration
Add section explaining Profile Guided Optimization can provide up to 25% performance improvements. Includes three-stage build process instructions and tips for effective PGO usage. References issue #9507.
1 parent 769f367 commit 3c3329b

docs/source/user-guide/crate-configuration.md

Lines changed: 32 additions & 0 deletions
@@ -92,6 +92,38 @@ lto = true
codegen-units = 1
```

### Profile Guided Optimization (PGO)

Profile Guided Optimization (PGO) can improve performance by up to 25% for DataFusion workloads. PGO works by compiling your code with instrumentation, running the instrumented binary on representative workloads to collect profile data, and then recompiling so that the compiler can optimize based on that data.

To use PGO with DataFusion, perform a three-stage build:

1. **Build with instrumentation**: Compile your code with profile generation enabled.

```shell
RUSTFLAGS="-C profile-generate=/tmp/pgo-data" cargo build --release
```

2. **Run representative workloads**: Execute the instrumented DataFusion application on workloads that reflect your typical usage. For best results, use benchmarks such as TPC-H or ClickBench, or run your actual production workloads.

```shell
./target/release/your-datafusion-app --benchmark
```
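
3. **Build with profile data**: Merge the raw `.profraw` files written during stage 2 into a single `.profdata` file with `llvm-profdata`, then recompile using the merged profile.

```shell
llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data
RUSTFLAGS="-C profile-use=/tmp/pgo-data/merged.profdata" cargo build --release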
```
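
The rustc book describes merging the raw `.profraw` files with `llvm-profdata` before handing the result to `-C profile-use`. That tool is usually not on `PATH`; with a rustup-managed toolchain it can be installed via `rustup component add llvm-tools-preview`. The following is a minimal sketch of locating and calling it, assuming the standard rustup directory layout. The function names `profdata_path` and `merge_profiles` and the output file name `merged.profdata` are hypothetical and chosen only for illustration. The block only defines the helpers; the example call is left as a comment.

```shell
# Hypothetical helper: build the path to the llvm-profdata binary that rustup
# installs with `rustup component add llvm-tools-preview`.
# $1 = toolchain sysroot, $2 = host target triple.
profdata_path() {
  printf '%s/lib/rustlib/%s/bin/llvm-profdata\n' "$1" "$2"
}

# Hypothetical helper: merge every raw profile in directory $1 into file $2.
# Assumes `rustc` is on PATH and the llvm-tools-preview component is installed.
merge_profiles() {
  sysroot="$(rustc --print sysroot)"
  host="$(rustc -vV | sed -n 's/^host: //p')"
  "$(profdata_path "$sysroot" "$host")" merge -o "$2" "$1"
}

# Example call (not run here), using the profile directory from stage 1.
# The output file name is an assumption:
#   merge_profiles /tmp/pgo-data /tmp/pgo-data/merged.profdata
```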

**Tips for effective PGO:**

- Use representative workloads: The profile data should reflect your actual usage patterns. Consider TPC-H, ClickBench, or your production query patterns.
- Run multiple iterations: Execute your workload several times during the profiling stage so that the profile covers more code paths.
- Combine with other optimizations: PGO works well in combination with LTO and CPU-specific optimizations.

For more information on PGO with Rust, see the [Rust compiler guide on Profile Guided Optimization](https://rustc-dev-guide.rust-lang.org/building/optimized-build.html#profile-guided-optimization) and the [rustc book chapter on Profile-guided Optimization](https://doc.rust-lang.org/rustc/profile-guided-optimization.html). See also [issue #9507](https://github.com/apache/datafusion/issues/9507) for discussion of PGO results.
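
The tips above on running multiple iterations and on combining PGO with CPU-specific optimizations can be sketched as two small shell helpers. This is an illustration under stated assumptions, not a recommended DataFusion build script: the function names `run_workload_n` and `pgo_use_flags` are hypothetical, the application name is the placeholder used in stage 2, and the merged profile path is an assumed file name. `-C target-cpu=native` is a standard rustc flag; it is probably safest to pass the same CPU setting to the instrumented build as well, so that both builds compile the same code. The block only defines the helpers; the example calls are left as comments.

```shell
# Hypothetical helper: run a workload command N times so that each run
# contributes profile data. $1 = iteration count, remaining args = command.
run_workload_n() {
  n="$1"
  shift
  i=0
  while [ "$i" -lt "$n" ]; do
    "$@" || return 1   # stop early if the workload fails
    i=$((i + 1))
  done
}

# Hypothetical helper: print RUSTFLAGS for the final build, combining
# CPU-specific code generation with a merged profile. $1 = .profdata path.
pgo_use_flags() {
  printf '%s\n' "-C target-cpu=native -C profile-use=$1"
}

# Example calls (not run here); the .profdata file name is an assumption:
#   run_workload_n 5 ./target/release/your-datafusion-app --benchmark
#   RUSTFLAGS="$(pgo_use_flags /tmp/pgo-data/merged.profdata)" cargo build --release
```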

### Alternate Allocator: `snmalloc`

You can also use [snmalloc-rs](https://crates.io/crates/snmalloc-rs) crate as
