-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Add benchmark utility to profile peak memory usage #16814
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
62737d3
to
aec3191
Compare
52ffcdc
to
5d8b177
Compare
benchmarks/README.md
Outdated
# Profiling Memory Stats for each benchmark query | ||
The `mem_profile` program wraps benchmark execution to measure memory usage statistics, such as peak RSS. It runs each benchmark query in a separate subprocess, capturing the child process’s stdout to print structured output. | ||
|
||
Subcommands supported by mem_profile are the subset of those in `dfbench`. | ||
Currently supported benchmarks include: Clickbench, H2o, Imdb, SortTpch, Tpch | ||
|
||
Before running benchmarks, `mem_profile` automatically compiles the benchmark binary (`dfbench`) using `cargo build` with the same cargo profile (e.g., --release) as mem_profile itself. By prebuilding the binary and running each query in a separate process, we can ensure accurate memory statistics. | ||
|
||
Currently, `mem_profile` only supports `mimalloc` as the memory allocator, since it relies on `mimalloc`'s API to collect memory statistics. | ||
|
||
Because it runs the compiled binary directly from the target directory, make sure your working directory is the top-level datafusion/ directory, where the target/ is also located. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's more description about this utility and supported metrics.
@2010YOUY01 This is ready for review :) I would love to hear your feedback. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, this is great! Using inner metrics provided by the mi-malloc also let us able to check additional info inside mimalloc, if someone is interested in the future.
I have several suggestions
- The
--profile
option now seems only apply to themem_profile
binary, and thedfbench
binary will use the default--profile release
option, and takes a long time to compile. Is it possible to also let that configurable?
cargo run --profile release-nonlto --bin mem_profile -- tpch --query 1 --path benchmarks/data/tpch_sf1 --partitions 4 --format parquet
...
Finished `release-nonlto` profile [optimized] target(s) in 2m 17s
Running `target/release-nonlto/mem_profile tpch --query 1 --path benchmarks/data/tpch_sf1 --partitions 4 --format parquet`
...
Compiling datafusion-benchmarks v49.0.0 (/Users/yongting/Code/datafusion/benchmarks)
Finished `release` profile [optimized] target(s) in 7m 42s
-
I suggest to add one e2e test for this feature, you can use ci feature flag, then it will can be run in CI when the benchmark data is ready.
Here is a example:datafusion/benchmarks/src/imdb/run.rs
Line 480 in cbda394
#[cfg(feature = "ci")] -
Open issues to integrate that into
./bench.sh
and./bench.sh compare branch-a branch-b
utility.
Ideally, it can be run like
MEM_PROFILE=true ./bench.sh run tpch
./bench.sh compare branch-a branch-b
And the result will compare both time and memory usage side by side
|
||
Currently, `mem_profile` only supports `mimalloc` as the memory allocator, since it relies on `mimalloc`'s API to collect memory statistics. | ||
|
||
Because it runs the compiled binary directly from the target directory, make sure your working directory is the top-level datafusion/ directory, where the target/ is also located. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to add it (should be runned under datafusion/
) to the error message
Now such panic will be triggered, if it's running under a different directory.
Benchmark binary built successfully.
thread 'main' panicked at benchmarks/src/bin/mem_profile.rs:152:10:
Failed to start benchmark: Os { code: 2, kind: NotFound, message: "No such file or directory" }
31747e7
to
796b79b
Compare
let command = format!("{target_dir}/{profile}/dfbench"); | ||
// Check whether benchmark binary exists | ||
if !Path::new(&command).exists() { | ||
panic!( | ||
"Benchmark binary not found: `{command}`\nRun this command from the top-level `datafusion/` directory so `target/{profile}/dfbench` can be found.", | ||
); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added check with err msg (reply to #16814 (comment))
796b79b
to
67cadfa
Compare
#[test] | ||
// This test checks whether `mem_profile` runs successfully and produces expected output | ||
// using TPC-H query 6 (which runs quickly). | ||
fn mem_profile_e2e_tpch_q6() -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e2e test added here
let target_dir = | ||
env::var("CARGO_TARGET_DIR").unwrap_or_else(|_| "target".to_string()); | ||
let command = format!("{target_dir}/{profile}/dfbench"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also updated to respect CARGO_TARGET_DIR
to run ci test (since cargo test's working directory may not be top level directory)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks again!
Just one nit: When new benchmark is added, we have to update mem_profile.rs
to add memory profiling support, it would be better to indicate that in the error message.
cargo run --profile release-nonlto --bin mem_profile -- --bench-profile release-nonlto nlj
error: Found argument 'nlj' which wasn't expected, or isn't valid in this context
Suggested: error: 'nlj' is not supported for memory profiling. If this is a new benchmark, please update the memory profiling benchmarks to include it.
@2010YOUY01 Thank you ! I mentioned your suggestion in follow up issue, and I'll address it when I add support for nlj benchmark :) |
Personally, I prefer to use |
Another advantage of this external profiling approach is that we can compare different memory allocators side by side (e.g., However, this PR's approach allows us to inspect more internal If there's another external profiling implementation, I'm happy to review it as well. |
@UBarney Thanks for the feedback! I actually considered writing a Python script (+ time -v or reading /proc/stat~~) instead, for the same reason you mentioned. But I eventually chose this because I wanted to make sure that the utility user compiles the benchmark binary ahead of time and for the reason @2010YOUY01 mentioned. (although max rss will be our primary focus) Anyway, I agree the double compilation step is not so desirable. If you think it is too painful, I’m definitely open to alternative ideas — for example, rewriting |
From what I can see, the new utility in this PR is designed to get RSS statistics specifically from binaries using It also seems we can already compare the timing of different allocators with the bench.sh script by setting this environment variable datafusion/benchmarks/bench.sh Line 43 in 483444a
(e.g. |
Thanks again @ding-young and @UBarney |
* add benchmark utility to profile memory usage * get memory stats from mimalloc, not procfs * support more benchmarks * update benchmarks/README and refactor * fix sort-tpch output format & taplo format * add e2e test & comments * update description to major page fault --------- Co-authored-by: Yongting You <2010youy01@gmail.com>
Which issue does this PR close?
Rationale for this change
What changes are included in this PR?
In each benchmark (clickbench, tpch, sort-tpch, imdb, h2o), we now call
print_memory_stats();
to print memory usage statistics viamimalloc
. (This only works when compiled with --features mimalloc_extended.)A new utility
mem_profile
is added. It buildsdfbench
with the mimalloc_extended feature enabled, and then runs each benchmark query in a separate subprocess to collect memory stats. The utility captures the subprocess’s stdout and summarizes the results for all queries.Are these changes tested?
Yes. Via
(Updated)
and
// Set profile to release-nonlto
Are there any user-facing changes?