Merged
4 changes: 3 additions & 1 deletion collector/README.md
@@ -1,7 +1,7 @@
# Rust Compiler Performance Benchmarking and Profiling

Hardware and software details of the machine that executes the CI can be found
[here](../docs/perf-runner.md). A glossary of relevant terms can be found
[here](../docs/deployment.md). A glossary of relevant terms can be found
[here](../docs/glossary.md).

## The benchmarks
@@ -34,6 +34,8 @@ This crate is only compatible with OpenSSL 1.0.1, 1.0.2, and 1.1.0, or LibreSSL
aborting due to this version mismatch.
```

For benchmarking using `perf`, you will also need to set `/proc/sys/kernel/perf_event_paranoid` to `-1`.
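For example, the setting can be inspected and lowered like this (a sketch assuming a typical Linux setup; lowering the value requires root, so those commands are shown as comments):

```shell
# Inspect the current value; -1 grants unprivileged users full access
# to perf events, which perf-based profiling needs.
cat /proc/sys/kernel/perf_event_paranoid

# Lower it until the next reboot (requires root); either form works:
#   sudo sysctl kernel.perf_event_paranoid=-1
#   echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid
```

To persist the setting across reboots, a `kernel.perf_event_paranoid = -1` line can be placed in a file under `/etc/sysctl.d/`.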

## Benchmarking

This section is about benchmarking rustc, i.e. measuring its performance on the
22 changes: 7 additions & 15 deletions database/schema.md
@@ -238,17 +238,9 @@ Columns:
* **job_id** (`INTEGER`): A nullable job id which, if present, identifies the
job that this error is part of.

## New benchmarking design
We are currently implementing a new design for dispatching benchmarks to collector(s) and storing
them in the database. It will support new use-cases, like backfilling of new benchmarks into a parent
commit and primarily benchmarking with multiple collectors (and multiple hardware architectures) in
parallel.

The tables below are a part of the new scheme.

### benchmark_request

Represents a single request for performing a benchmark collection. Each request can be one of three types:
Represents a single request for performing a benchmark run. Each request can be one of three types:

* Master: benchmark a merged master commit
* Release: benchmark a published stable or beta compiler toolchain
@@ -297,15 +289,13 @@ Columns:

### job_queue

This table stores ephemeral benchmark jobs, which specifically tell the
collector which benchmarks it should execute. The jobs will be kept in the
table for ~30 days after being completed, so that we can quickly figure out
what master parent jobs we need to backfill when handling try builds.
This table stores benchmark jobs, which specifically tell the
collector which benchmarks it should execute.

Columns:

* **id** (`bigint` / `serial`): Primary*key identifier for the job row;
auto*increments with each new job.
* **id** (`bigint` / `serial`): Primary key identifier for the job row;
autoincrements with each new job.
* **request_tag** (`text`): References the parent benchmark request that
spawned this job.
* **target** (`text NOT NULL`): Hardware/ISA the benchmarks must run on
@@ -325,3 +315,5 @@ Columns:
`success`, or `failure`.
* **retry** (`int NOT NULL`): Number of times the job has been re-queued after
a failure; 0 on the first attempt.
* **kind** (`text NOT NULL`): Which benchmark suite should be executed in the job (`compiletime`, `runtime` or `rustc`).
* **is_optional** (`boolean NOT NULL`): Whether the parent request has to wait for this job to finish before it can be marked as completed.
7 changes: 7 additions & 0 deletions docs/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
# rustc-perf documentation

- [Glossary of useful terms](./glossary.md)
- [Database schema](../database/schema.md)
- [How rustc-perf is deployed](./deployment.md)
- [How the distributed job queue works](./job-queue.md)
- [How we compare benchmark results](./comparison-analysis.md)
20 changes: 18 additions & 2 deletions docs/perf-runner.md → docs/deployment.md
@@ -1,7 +1,23 @@
# Benchmarking machine
The machine that actually executes the benchmarks is the `AX-42` server running on [Hetzner](https://www.hetzner.com/dedicated-rootserver/). It has the following configuration.
# Deployment

The machines that actually execute the benchmarks ("collectors") are dedicated machines running on [Hetzner](https://www.hetzner.com/dedicated-rootserver/). The [web server](http://perf.rust-lang.org/) runs on [ECS](https://github.com/rust-lang/infra-team/blob/HEAD/service-catalog/rustc-perf/README.md).

## Debugging
This section documents what to do in case benchmarking doesn't work or something is stuck. The status of the collectors can be found on the [status page](https://perf.rust-lang.org/status.html). In particular, it shows the last heartbeat of each collector. If that date is very old (>1 hour), then something bad has probably happened with the collector.

You can SSH into the machines directly and examine what is going on there. The currently active machines have the following domain names:

- `rustc-perf-one.infra.rust-lang.org`
- `rustc-perf-two.infra.rust-lang.org`

The benchmarking process runs as a systemd service called `collector`. You can start/stop/inspect it using the usual commands:
- Start/restart/stop: `sudo systemctl start/restart/stop collector.service`
- See logs: `sudo journalctl --utc -n 10000 -u collector -f`

The benchmarks run under a user account called `collector`. You can switch to it using `su` and examine the `/home/collector/rustc-perf` checkout, from which the benchmarks are executed.

## Hardware
- The collectors run on `AX-42` Hetzner server instances.
- 8-core AMD Ryzen 7 PRO 8700GE with HyperThreading (16 hardware threads total)
<details>
<summary>Output of `lscpu`</summary>
25 changes: 19 additions & 6 deletions docs/glossary.md
@@ -25,6 +25,9 @@ The following is a glossary of domain specific terminology. Although benchmarks
- `incr-patched`: incremental compilation is used, with a full incremental cache and some code changes made.
* **backend**: the codegen backend used for compiling Rust code.
- `llvm`: the default codegen backend
- `cranelift`: experimental backend designed for quicker non-optimized builds
* **target**: compilation target for which the benchmark is compiled.
- `x86_64-unknown-linux-gnu`: the default x64 Linux target
* **category**: a high-level group of benchmarks. Currently, there are three categories, primary (mostly real-world crates), secondary (mostly stress tests), and stable (old real-world crates, only used for the dashboard).
* **artifact type**: describes what kind of artifact does the benchmark build. Either `library` or `binary`.

@@ -41,15 +44,15 @@ The following is a glossary of domain specific terminology. Although benchmarks
## Testing

* **test case**: a combination of parameters that describe the measurement of a single (compile-time or runtime) benchmark - a single `test`
- For compile-time benchmarks, it is a combination of a benchmark, a profile, and a scenario.
- For runtime benchmarks, it is currently only the benchmark name.
- For compile-time benchmarks, it is a combination of a benchmark, a profile, a scenario, a codegen backend and a target.
- For runtime benchmarks, it is a combination of a benchmark and a target.
* **test**: the act of running an artifact under a test case. Each test is composed of many iterations.
* **test iteration**: a single iteration that makes up a test. Note: we currently run 3 test iterations for each test.
* **test result**: the result of the collection of all statistics from running a test. Currently, the minimum value of a statistic from all the test iterations is used for analysis calculations and the website.
* **statistic**: a single measured value of a metric in a test result
* **test result**: the set of all gathered statistics from running a test. Currently, the minimum value of a statistic from all the test iterations is used for analysis calculations and the website.
* **statistic**: a single measured value of a metric in a test iteration
* **statistic description**: the combination of a metric and a test case which describes a statistic.
* **statistic series**: statistics for the same statistic description over time.
* **run**: a set of tests for all currently available test cases measured on a given artifact.
* **run**: a set of tests for all currently available test cases measured on a given artifact.

## Analysis

Expand All @@ -60,7 +63,17 @@ The following is a glossary of domain specific terminology. Although benchmarks
* **relevant test result comparison**: a test result comparison can be significant but still not be relevant (i.e., worth paying attention to). Relevance is a factor of the test result comparison's significance and magnitude. Comparisons are considered relevant if they are significant and have at least a small magnitude.
* **test result comparison magnitude**: how "large" the delta is between the two test results under comparison. This is determined by the average of two factors: the absolute size of the change (i.e., a change of 5% is larger than a change of 1%) and the amount above the significance threshold (i.e., a change that is 5x the significance threshold is larger than a change 1.5x the significance threshold).

## Other
## Job queue

These terms are related to the [job queue system](./job-queue.md) that distributes benchmarking jobs across available collectors.

- **benchmark request**: a request for benchmarking a *run* of a given *artifact*. It is either created from a try build on a PR, or created automatically from merged master/release *artifacts*.
- **collector**: a machine that performs benchmarks.
- **benchmark set**: a subset of a compile/runtime/bootstrap benchmark suite that is executed by a collector in a single job.
- **job**: a high-level "work item" that defines a set of *test cases* that should be benchmarked on a specific collector.
- **job queue**: a queue of *jobs*.

## Other

* **bootstrap**: the process of building the compiler from a previous version of the compiler
* **compiler query**: a query used inside the [compiler query system](https://rustc-dev-guide.rust-lang.org/overview.html#queries).