Merge pull request #2351 from Kobzol/unify-docs

Kobzol · web-flow · commit ba8021a5b9bb · 2025-12-11T15:12:28.000Z
Update job queue documentation
diff --git a/collector/README.md b/collector/README.md
@@ -1,7 +1,7 @@
 # Rust Compiler Performance Benchmarking and Profiling
 
 Hardware and software details of the machine that executes the CI details can be found
-[here](../docs/perf-runner.md). A glossary of relevant terms can be found
+[here](../docs/deployment.md). A glossary of relevant terms can be found
 [here](../docs/glossary.md).
 
 ## The benchmarks
@@ -34,6 +34,8 @@ This crate is only compatible with OpenSSL 1.0.1, 1.0.2, and 1.1.0, or LibreSSL
 aborting due to this version mismatch.
 ```
 
+For benchmarking using `perf`, you will also need to set `/proc/sys/kernel/perf_event_paranoid` to `-1`.
+
 ## Benchmarking
 
 This section is about benchmarking rustc, i.e. measuring its performance on the
diff --git a/database/schema.md b/database/schema.md
@@ -238,17 +238,9 @@ Columns:
 * **job_id** (`INTEGER`): A nullable job_id which, if it exists it will inform
   us as to which job this error is part of.
 
-## New benchmarking design
-We are currently implementing a new design for dispatching benchmarks to collector(s) and storing
-them in the database. It will support new use-cases, like backfilling of new benchmarks into a parent
-commit and primarily benchmarking with multiple collectors (and multiple hardware architectures) in
-parallel.
-
-The tables below are a part of the new scheme.
-
 ### benchmark_request
 
-Represents a single request for performing a benchmark collection. Each request can be one of three types:
+Represents a single request for performing a benchmark run. Each request can be one of three types:
 
 * Master: benchmark a merged master commit
 * Release: benchmark a published stable or beta compiler toolchain
@@ -297,15 +289,13 @@ Columns:
 
 ### job_queue
 
-This table stores ephemeral benchmark jobs, which specifically tell the
-collector which benchmarks it should execute. The jobs will be kept in the
-table for ~30 days after being completed, so that we can quickly figure out
-what master parent jobs we need to backfill when handling try builds.
+This table stores benchmark jobs, which specifically tell the
+collector which benchmarks it should execute.
 
 Columns:
 
-* **id** (`bigint` / `serial`): Primary*key identifier for the job row;
-  auto*increments with each new job.
+* **id** (`bigint` / `serial`): Primary key identifier for the job row;
+  autoincrements with each new job.
 * **request_tag** (`text`): References the parent benchmark request that
   spawned this job.
 * **target** (`text NOT NULL`): Hardware/ISA the benchmarks must run on
@@ -325,3 +315,5 @@ Columns:
   `success`, or `failure`.
 * **retry** (`int NOT NULL`): Number of times the job has been re*queued after
   a failure; 0 on the first attempt.
+* **kind** (`text NOT NULL`): Which benchmark suite the job should execute (`compiletime`, `runtime` or `rustc`).
+* **is_optional** (`boolean NOT NULL`): Whether the job is optional. A request does not wait for optional jobs to finish before it becomes completed.
diff --git a/docs/README.md b/docs/README.md
@@ -0,0 +1,7 @@
+# rustc-perf documentation
+
+- [Glossary of useful terms](./glossary.md)
+- [Database schema](../database/schema.md)
+- [How rustc-perf is deployed](./deployment.md)
+- [How the distributed job queue works](./job-queue.md)
+- [How we compare benchmark results](./comparison-analysis.md)
diff --git a/docs/deployment.md b/docs/deployment.md
@@ -1,7 +1,23 @@
-# Benchmarking machine
-The machine that actually executes the benchmarks is the `AX-42` server running on [Hetzner](https://www.hetzner.com/dedicated-rootserver/). It has the following configuration.
+# Deployment
+
+The machines that actually execute the benchmarks ("collectors") are dedicated machines running on [Hetzner](https://www.hetzner.com/dedicated-rootserver/). The [web server](http://perf.rust-lang.org/) runs on [ECS](https://github.com/rust-lang/infra-team/blob/HEAD/service-catalog/rustc-perf/README.md).
+
+## Debugging
+This section documents what to do when benchmarking doesn't work or something is stuck. The status of the collectors can be found on the [status page](https://perf.rust-lang.org/status.html). In particular, it shows the last heartbeat of each collector. If that heartbeat is more than an hour old, something has probably gone wrong with the collector.
+
+You can SSH into the machines directly and examine what is going on there. The currently active machines have the following domain names:
+
+- `rustc-perf-one.infra.rust-lang.org`
+- `rustc-perf-two.infra.rust-lang.org`
+
+The benchmarking process runs as a systemd service called `collector`. You can start/stop/inspect it using the usual commands:
+- Start/restart/stop: `sudo systemctl start/restart/stop collector.service`
+- See logs: `sudo journalctl --utc -n 10000 -u collector -f`
+
+The user account under which the benchmarks execute is called `collector`. You can switch to it using `su` and examine the `/home/collector/rustc-perf` checkout, from which the benchmarks are executed.
 
 ## Hardware
+- The collectors run on `AX-42` Hetzner server instances.
 - 8-core AMD Ryzen 7 PRO 8700GE with HyperThreading (16 hardware threads total)
     <details>
     <summary>Output of `lscpu`</summary>
diff --git a/docs/glossary.md b/docs/glossary.md
@@ -25,6 +25,9 @@ The following is a glossary of domain specific terminology. Although benchmarks
   - `incr-patched`: incremental compilation is used, with a full incremental cache and some code changes made.
 * **backend**: the codegen backend used for compiling Rust code.
   - `llvm`: the default codegen backend
+  - `cranelift`: experimental backend designed for quicker non-optimized builds
+* **target**: compilation target for which the benchmark is compiled.
+  - `x86_64-unknown-linux-gnu`: the default x64 Linux target
 * **category**: a high-level group of benchmarks. Currently, there are three categories, primary (mostly real-world crates), secondary (mostly stress tests), and stable (old real-world crates, only used for the dashboard).
 * **artifact type**: describes what kind of artifact does the benchmark build. Either `library` or `binary`.
 
@@ -41,15 +44,15 @@ The following is a glossary of domain specific terminology. Although benchmarks
 ## Testing
 
 * **test case**: a combination of parameters that describe the measurement of a single (compile-time or runtime) benchmark - a single `test`
-    - For compile-time benchmarks, it is a combination of a benchmark, a profile, and a scenario.
-    - For runtime benchmarks, it is currently only the benchmark name.
+    - For compile-time benchmarks, it is a combination of a benchmark, a profile, a scenario, a codegen backend and a target.
+    - For runtime benchmarks, it is a combination of a benchmark and a target.
 * **test**: the act of running an artifact under a test case. Each test is composed of many iterations.
 * **test iteration**: a single iteration that makes up a test. Note: we currently normally run 3 test iterations for each test. 
-* **test result**: the result of the collection of all statistics from running a test. Currently, the minimum value of a statistic from all the test iterations is used for analysis calculations and the website.
-* **statistic**: a single measured value of a metric in a test result
+* **test result**: the set of all gathered statistics from running a test. Currently, the minimum value of a statistic from all the test iterations is used for analysis calculations and the website.
+* **statistic**: a single measured value of a metric in a test iteration
 * **statistic description**: the combination of a metric and a test case which describes a statistic.
 * **statistic series**: statistics for the same statistic description over time.
-* **run**: a set of tests for all currently available test cases measured on a given artifact. 
+* **run**: a set of tests for all currently available test cases measured on a given artifact.
 
 ## Analysis
 
@@ -60,7 +63,17 @@ The following is a glossary of domain specific terminology. Although benchmarks
 * **relevant test result comparison**: a test result comparison can be significant but still not be relevant (i.e., worth paying attention to). Relevance is a factor of the test result comparison's significance and magnitude. Comparisons are considered relevant if they are significant and have at least a small magnitude .
 * **test result comparison magnitude**: how "large" the delta is between the two test result's under comparison. This is determined by the average of two factors: the absolute size of the change (i.e., a change of 5% is larger than a change of 1%) and the amount above the significance threshold (i.e., a change that is 5x the significance threshold is larger than a change 1.5x the significance threshold).
 
-## Other 
+## Job queue
+
+These terms are related to the [job queue system](./job-queue.md) that distributes benchmarking jobs across available collectors.
+
+- **benchmark request**: a request for benchmarking a *run* on a given *artifact*. It is either created from a try build on a PR or created automatically from merged master/release *artifacts*.
+- **collector**: a machine that performs benchmarks.
+- **benchmark set**: a subset of a compile/runtime/bootstrap benchmark suite that is executed by a collector in a single job. 
+- **job**: a high-level "work item" that defines a set of *test cases* that should be benchmarked on a specific collector.
+- **job queue**: a queue of *jobs*.
+
+## Other
 
 * **bootstrap**: the process of building the compiler from a previous version of the compiler
 * **compiler query**: a query used inside the [compiler query system](https://rustc-dev-guide.rust-lang.org/overview.html#queries).
diff --git a/docs/job-queue.md b/docs/job-queue.md
diff --git a/docs/multiple-collectors.md b/docs/multiple-collectors.md