diff --git a/collector/README.md b/collector/README.md index 0ee794eb5..7cc766d90 100644 --- a/collector/README.md +++ b/collector/README.md @@ -1,7 +1,7 @@ # Rust Compiler Performance Benchmarking and Profiling Hardware and software details of the machine that executes the CI details can be found -[here](../docs/perf-runner.md). A glossary of relevant terms can be found +[here](../docs/deployment.md). A glossary of relevant terms can be found [here](../docs/glossary.md). ## The benchmarks @@ -34,6 +34,8 @@ This crate is only compatible with OpenSSL 1.0.1, 1.0.2, and 1.1.0, or LibreSSL aborting due to this version mismatch. ``` +For benchmarking using `perf`, you will also need to set `/proc/sys/kernel/perf_event_paranoid` to `-1`. + ## Benchmarking This section is about benchmarking rustc, i.e. measuring its performance on the diff --git a/database/schema.md b/database/schema.md index 3aaf5cfaf..4414176b4 100644 --- a/database/schema.md +++ b/database/schema.md @@ -238,17 +238,9 @@ Columns: * **job_id** (`INTEGER`): A nullable job_id which, if it exists it will inform us as to which job this error is part of. -## New benchmarking design -We are currently implementing a new design for dispatching benchmarks to collector(s) and storing -them in the database. It will support new use-cases, like backfilling of new benchmarks into a parent -commit and primarily benchmarking with multiple collectors (and multiple hardware architectures) in -parallel. - -The tables below are a part of the new scheme. - ### benchmark_request -Represents a single request for performing a benchmark collection. Each request can be one of three types: +Represents a single request for performing a benchmark run. Each request can be one of three types: * Master: benchmark a merged master commit * Release: benchmark a published stable or beta compiler toolchain @@ -297,15 +289,13 @@ Columns: ### job_queue -This table stores ephemeral benchmark jobs, which specifically tell the -collector which benchmarks it should execute. The jobs will be kept in the -table for ~30 days after being completed, so that we can quickly figure out -what master parent jobs we need to backfill when handling try builds. +This table stores benchmark jobs, which specifically tell the +collector which benchmarks it should execute. Columns: -* **id** (`bigint` / `serial`): Primary*key identifier for the job row; - auto*increments with each new job. +* **id** (`bigint` / `serial`): Primary key identifier for the job row; + autoincrements with each new job. * **request_tag** (`text`): References the parent benchmark request that spawned this job. * **target** (`text NOT NULL`): Hardware/ISA the benchmarks must run on @@ -325,3 +315,5 @@ Columns: `success`, or `failure`. * **retry** (`int NOT NULL`): Number of times the job has been re-queued after a failure; 0 on the first attempt. +* **kind** (`text NOT NULL`): Which benchmark suite should be executed in the job (`compiletime`, `runtime` or `rustc`). +* **is_optional** (`boolean NOT NULL`): Whether this job is optional, i.e. whether its parent request can be marked as completed without waiting for this job to finish.
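+
+For orientation, the columns described above roughly correspond to a row type like the following Rust sketch. This is purely illustrative: the struct and field types are made up for this example, the list is abridged to the columns covered in this section, and the actual schema used by rustc-perf is authoritative.
+
+```rust
+/// Illustrative sketch of a single `job_queue` row (not the actual rustc-perf type).
+struct JobQueueRow {
+    /// Auto-incremented primary key of the job.
+    id: i64,
+    /// Tag of the parent benchmark request that spawned this job.
+    request_tag: Option<String>,
+    /// Hardware/ISA the benchmarks must run on.
+    target: String,
+    /// Job state, e.g. queued, in progress, success or failure.
+    status: String,
+    /// Number of times the job has been re-queued after a failure.
+    retry: i32,
+    /// Benchmark suite to execute: "compiletime", "runtime" or "rustc".
+    kind: String,
+    /// If true, the parent request does not wait for this job to complete.
+    is_optional: bool,
+}
+```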
diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 000000000..72a56553d --- /dev/null +++ b/docs/README.md @@ -0,0 +1,7 @@ +# rustc-perf documentation + +- [Glossary of useful terms](./glossary.md) +- [Database schema](../database/schema.md) +- [How rustc-perf is deployed](./deployment.md) +- [How the distributed job queue works](./job-queue.md) +- [How we compare benchmark results](./comparison-analysis.md) diff --git a/docs/perf-runner.md b/docs/deployment.md similarity index 74% rename from docs/perf-runner.md rename to docs/deployment.md index 2659a708f..cdac984f0 100644 --- a/docs/perf-runner.md +++ b/docs/deployment.md @@ -1,7 +1,23 @@ -# Benchmarking machine -The machine that actually executes the benchmarks is the `AX-42` server running on [Hetzner](https://www.hetzner.com/dedicated-rootserver/). It has the following configuration. +# Deployment + +The machines that actually execute the benchmarks ("collectors") are dedicated servers running on [Hetzner](https://www.hetzner.com/dedicated-rootserver/). The [web server](http://perf.rust-lang.org/) runs on [ECS](https://github.com/rust-lang/infra-team/blob/HEAD/service-catalog/rustc-perf/README.md). + +## Debugging +This section documents what to do in case benchmarking doesn't work or something is stuck. The status of the collectors can be found on the [status page](https://perf.rust-lang.org/status.html). In particular, it shows the last heartbeat of each collector. If that date is very old (>1 hour), then something has probably gone wrong with the collector. + +You can SSH into the machines directly and examine what is going on there. The currently active machines have the following domain names: + +- `rustc-perf-one.infra.rust-lang.org` +- `rustc-perf-two.infra.rust-lang.org` + +The benchmarking process runs as a systemd service called `collector`. You can start/stop/inspect it using the usual commands: - Start/restart/stop: `sudo systemctl start/restart/stop collector.service` - See logs: `sudo journalctl --utc -n 10000 -u collector -f` + +The user account under which the benchmarks execute is called `collector`; you can switch to it using `su` and examine the `/home/collector/rustc-perf` checkout, from which the benchmarks are executed. ## Hardware +- The collectors run on `AX-42` Hetzner server instances. - 8-core AMD Ryzen 7 PRO 8700GE with HyperThreading (16 hardware threads total)
Output of `lscpu` diff --git a/docs/glossary.md b/docs/glossary.md index ed6c5f665..179358937 100644 --- a/docs/glossary.md +++ b/docs/glossary.md @@ -25,6 +25,9 @@ The following is a glossary of domain specific terminology. Although benchmarks - `incr-patched`: incremental compilation is used, with a full incremental cache and some code changes made. * **backend**: the codegen backend used for compiling Rust code. - `llvm`: the default codegen backend + - `cranelift`: experimental backend designed for quicker non-optimized builds +* **target**: compilation target for which the benchmark is compiled. + - `x86_64-unknown-linux-gnu`: the default x64 Linux target * **category**: a high-level group of benchmarks. Currently, there are three categories, primary (mostly real-world crates), secondary (mostly stress tests), and stable (old real-world crates, only used for the dashboard). * **artifact type**: describes what kind of artifact does the benchmark build. Either `library` or `binary`. @@ -41,15 +44,15 @@ The following is a glossary of domain specific terminology. Although benchmarks ## Testing * **test case**: a combination of parameters that describe the measurement of a single (compile-time or runtime) benchmark - a single `test` - - For compile-time benchmarks, it is a combination of a benchmark, a profile, and a scenario. - - For runtime benchmarks, it is currently only the benchmark name. + - For compile-time benchmarks, it is a combination of a benchmark, a profile, a scenario, a codegen backend and a target. + - For runtime benchmarks, it is a combination of a benchmark and a target. * **test**: the act of running an artifact under a test case. Each test is composed of many iterations. * **test iteration**: a single iteration that makes up a test. Note: we currently normally run 3 test iterations for each test. -* **test result**: the result of the collection of all statistics from running a test. Currently, the minimum value of a statistic from all the test iterations is used for analysis calculations and the website. -* **statistic**: a single measured value of a metric in a test result +* **test result**: the set of all gathered statistics from running a test. Currently, the minimum value of a statistic from all the test iterations is used for analysis calculations and the website. +* **statistic**: a single measured value of a metric in a test iteration * **statistic description**: the combination of a metric and a test case which describes a statistic. * **statistic series**: statistics for the same statistic description over time. -* **run**: a set of tests for all currently available test cases measured on a given artifact. +* **run**: a set of tests for all currently available test cases measured on a given artifact. ## Analysis @@ -60,7 +63,17 @@ The following is a glossary of domain specific terminology. Although benchmarks * **relevant test result comparison**: a test result comparison can be significant but still not be relevant (i.e., worth paying attention to). Relevance is a factor of the test result comparison's significance and magnitude. Comparisons are considered relevant if they are significant and have at least a small magnitude . * **test result comparison magnitude**: how "large" the delta is between the two test result's under comparison.
This is determined by the average of two factors: the absolute size of the change (i.e., a change of 5% is larger than a change of 1%) and the amount above the significance threshold (i.e., a change that is 5x the significance threshold is larger than a change 1.5x the significance threshold). -## Other +## Job queue + +These terms are related to the [job queue system](./job-queue.md) that distributes benchmarking jobs across available collectors. + +- **benchmark request**: a request for benchmarking a *run* on a given *artifact*. It can either be created from a try build on a PR, or it is automatically created from merged master/release *artifacts*. +- **collector**: a machine that performs benchmarks. +- **benchmark set**: a subset of a compile/runtime/bootstrap benchmark suite that is executed by a collector in a single job. +- **job**: a high-level "work item" that defines a set of *test cases* that should be benchmarked on a specific collector. +- **job queue**: a queue of *jobs*. + +## Other * **bootstrap**: the process of building the compiler from a previous version of the compiler * **compiler query**: a query used inside the [compiler query system](https://rustc-dev-guide.rust-lang.org/overview.html#queries). diff --git a/docs/job-queue.md b/docs/job-queue.md new file mode 100644 index 000000000..4387a4413 --- /dev/null +++ b/docs/job-queue.md @@ -0,0 +1,154 @@ +# Job queue + +> Before reading this document, please examine the [glossary](./glossary.md), in particular the part about the [job queue](./glossary.md#job-queue). + +In addition to simple local execution of benchmarks, `rustc-perf` can also serve as a distributed system that supports benchmarking a single compiler artifact in parallel across several collectors (machines) that can even run on various hardware architectures. This distributed system is documented in this file. + +Another overview of how the rustc-perf benchmark suite operates can be found [here](https://kobzol.github.io/rust/rustc/2023/08/18/rustc-benchmark-suite.html), although that was written before `rustc-perf` supported multiple machines. + +## High-level overview +There are two main entities operating in the distributed system: +- The [website](https://perf.rust-lang.org) receives requests for benchmarking a specific version of the compiler, splits those requests into more granular chunks of work ("jobs"), waits until all (non-optional) jobs of a request are completed and then reports the benchmark results on the corresponding pull request via a comment on GitHub. +- A set of collectors (dedicated machines) repeatedly poll for new jobs. After they dequeue a job, they benchmark all test cases from it and store the results into the database. + +The website communicates with the collectors through a Postgres database; there is no other external service for managing the job queue. + +Let's walk through the distributed system step by step. + +## Website +The main goal of the website in the distributed system is to create benchmark requests, split them into jobs, and wait until they are completed. The website has a periodically executed background process ("cron") that checks whether some progress can be made. + +### Creating benchmark requests +The main event that starts the whole benchmarking process is the creation of a single benchmark request. Benchmark requests are stored in the database in the `benchmark_request` table. + +They come in three types: + +- Master: benchmark a commit merged into the default branch of `rust-lang/rust`.
+ - Master requests are created in the cron, once new commits appear in `rust-lang/rust`. +- Try: benchmark a try build on a given PR. + - Try requests are created by commands sent in GitHub comments in a PR. +- Release: benchmark a released stable/beta version of Rust. + - Release requests are created in the cron, once a new release is made. + +Every benchmark request has a *parent* request, with which its benchmark results will be compared once it is finished. A parent request should generally always be benchmarked before its children. + +The benchmark request can be in one of five states, whose state diagram is shown below: + +```mermaid +flowchart LR + WA[Waiting for artifacts] -->|Try build finished| AR + AR[Artifacts ready] -->|Jobs enqueued| IP + IP[In progress] --> |Success| Completed + IP --> |Error| Failed +``` + +Some useful observations: +- Try requests start in the `Waiting for artifacts` state, while master and release requests already start in the `Artifacts ready` state. +- A request cannot start being benchmarked until its compiler artifacts are available on CI. +- Once a request moves into the `Completed` or `Failed` state, its state will never change again. + - New jobs can still be generated for such a request though; see [backfilling](#backfilling) below. + +### Benchmark request queue +Since multiple requests can be ready to be benchmarked at any given time, the website orders them in a queue, which is displayed on the [status page]. + +The ordering of the queue is somewhat ephemeral, as it can change anytime a new benchmark request appears. That being said, the website also tries to avoid "jumps" in the queue when requests are completed, for better predictability of when a given benchmark request will be completed. + +The ordering looks approximately like this: +1. `In progress` requests. +2. Release requests, sorted by date and then name. +3. `Artifacts ready` requests. These requests are sorted topologically, with the topological level determined by the transitive number of parents that are not done yet. So requests with a done parent have priority over requests whose parent isn't done yet. + +Within the individual groups, requests are ordered by their PR number and creation time. + +Currently, the website maintains an invariant that at most a single benchmark request is `In progress`. This means that even if one of the collectors is finished with all its jobs, it will have to wait until all other jobs of the request are complete. This helps us synchronise the benchmarking workload so that it is easier to keep track of what is going on in the system. This constraint could be relaxed in the future; most of the system should be prepared for running multiple requests at the same time, although if the workload is well-balanced, it should not be needed. + +### Enqueuing jobs +The cron periodically scans the benchmark request queue. Once it sees that no benchmark request is `In progress`, and there is at least a single request that is in the `Artifacts ready` state, it will atomically transition that request to the `In progress` state and enqueue a set of benchmark jobs for the request into the `job_queue` database table. + +Each benchmark job describes a subset of test cases of the whole request that will be benchmarked on a single collector. More specifically, it states which test case parameters (profile, codegen backend and target), which benchmark suite (compile, runtime or rustc) and which *benchmark set* (a subset of the compile benchmark suite) should be benchmarked.
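+
+For illustration, the information carried by a single job can be pictured as the following Rust sketch. The type and field names are made up for this example and do not necessarily match the real implementation; it only mirrors the parameters described above.
+
+```rust
+/// Which benchmark suite a job executes.
+enum Suite {
+    Compiletime,
+    Runtime,
+    Rustc,
+}
+
+/// Illustrative sketch of the parameters specified by a single benchmark job.
+struct BenchmarkJob {
+    /// Benchmark request that this job belongs to.
+    request_tag: String,
+    /// Target the job has to run on, e.g. "x86_64-unknown-linux-gnu".
+    target: String,
+    /// Codegen backend to benchmark, e.g. "llvm" or "cranelift".
+    backend: String,
+    /// Profile to benchmark, e.g. "check", "debug" or "opt".
+    profile: String,
+    /// Benchmark suite to execute.
+    suite: Suite,
+    /// Subset of the compile benchmark suite to execute.
+    benchmark_set: u32,
+}
+```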
+ +The jobs exist so that we can split the request into smaller chunks, which can then be benchmarked in parallel on multiple collectors, thus shortening the whole benchmark run duration. This granularity also allows us to have collectors with different hardware architectures, and support [backfilling](#backfilling). + +Each job can exist in the following four states: +```mermaid +flowchart LR + Queued -->|Job dequeued for the first time| IP + IP[In progress] --> |Success| Success + IP[In progress] --> |"Failure (retry<MAX_RETRIES)"| IP + IP[In progress] --> |"Failure (retry==MAX_RETRIES)"| Failure +``` + +Once the jobs have been enqueued, the website will repeatedly check in the cron whether all (non-optional[^optional]) jobs of the `In progress` request and also all jobs of its parent (see [backfilling](#backfilling)) have been completed. Once that happens, it will transition the request into the `Completed` or `Failed` state (based on whether there were any failed jobs or not) and send a GitHub pull request comment with the benchmark result (for master and try requests). + +[^optional]: Some jobs can be marked as optional; this is designed to allow running experimental collectors that should not "block" the main benchmark request workflow. + +#### Benchmark set +Each job contains a specific *benchmark set*, a small integer that identifies which subset of the compile benchmark suite should be benchmarked in the job. This is used to further split the compile benchmark suite, which takes the most time to run, and thus enable parallelizing its execution across multiple collectors. + +The compile benchmark suite for a given target is split into `N` benchmark sets. To run the suite, `N` collectors (with benchmark sets `0, 1, ..., N-1`) have to be available. + +Each collector has a hard-coded benchmark set that it always benchmarks. Benchmarks should ideally not move between the sets, to ensure that each benchmark will always be benchmarked on the same machine, to avoid unnecessary environment noise. + +That being said, sometimes it might be useful to balance the sets a little bit, to ensure that all collectors can run their jobs in approximately the same amount of time, to avoid unbalanced workloads. + +The fact that the benchmark sets are assigned to collectors *statically* and there is no load balancing or work-stealing going on in-between the sets means that if one of the collectors stops running, **it will halt the whole system**. In that case, a [manual intervention](./deployment.md) might be required. + +#### Backfilling +When the website enqueues jobs for a request, it also enqueues jobs with the same parameters for its parent request. Parents should always be benchmarked before their children, so in most cases, all the parent jobs will already be present in the job queue. + +However, someone can create a try request with *non-default* parameters. For example, they could request benchmarking the `cranelift` codegen backend, which is not normally benchmarked on master requests. When that happens, we need to ensure that we will also run those non-default jobs for the *parent* in addition to the try request itself; otherwise we wouldn't have anything to compare to. + +This situation is called `backfilling`. When the website enqueues jobs for a request with non-default parameters, it will create *new jobs* also for its parent request, which did not exist before.
The collectors will then also benchmark those parent jobs, thus "backfilling" the results into a request that was already completed previously. The status of the parent request does not change when this happens; it stays `Completed` or `Failed`. + +## Collectors + +The main job of each collector is to continuously dequeue jobs from the job queue, run all of their benchmarks and store the results into the database. + +### Registration +Individual collectors have to be registered in the database, so that we can show their status on the [status page], even if they are currently offline. + +Their information is stored in the `collector_config` table. Each collector has its assigned benchmark set and target, which are used to determine which jobs the collector will handle. + +To register a new collector, you can run the following command: + +```bash +cargo run --bin collector add_collector \ + --collector_name "<collector name>" \ + --target <target> \ + --benchmark_set "<benchmark set ID>" \ + --is_active +``` + +Collector names and `(target, benchmark_set)` combinations have to be unique. + +If a given collector is not used anymore, it can be marked in the database as being inactive. This currently has to be done manually by modifying the database. + +### Dequeuing jobs +To run a collector with a given name, you can run the following command: +```bash +cargo run --bin collector benchmark_job_queue --collector_name "<collector name>" +``` + +After starting, the collector will enter a loop in which it will repeatedly (every 30s) poll the `job_queue` table, looking for a job that matches its target and benchmark set. If it finds such a job, it will atomically dequeue it, marking it as being `In progress`, and increasing its [retry counter](#failure-handling-and-retries). Then it will download the compiler artifacts[^artifacts-cache] specified by the job and perform all its test cases. + +[^artifacts-cache]: The artifacts are cached on disk, to avoid re-downloading the same artifacts multiple times, as every benchmark request will generate several jobs for each active collector. + +### Failure handling and retries +Several kinds of failures can happen during the execution of a job: +- Handled transient failure: it was not possible to download CI artifacts because of a transient network error, or it was not possible to communicate with the DB. In this case, the collector will try to record the error into the `errors` table. The job will not be marked as completed; it will simply be dequeued again later, after a short wait. +- Handled permanent failure: some error that is most likely unrecoverable has happened (for example, CI artifacts for the given compiler `SHA` are not available). In this case, the collector will record the error, immediately mark the job as failed and move on. +- Unhandled failure (panic): the collector failed unexpectedly and could not record the error (we currently don't catch panics). In this case, the `collector` service will restart the collector later, and it will try to dequeue the job again. + +If the collector dequeues a job that already has its retry counter set to `MAX_RETRIES` (see the [job lifecycle](#enqueuing-jobs) diagram), it will mark the job as failed. + +The collector prioritizes continuing `In progress` jobs over starting new `Queued` jobs. + +### Automatic git update +The collector is executed through a [bash script](../collector/collect-job-queue.sh), which runs it in a loop (in case it ends or crashes).
Before the collector is started, the bash script downloads the latest version of `rustc-perf` from GitHub, and rebuilds the collector, to keep it up to date. + +If there are always enough jobs to benchmark, the collector might not exit for some time. The collector thus also checks the latest `rustc-perf` commit SHA while it is running. If it determines that a new version is available, it shuts itself down to let the bash script update it. However, the collector tries to delay the shutdown until after it finishes all jobs of a request that is currently `In progress`, to avoid changing the collector version in a single benchmark run. + +### Heartbeat +The collector periodicaly updates its last heartbeat date, which is displayed on the [status page]. When the heartbeat is too old, the collector will be marked as being `Offline`. + +[status page]: https://perf.rust-lang.org/status.html diff --git a/docs/multiple-collectors.md b/docs/multiple-collectors.md deleted file mode 100644 index 94132fa7b..000000000 --- a/docs/multiple-collectors.md +++ /dev/null @@ -1,148 +0,0 @@ -# rustc-perf - Multiple Collectors Documentation - -rustc-perf has been enhanced to support parallel benchmarking and execution across various architectures. While this enables a distributed architecture in deployment environments, local benchmarking continues to operate as before. - -The previous documentation for the rustc-benchmark-suite can be found [here](https://kobzol.github.io/rust/rustc/2023/08/18/rustc-benchmark-suite.html). The major difference is the section "Performance measurement workflow" which is documented here. - -The table below details a set of keywords, or a glossary of terms, that appear throughout this doc and the codebase. The naming aims to minimally identify the constituent parts of the system. - -## Keywords -| Term | Meaning | -|------|---------| -| **artifact** | A single Rust compiler toolchain built from a specific commit SHA. | -| **metric** | A quantifiable metric gathered during the execution of the compiler (e.g. instruction count). | -| **benchmark** | A Rust crate that will be used for benchmarking the performance of `rustc` (a compile-time benchmark) or its codegen quality (a runtime benchmark) | -| **profile** | Describes how to run the compiler (e.g. `cargo build/check`). A profile is a **benchmark parameter**. | -| **scenario** | Further specifies how to invoke the compiler (e.g. incremental rebuild/full build). A scenario is a **benchmark parameter**. | -| **backend** | Codegen backend used when invoking `rustc`. A backend is a **benchmark parameter**. | -| **target** | Roughly the Rust target triple, e.g. `aarch64-unknown-linux-gnu`. A target is a **benchmark parameter**. | -| **benchmark suite** | A set of *benchmarks*. We have two suites - compile-time and runtime. | -| **test case** | A combination of a *benchmark* and its *benchmark parameters* that uniquely identifies a single *test*. For compile-time benchmarks, it's *benchmark* + *profile* + *scenario* + *backend* + *target*, for runtime benchmarks it's just *benchmark*. Unique instance of compile-time/run-time benchmark parameters. | -| **test** | Identifies the act of benchmarking an *artifact* under a specific *test case*. Each test consists of several *test iterations*. | -| **test iteration** | A single actual execution of a *test*. | -| **collection** | A set of all *statistics* for a single *test iteration*. | -| **test result** | The result of gathering all *statistics* from a single *test*. 
Aggregates results from all *test iterations* of that *test*, so a *test result* is essentially the union of *collections*. Usually we just take the minimum of each statistic out of all its *collections*. | -| **statistic** | A single measured value of a *metric* in a *test result*. | -| **run** | A set of all *test results* for a set of *test cases* measured on a single *artifact*. | -| **benchmark request** | A request for a benchmarking a *run* for a given *artifact*. Can be either created from a try build on a PR, or it is automatically determined from merged master/release *artifacts*. | -| **benchmark set** | A selection of benchmarks that a collector will run. A collector is assigned a benchmark_set id | -| **collector** | A physical runner for benchmarking the compiler. | -| **cluster** | One or more collectors of the same target, for benchmarking the compiler. | -| **collector_id** | A unique identifier of a *collector* (hard-coded at first for simplicity). | -| **job** | High-level "work item" that defines a set of *test cases* that should be benchmarked on a specific collector. | -| **job_queue** | Queue of *jobs*. | -| **MAX_JOB_FAILS** | Maximum number of failures before a job is marked as a failed. | -| **Assigning a job** | The act of allocating one or more *jobs* to a collector. | -| **website** | A standalone server responsible for inserting work into the queue. | -| **backfilling** | Occurs when a commit's parent_sha does not have the same configuration as the request currently being enqueued. In this case, jobs with the requested configuration are added so that the commit can be benchmarked against its parent under matching conditions. | -| **benchmark index** | A set off shas and release tags which have completed benchmark requests. Saves database lookups. | - -## Programs that need to be available - -`perf` with `/proc/sys/kernel/perf_event_paranoid` set to -1 else the collector will panic. Setting this to 4 is a convenient way for testing error cases however. - -## Database schema - -For a complete overview of the database structure, refer to the [schema documentation](https://github.com/rust-lang/rustc-perf/blob/master/database/schema.md). Only the most relevant tables are discussed below to prevent this document from becoming overly verbose. - -## How The Flow Works - -There are two major components in the new system; the website (CRON) and the collectors. - -### CRON Lifecycle - -It's simplest to show how the new system works by walking through it step by step. We will start with the website, which accepts requests as a web server and also has a cron job for managing the queue. This is the entry point for how work is queued. - -Step 1 - Creating requests: - -The CRON will draw down all master commits and check the SHA's against the benchmark index, if the SHA does not exist in the index then it will be added to the database. The same process also happens for Releases with the same logic to determine if a request needs to be stored in the database. - -Try commits are added on an adhoc basis by rustc developers manually making an http request to benchmark a commit. There will be a period of time where the artifact, for a Try, is not ready for benchmarking and will be in the state `waiting_for_artifacts`. Once the artifact is ready the request will move to `artifacts_ready`, indicating that the request is ready for benchmarking. This is updated through a web hook on the webserver. 
- -Step 2 - Creating jobs: - -The CRON will create a queue and if the first request in the queue is not `in_progress`, will dequeue the request and split the request into `benchmark_job`'s (jobs). If the request has a parent tag, a request will be make and jobs will also be enqueued for the parent. If the jobs for the parent already exist then the database will simply ignore them. This process of finding jobs which need to be populated for the parent is "backfilling". - -The states go as follows; - -`waiting_for_artifacts` -> `artifacts_ready` -> `in_progress` -> `completed` - -Only one request can presently be `in_progress` at any one time. If a request is in progress the CRON does not start splitting up other requests into jobs. - -Step 3 - Completing requests: - -If the request at the head of the queue is `in_progress` the CRON will check to see if all the jobs associated with the request are in the state `failure` or `success` if they are the request will be marked as `completed`. - -From here if a request is marked as `completed` then the next request that is in the state `artifacts_ready` will be expanded into the jobs needed to fulfil the request. This will be all the combinations of target, profile, - -### Collector Lifecycle - -The collectors are registered through configuration in the `collector_config` table. The configuration includes the architecture of the collector and a `benchmark_set` id. The `benchmark_set` is used to lookup what benchmarks the collector should run. If there is only one collector then the set would contain all items. Presently this is hardcoded in the github rustc-perf repository and altered through pull requests. - -The collectors run in a loop polling the Postgres database for work and exiting if there is no work for it to do. - -Step 1: - -Determine if the code the collector is running is out of date, if it is the collector will exit. The collector is run through a bash script which will pull down the latest code from github. - -Step 2: - -Collector pulls down it's configuration from the database. If there is no configuration matching what the collector should have, the collector will panic and exit the loop. Otherwise the collector will try and dequeue a job, if there is no job it will exit gracefully or go to Step 3. - -Step 3: - -Once the collector has dequeued the job, the collector will proceed to lookup what benchmarks need to be done by looking them up using the benchmark set id. The collector will then loop over the items in the set executing the benchmarks and recording the results in the `pstat` and `pstat_runtime` tables. - -Step 4: - -The collectors health is monitored by updating a heartbeat column in the `collector_config` table. The UI will indicate the collector as offline if it is inactive for a specified period of time. This should be caught either by error logs or someone viewing the page and subsequently reporting the collector as offline in Zulip. - -## Queue ordering -The ordering of the queue is by priority, we assume that there is a collector online that is currently looking for work. -- In progress requests, if there is a request that's state is `in_progress` the collector will take this request, for this to happen it presumably errored at some point and is restarting. -- Release requests, sorted by date and then name -- Requests whose parents are ready. 
- - Do a topological sort (topological index = transitive number of parents that are not finished yet) - - Order by topological index, type (master before try), then PR number, then `created_at` -- Requests that are waiting for artifacts - - Order by PR number, then `created_at` - -## `benchmark_request` table - -This table stores permanent benchmark requests for try builds on PRs and for master and published artifacts. If any benchmarking happens (through the website), there has to be a record of it in benchmark_request. - -- `waiting_for_artifacts`: a try build is waiting until CI produces the artifacts needed for benchmarking. At this point in time it is possible for a `request` to have not corresponding commit sha (stored in the tag column) - - master artifact waits for all its (grand)parent benchmark requests to be completed - - try artifact waits for all its (grand)parent benchmark requests to be completed, plus optionally for all its direct parent jobs to be completed (due to backfilling) -- `artifacts_ready`: artifact is ready for benchmarking -- `in_progress`: jobs for this request are currently in job_queue, waiting to be benchmarked -- `completed`: all jobs have been completed; either through `success` or a `failure`, and a GH PR comment was sent for try/master builds - -## `job_queue` table - -This table stores benchmark jobs, which specifically tell the collector which benchmarks it should execute. The jobs after being completed, so that we can quickly figure out what master parent jobs we need to backfill when handling try builds. - -If you request backfill of data after and the jobs do not exist in the database, new jobs will be created, but that shouldn't matter, because the collector will pick them up, do essentially a no-op (because the test results will be already in the DB), and then mark the job as finished. - -The table keeps the following invariant: each job stored into it has all its corresponding parent test cases benchmarked and stored in the DB. - -## Limitations -A lot of what has described required manual intervention or codechanges in the repository. For example registering a new collector or configuring a current one is all done through code changes in the repository or manually updating the database. - -Aside from the obvious shortcomings, due to resources, there are some edge-cases that are worth documenting. - -### One request at a time - -Even if one of the collectors is finished with all of the jobs allocated to it for a request it will effectively spin until the request is fully complete. This helps us to synchronise the workload making it easier to keep track of what is going on. - -### Deactivating a collector which has a job in progress - -Marking a collector's status from `is_active = true` to `false` in the database does not immediately take the collector offline. Instead it will finish the job that is currently assigned to it and then on the next iteration exit. - -### Max retries - -The system will try to run a request three times before bailing and moving on to the next request. This does not take into account nuances like the database being unreachable for example. - -### Static dependencies - -The division of the benchmark sets are statically divided by the collectors, if there are multiple collectors and one went offline then a request would hang. This would require manual intervention to resolve.