Commit 7166956

Merge pull request #3149 from IntersectMBO/add_manager_desc
docs: Update cluster management docstring with detailed explanation
2 parents 52d16f2 + bb5bcd9 commit 7166956

5 files changed, 83 additions and 4 deletions

cardano_node_tests/cluster_management/cluster_getter.py

Lines changed: 17 additions & 1 deletion
@@ -1,4 +1,20 @@
-"""Functionality for obtaining and setting up a cluster instance."""
+"""Functionality for obtaining and setting up a cluster instance for parallel test execution.
+
+The `ClusterGetter` class is responsible for managing a pool of cluster instances and assigning them
+to tests running in parallel on different pytest workers. It ensures that tests get a suitable,
+properly configured, and healthy cluster instance to run on.
+
+Coordination between workers is achieved through a system of status files created in a shared
+temporary directory. These files signal the state of each cluster instance (e.g., running,
+needs respin), which tests are running on which instance, and what resources are locked or in use.
+
+The core logic is implemented in the `get_cluster_instance` method. It enters a loop where it
+evaluates the state of all available cluster instances against the requirements of the current test
+(e.g., resource needs, custom scripts, priority). It will wait and retry until a suitable instance
+is found and all conditions for starting the test are met. This includes handling cluster restarts
+(respins), resource allocation, and synchronization for tests that share expensive setups
+(marked tests).
+"""
 
 import dataclasses
 import logging
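
The wait-and-retry loop described in this docstring can be illustrated with a small sketch. It is not the `get_cluster_instance` implementation: the file names (`respin_needed_ci<N>`, `test_running_ci<N>_<worker_id>`), the function names, and the retry parameters are all assumptions made for illustration, and the sketch omits the locking a real implementation would need to make the check-and-mark step safe across workers.

```python
import pathlib
import time
from typing import Optional


def find_free_instance(state_dir: pathlib.Path, num_instances: int) -> Optional[int]:
    """Return the first instance with no blocking status file, or None."""
    for instance_num in range(num_instances):
        # Hypothetical marker: this instance must be respun before reuse
        if (state_dir / f"respin_needed_ci{instance_num}").exists():
            continue
        # Hypothetical marker: some worker is running a test on this instance
        if any(state_dir.glob(f"test_running_ci{instance_num}_*")):
            continue
        return instance_num
    return None


def get_instance(
    state_dir: pathlib.Path,
    num_instances: int,
    worker_id: str,
    retries: int = 10,
    sleep_seconds: float = 0.1,
) -> int:
    """Wait and retry until an instance is free, then mark it as in use."""
    for _ in range(retries):
        instance_num = find_free_instance(state_dir, num_instances)
        if instance_num is not None:
            # Signal to other workers that this instance is now taken
            (state_dir / f"test_running_ci{instance_num}_{worker_id}").touch()
            return instance_num
        time.sleep(sleep_seconds)
    raise RuntimeError("no suitable cluster instance became available")
```

In this simplified version an instance hosts one test at a time; the docstring describes a richer policy in which resource needs, priority, and marked tests also influence the decision.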

cardano_node_tests/cluster_management/cluster_management.py

Lines changed: 28 additions & 1 deletion
@@ -1,4 +1,31 @@
-"""Module for exposing useful components of cluster management."""
+"""Module for exposing useful components of cluster management.
+
+The cluster management system is designed to manage a pool of Cardano cluster instances for running
+tests in parallel using `pytest-xdist`. It coordinates access to these shared cluster instances
+by multiple test workers.
+
+Key concepts:
+- **Pool of Instances**: Multiple cluster instances can be running concurrently. Each test worker
+requests a cluster instance to run a test on.
+- **Coordination via File System**: Workers communicate and coordinate through a system of status
+files created on a shared file system. These files act as locks and signals to indicate the
+state of cluster instances (e.g., which test is running, if a respin is needed, which
+resources are locked). The `status_files` module manages the creation and lookup of these
+files.
+- **Resource Management**: Tests can declare what resources they need. A resource can be, for
+example, a specific feature of a cluster that cannot be used by multiple tests at the same
+time. The `ClusterManager` handles locking of these resources so that only one test can use
+them at a time.
+- **Cluster Respin**: Some tests can modify the state of a cluster in a way that it cannot be
+used by subsequent tests. These tests can request a "respin" of the cluster instance, which
+re-initializes it to a clean state.
+- **`ClusterManager`**: This is the main class that test fixtures interact with. Its `get()`
+method is used to acquire a suitable cluster instance for a test, taking into account available
+instances, resource requirements, and scheduling priority.
+
+This system allows for efficient parallel execution of tests that require a running Cardano
+cluster, by reusing cluster instances and managing contention for shared resources.
+"""
 
 # flake8: noqa
 from cardano_node_tests.cluster_management.common import CLUSTER_LOCK
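
To make the "files act as locks" idea concrete, here is a minimal sketch of locking a named resource through the file system. It assumes exclusive file creation as the locking primitive, which is not necessarily how the project does it (the module imports a `CLUSTER_LOCK`, which suggests a different mechanism). The function names and the `resource_locked_@@<name>@@` file name are hypothetical.

```python
import os
import pathlib


def _lock_path(state_dir: pathlib.Path, resource: str) -> pathlib.Path:
    # Hypothetical file name; one lock file per resource
    return state_dir / f"resource_locked_@@{resource}@@"


def try_lock_resource(state_dir: pathlib.Path, resource: str, worker_id: str) -> bool:
    """Try to lock `resource`; return True if this worker now holds the lock."""
    try:
        # O_CREAT | O_EXCL makes the call fail if the file already exists,
        # so only one worker can succeed in creating it
        fd = os.open(_lock_path(state_dir, resource), os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(worker_id)  # record the owner for later checks
    return True


def unlock_resource(state_dir: pathlib.Path, resource: str, worker_id: str) -> None:
    """Release the lock, but only if `worker_id` is the recorded owner."""
    lock_file = _lock_path(state_dir, resource)
    if lock_file.exists() and lock_file.read_text() == worker_id:
        lock_file.unlink()
```

A second worker calling `try_lock_resource` for the same resource gets `False` until the owner calls `unlock_resource`.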

cardano_node_tests/cluster_management/manager.py

Lines changed: 10 additions & 1 deletion
@@ -1,4 +1,13 @@
-"""Functionality for managing cluster instances."""
+"""High-level management of cluster instances.
+
+This module provides the `ClusterManager` class, which is the main interface for tests to get a
+fully initialized cluster instance. The `ClusterManager` is responsible for selecting an available
+cluster instance that meets the test's resource requirements, preparing the `clusterlib` object,
+and performing cleanup actions after the test has finished.
+
+The `ClusterManager` is instantiated by the `cluster_manager` fixture for each test worker and is
+used by the `cluster` fixture to get a cluster instance for a test.
+"""
 
 import contextlib
 import dataclasses

cardano_node_tests/cluster_management/resources_management.py

Lines changed: 14 additions & 1 deletion
@@ -1,4 +1,17 @@
-"""Functionality for getting a cluster instance that has required resources available."""
+"""Functionality for selecting a cluster instance that has required resources available.
+
+Resources can be requested by name (string).
+
+It is also possible to use filters. A filter is a class that gets a list of all unavailable
+resources and returns a list of resources that should be used. An example is `OneOf`, which returns
+one usable resource from a given list of resources. The unavailable resources passed to the filter
+include resources that are unavailable because they are locked, and also resources that were
+already selected by preceding filters in the same request.
+
+It is possible to use multiple `OneOf` filters in a single request. For example, using `OneOf`
+filter with the same set of resources twice will result in selecting two different resources from
+that set.
+"""
 
 import random
 import typing as tp
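
The filter behaviour described in this docstring can be sketched as follows. The names `OneOfSketch` and `resolve_request` are hypothetical and the code is an illustration written from the docstring alone, not the module's actual `OneOf` class. It shows how feeding already-selected resources back into the "unavailable" set makes two identical filters pick two different resources.

```python
import random
from typing import Iterable, List, Optional, Union


class OneOfSketch:
    """Filter that picks one usable resource from a fixed list of candidates."""

    def __init__(self, resources: Iterable[str]) -> None:
        self.resources = list(resources)

    def filter(self, unavailable: Iterable[str]) -> List[str]:
        blocked = set(unavailable)
        usable = [r for r in self.resources if r not in blocked]
        # Empty list signals that no candidate is currently usable
        return [random.choice(usable)] if usable else []


def resolve_request(
    request: Iterable[Union[str, OneOfSketch]], locked: Iterable[str]
) -> Optional[List[str]]:
    """Resolve names and filters to concrete resources, or None if impossible."""
    unavailable = set(locked)
    selected: List[str] = []
    for item in request:
        if isinstance(item, str):
            # Plain name: usable only if it is not already unavailable
            picked = [] if item in unavailable else [item]
        else:
            picked = item.filter(unavailable)
        if not picked:
            return None
        selected.extend(picked)
        # Later filters in the same request must not pick these again
        unavailable.update(picked)
    return selected
```

For instance, with candidates `pool1`, `pool2`, `pool3` and `pool2` locked, a request containing the same filter twice can only resolve to `pool1` and `pool3` (in either order). If two of the three candidates are locked, the same request cannot be satisfied and the sketch returns `None`.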

cardano_node_tests/cluster_management/status_files.py

Lines changed: 14 additions & 0 deletions
@@ -1,3 +1,17 @@
+"""Cluster instance status files.
+
+Status files are used for communication and synchronization between pytest workers.
+
+All status files are created in the single temp directory shared by all workers
+(the directory returned by `temptools.get_pytest_root_tmp()`). This allows all
+workers to see status files created by other workers.
+
+Common components of status file names:
+* `_@@<resource_name>@@_`: resource name
+* `_%%<mark>%%_`: test mark
+* `_<worker_id>`: pytest worker ID
+"""
+
 import pathlib as pl
 import re
 import typing as tp
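
The naming components listed in this docstring lend themselves to simple building and parsing helpers. The sketch below assumes that a file name is a prefix followed by the delimited component and the worker ID; the prefixes (`resource_locked`, `curr_test`) and helper names are hypothetical and not taken from the module.

```python
import re
from typing import Optional

# Patterns for the delimited components named in the docstring
RESOURCE_RE = re.compile(r"_@@(.+?)@@_")
MARK_RE = re.compile(r"_%%(.+?)%%_")


def resource_file_name(prefix: str, resource: str, worker_id: str) -> str:
    """Build a status file name carrying a resource name and a worker ID."""
    return f"{prefix}_@@{resource}@@_{worker_id}"


def parse_resource(file_name: str) -> Optional[str]:
    """Extract the resource name from a status file name, if present."""
    match = RESOURCE_RE.search(file_name)
    return match.group(1) if match else None


def parse_mark(file_name: str) -> Optional[str]:
    """Extract the test mark from a status file name, if present."""
    match = MARK_RE.search(file_name)
    return match.group(1) if match else None
```

Because the delimiters are distinctive, a worker can list the shared directory and recover which resources or marks are in play without opening any file.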
