* Adapt test client to handle reduction over axes
* Add axis field to request data model
* Implement sum over axis
* Implement sum over multiple axes
* Implement min over multiple axes
* Rename request field: axes -> axis
* Implement max over multiple axes
* Add test to ensure selection ignores axis
* Implement count over multiple axes
* Tmp: switch to dedicated test suite branch in CI
* Misc code clean up
* Handle empty axis list in count operation
* Handle empty axis list in min, max and sum operation
* Improve validation of multi-axis reduction requests
* Refactor to avoid unwrap
* DRY refactoring
* Update docs to include axis field
* Add explanatory comment
* Revert "Tmp: switch to dedicated test suite branch in CI"
This reverts commit 95cc35f.
docs/api.md (9 additions, 4 deletions)
@@ -37,6 +37,11 @@ The request body should be a JSON object of the form:
 // - optional, defaults to a simple 1D array
 "shape": [20, 5],

+// The axis or axes over which to perform the reduction operation
+// - optional, can be either a single axis or list of axes, defaults
+// to a reduction over all axes
+"axis": 0,
+
 // Indicates whether the data is in C order (row major)
 // or Fortran order (column major, indicated by 'F')
 // - optional, defaults to 'C'
@@ -78,10 +83,10 @@ Unauthenticated access to S3 is possible by omitting the basic auth header.
 On success, all operations return HTTP 200 OK with the response using the same datatype as specified in the request except for `count` which always returns the result as `int64`.
 The server returns the following headers with the HTTP response:

-* `x-activestorage-dtype`: The data type of the data in the response payload. One of `int32`, `int64`, `uint32`, `uint64`, `float32` or `float64`.
-* `x-activestorage-byte-order`: The byte order of the data in the response payload. Either `big` or `little`.
-* `x-activestorage-shape`: A JSON-encoded list of numbers describing the shape of the data in the response payload. May be an empty list for a scalar result.
-* `x-activestorage-count`: The number of non-missing array elements operated on while performing the requested reduction. This header is useful, for example, to calculate the mean over multiple requests where the number of items operated on may differ between chunks.
+- `x-activestorage-dtype`: The data type of the data in the response payload. One of `int32`, `int64`, `uint32`, `uint64`, `float32` or `float64`.
+- `x-activestorage-byte-order`: The byte order of the data in the response payload. Either `big` or `little`.
+- `x-activestorage-shape`: A JSON-encoded list of numbers describing the shape of the data in the response payload. May be an empty list for a scalar result.
+- `x-activestorage-count`: The number of non-missing array elements operated on while performing the requested reduction. This header is useful, for example, to calculate the mean over multiple requests where the number of items operated on may differ between chunks.

 On error, an HTTP 4XX (client) or 5XX (server) response code will be returned, with the response body being a JSON object of the following format:
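As a rough illustration of how a client might interpret the four response headers documented in this hunk, the following standard-library sketch decodes a payload into a flat list of values. The function name `decode_response`, the example header values and the example payload are hypothetical. The sketch also assumes that the `x-activestorage-count` header can be parsed as JSON, which the documentation above does not state.

```python
import json
import struct

# struct format characters for the data types listed for x-activestorage-dtype.
_STRUCT_CODES = {
    "int32": "i", "int64": "q",
    "uint32": "I", "uint64": "Q",
    "float32": "f", "float64": "d",
}


def decode_response(headers, payload):
    """Decode a payload using the x-activestorage-* headers (hypothetical helper).

    Returns (values, shape, count): values is a flat list in the order the
    bytes arrive, and shape is a list that is empty for a scalar result.
    """
    code = _STRUCT_CODES[headers["x-activestorage-dtype"]]
    prefix = ">" if headers["x-activestorage-byte-order"] == "big" else "<"
    shape = json.loads(headers["x-activestorage-shape"])
    # Assumption: the count header is JSON-parseable (for example "98").
    count = json.loads(headers["x-activestorage-count"])

    n_items = 1
    for dim in shape:
        n_items *= dim  # an empty shape leaves n_items at 1 (scalar)
    values = list(struct.unpack(f"{prefix}{n_items}{code}", payload))
    return values, shape, count


# Made-up scalar response: a big-endian float32 result with 98 non-missing elements.
example_headers = {
    "x-activestorage-dtype": "float32",
    "x-activestorage-byte-order": "big",
    "x-activestorage-shape": "[]",
    "x-activestorage-count": "98",
}
example_payload = struct.pack(">f", 4.5)
values, shape, count = decode_response(example_headers, example_payload)
```

Reading the byte order from the header rather than assuming the client's native order keeps the decoding correct when client and server differ in endianness.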
docs/architecture.md (15 additions, 18 deletions)
@@ -7,10 +7,10 @@ Reductionist is built on top of a number of popular open source components.

 A few properties make it relatively easy to build a conceptual mental model of how Reductionist works.

-* All operations share the same request processing pipeline.
-* The request processing pipeline for each request is a fairly linear sequence of steps.
-* There is no persistent state.
-* The only external service that is interacted with is an S3-compatible object store.
+- All operations share the same request processing pipeline.
+- The request processing pipeline for each request is a fairly linear sequence of steps.
+- There is no persistent state.
+- The only external service that is interacted with is an S3-compatible object store.

 The more challenging aspects of the system are the lower level details of asynchronous programming, memory management, the Rust type system and working with multi-dimensional arrays.

@@ -29,7 +29,6 @@ A diagram of this step for the sum operation is shown in Figure 2.
 <figcaption>Figure 2: Sum operation flow diagram</figcaption>
 </figure>

-
 ## Axum web server

 [Axum](https://docs.rs/axum) is an asynchronous web framework that performs well in [various benchmarks](https://github.com/programatik29/rust-web-benchmarks/blob/master/result/hello-world.md) and is built on top of various popular components, including the [hyper](https://hyper.rs/) HTTP library.
@@ -110,13 +109,11 @@ Each operation is implemented by a struct that implements the `NumOperation` tra
 For example, the sum operation is implemented by the `Sum` struct in `src/operations.rs`.
 The `Sum` struct's `execute_t` method does the following:

-* Zero copy conversion of the byte array to a multi-dimensional [ndarray::ArrayView](https://docs.rs/ndarray/latest/ndarray/type.ArrayView.html) object of the data type, shape and byte order specified in the request data
-* If a selection was specified in the request data, create a sliced `ndarray::ArrayView` onto the original array view
-* If missing data was specified in the request data:
-  * Create an iterator over the array view that filters out missing data, performs the sum operation and counts non-missing elements
-* Otherwise:
-  * Use the array view's native `sum` and `len` methods to take the sum and element count
-* Convert the sum to a byte array and return with the element count
+- Zero copy conversion of the byte array to a multi-dimensional [ndarray::ArrayView](https://docs.rs/ndarray/latest/ndarray/type.ArrayView.html) object of the data type, shape and byte order specified in the request data
+- If a selection was specified in the request data, create a sliced `ndarray::ArrayView` onto the original array view
+- Checks whether the reduction should be performed over all or only a subset of the sliced data's axes
+- Performs a fold over each of the requested axes to calculate the required reduction while ignoring any specified missing data
+- Convert the sum to a byte array and return with the element count

 The procedure for other operations varies slightly but generally follows the same pattern.
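To make the updated `execute_t` steps more concrete, here is a minimal plain-Python sketch of a sum over selected axes that skips a missing value. It is not the Rust `ndarray` implementation. The function name `sum_over_axes`, its parameters, the restriction to native-order `float64` data in C order, and the treatment of missing data as a single sentinel value are all assumptions made for illustration. Selections are left out.

```python
import itertools
import struct


def sum_over_axes(data, shape, axes=None, missing=None):
    """Sum float64 values stored in C order over `axes`, skipping `missing`.

    Hypothetical helper. Returns (sums, counts, out_shape), where sums and
    counts are flat lists in C order over the axes that were not reduced.
    """
    values = memoryview(data).cast("d")  # zero-copy view, native byte order
    ndim = len(shape)
    reduce_axes = set(range(ndim)) if axes is None else set(axes)
    keep_axes = [ax for ax in range(ndim) if ax not in reduce_axes]
    out_shape = [shape[ax] for ax in keep_axes]

    n_out = 1
    for dim in out_shape:
        n_out *= dim
    sums = [0.0] * n_out
    counts = [0] * n_out

    # product() yields indices with the last axis varying fastest (C order),
    # so the enumeration counter is the flat position of each element.
    for flat, index in enumerate(itertools.product(*(range(d) for d in shape))):
        value = values[flat]
        if missing is not None and value == missing:
            continue  # ignore missing data in both the sum and the count
        out = 0
        for ax in keep_axes:
            out = out * shape[ax] + index[ax]
        sums[out] += value
        counts[out] += 1
    return sums, counts, out_shape


# A 2x3 array [[1, 2, 3], [4, -999, 6]] with -999 marking missing data.
data = struct.pack("6d", 1.0, 2.0, 3.0, 4.0, -999.0, 6.0)
axis0 = sum_over_axes(data, [2, 3], axes=[0], missing=-999.0)
all_axes = sum_over_axes(data, [2, 3], missing=-999.0)
```

Reducing over axis 0 leaves one sum and one count per column, while omitting `axes` reduces over every axis and produces a scalar with an empty output shape.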
docs/pyactivestorage.md (22 additions, 20 deletions)
@@ -3,34 +3,36 @@
 Reductionist has been integrated with the [PyActiveStorage](https://github.com/valeriupredoi/PyActiveStorage) library, and PyActiveStorage acts as a client of the Reductionist server.
 PyActiveStorage currently works with data in netCDF4 format, and is able to perform reductions on a variable within such a dataset.
 Numerical operations are performed on individual storage chunks, with the results later aggregated.
-The original POSIX/NumPy storage chunk reduction in PyActiveStorage is implementated in a `reduce_chunk` Python function in `activestorage/storage.py`, and this interface was used as the basis for the integration of Reductionist.
+The original POSIX/NumPy storage chunk reduction in PyActiveStorage is implemented in a `reduce_chunk` Python function in `activestorage/storage.py`, and this interface was used as the basis for the integration of Reductionist.
 The following code snippet shows the `reduce_chunk` function signature.
 filters - optional list of `numcodecs.abc.Codec` filter codecs
-dtype - likely float32 in most cases.
-shape - will be a tuple, something like (3,3,1), this is the dimensionality of the
+dtype - likely float32 in most cases.
+shape - will be a tuple, something like (3,3,1), this is the dimensionality of the
 chunk itself
 order - typically 'C' for c-type ordering
 chunk_selection - python slice tuples for each dimension, e.g.
 (slice(0, 2, 1), slice(1, 3, 1), slice(0, 1, 1))
 this defines the part of the chunk which is to be obtained
 or operated upon.
-method - computation desired
-(in this Python version it's an actual method, in
+method - computation desired
+(in this Python version it's an actual method, in
 storage implementations we'll change to controlled vocabulary)
-
+
 """
 ```

 For Reductionist, the `reduce_chunk` function signature in `activestorage/reductionist.py` is similar, but replaces the local file path with a `requests.Session` object, the Reductionist server URL, S3-compatible object store URL, and the bucket and object containing the data.

+<!-- TODO: Update to include axis arg once integrated into PyActiveStorage -->
 Within the `reduce_chunk` implementation for Reductionist, the following steps are taken:

-* build Reductionist API request data
-* build Reductionist API URL
-* perform an HTTP(S) POST request to Reductionist
-* on success, return a NumPy array containing the data in the response payload, with data type, shape and count determined by response headers
-* on failure, raise a `ReductionistError` with the response status code and JSON encoded error response
+- build Reductionist API request data
+- build Reductionist API URL
+- perform an HTTP(S) POST request to Reductionist
+- on success, return a NumPy array containing the data in the response payload, with data type, shape and count determined by response headers
+- on failure, raise a `ReductionistError` with the response status code and JSON encoded error response

 The use of a `requests.Session` object allows for TCP connection pooling, reducing connection overhead when multiple requests are made within a short timeframe.

@@ -77,9 +79,9 @@ It should be possible to provide a unified interface to storage systems by abstr
 Other changes to the main `activestorage.Active` class were necessary for integration of Reductionist.
 These include:

-* Support for reading netCDF metadata from files stored in S3 using the [s3fs](https://s3fs.readthedocs.io/) and [h5netcdf](https://pypi.org/project/h5netcdf/) libraries
-* Configuration options in `activestorage/config.py` to specify the Reductionist API URL, S3-compatible object store URL, S3 access key, secret key and bucket
-* Constructor `storage_type` argument for `activestorage.Active` to specify the storage backend
-* Use of a thread pool to execute storage chunk reductions in parallel
-* Unit tests to cover new and modified code
-* Integration test changes to allow running against a POSIX or S3 storage backend
+- Support for reading netCDF metadata from files stored in S3 using the [s3fs](https://s3fs.readthedocs.io/) and [h5netcdf](https://pypi.org/project/h5netcdf/) libraries
+- Configuration options in `activestorage/config.py` to specify the Reductionist API URL, S3-compatible object store URL, S3 access key, secret key and bucket
+- Constructor `storage_type` argument for `activestorage.Active` to specify the storage backend
+- Use of a thread pool to execute storage chunk reductions in parallel
+- Unit tests to cover new and modified code
+- Integration test changes to allow running against a POSIX or S3 storage backend
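The thread pool item above, together with the note that per-chunk results are aggregated later, can be sketched roughly as follows. This is not PyActiveStorage code: `reduce_chunk_sum` is a hypothetical local stand-in for a per-chunk request to Reductionist, and `mean_over_chunks`, the chunk data and the missing value are invented for the example. It shows why a per-chunk count of non-missing elements is needed to combine per-chunk sums into a mean.

```python
from concurrent.futures import ThreadPoolExecutor


def reduce_chunk_sum(chunk, missing=None):
    """Hypothetical stand-in for one chunk reduction: (sum, count) of non-missing values."""
    kept = [v for v in chunk if missing is None or v != missing]
    return sum(kept), len(kept)


def mean_over_chunks(chunks, missing=None, max_workers=4):
    """Reduce each chunk in a worker thread, then combine sums and counts into a mean."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves chunk order and blocks until every chunk has been reduced.
        results = list(pool.map(lambda c: reduce_chunk_sum(c, missing), chunks))
    total = sum(s for s, _ in results)
    count = sum(n for _, n in results)
    # Chunks can hold different numbers of valid elements, so averaging the
    # per-chunk means would be wrong; divide the total by the total count.
    return total / count if count else None


chunks = [[1.0, 2.0, 3.0], [4.0, -999.0], [6.0]]
mean = mean_over_chunks(chunks, missing=-999.0)
```

Threads suit this workload when each reduction is dominated by waiting on network I/O, as would be the case for HTTP requests to a remote server.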