@@ -83,13 +83,12 @@ should therefore prevent such undefined behavior.
8383:cpp:class:`nb::ndarray\<...\> <ndarray>` accepts template arguments to
8484specify such constraints. For example the function interface below
8585guarantees that the implementation is only invoked when it is provided with
86- a ``MxNx3 `` array of 8-bit unsigned integers that is furthermore stored
87- contiguously in CPU memory using a C-style array ordering convention.
86+ a ``MxNx3`` array of 8-bit unsigned integers.
8887
8988.. code-block:: cpp
9089
9190 m.def("process", [](nb::ndarray<uint8_t, nb::shape<nb::any, nb::any, 3>,
92- nb::c_contig, nb:: device::cpu> data) {
91+ nb::device::cpu> data) {
9392 // Double brightness of the MxNx3 RGB image
9493 for (size_t y = 0; y < data.shape(0); ++y)
9594 for (size_t x = 0; x < data.shape(1); ++x)
@@ -100,15 +99,16 @@ contiguously in CPU memory using a C-style array ordering convention.
10099
101100 The above example also demonstrates the use of
102101:cpp:func:`nb::ndarray\<...\>::operator() <ndarray::operator()>`, which
103- provides direct (i.e., high-performance) read/write access to the array
104- data. Note that this function is only available when the underlying data
105- type and ndarray rank are specified. It should only be used when the
106- array storage is reachable via CPU’s virtual memory address space.
102+ provides direct read/write access to the array contents. Note that this
103+ function is only available when the underlying data type and ndarray dimension
104+ are specified via the :cpp:type:`ndarray\<..\> <ndarray>` template parameters.
105+ It should only be used when the array storage is accessible through the CPU's
106+ virtual memory address space.
107107
108108.. _ndarray-constraints-1:
109109
110110Constraint types
111- ~~~~~~~~~~~~~~~~
111+ ----------------
112112
113113The following constraints are available
114114
@@ -153,6 +153,94 @@ count until they go out of scope. It is legal call
153153when the `GIL <https://wiki.python.org/moin/GlobalInterpreterLock>`__ is not
154154held.
155155
156+ .. _ndarray-views:
157+
158+ Fast array views
159+ ----------------
160+
161+ The following advice applies to performance-sensitive CPU code that reads and
162+ writes arrays using loops that invoke :cpp:func:`nb::ndarray\<...\>::operator()
163+ <ndarray::operator()>`. It does not apply to GPU arrays because they are
164+ usually not accessed in this way.
165+
166+ Consider the following snippet, which fills a 2D array with data:
167+
168+ .. code-block:: cpp
169+
170+    void fill(nb::ndarray<float, nb::ndim<2>, nb::c_contig, nb::device::cpu> arg) {
171+        for (size_t i = 0; i < arg.shape(0); ++i)
172+            for (size_t j = 0; j < arg.shape(1); ++j)
173+                arg(i, j) = /* ... */;
174+    }
175+
176+ This code works, but it is not optimal. To compute the address of an entry,
177+ ``operator()`` must consult the DLPack array descriptor. This indirection can
178+ prevent certain compiler optimizations.
179+
180+ nanobind provides the method :cpp:func:`ndarray\<...\>::view() <ndarray::view>`
181+ to fix this. It creates a tiny data structure that provides all information
182+ needed to access the array contents, and which can be held within CPU
183+ registers. All relevant compile-time information (:cpp:class:`nb::ndim <ndim>`,
184+ :cpp:class:`nb::shape <shape>`, :cpp:class:`nb::c_contig <c_contig>`,
185+ :cpp:class:`nb::f_contig <f_contig>`) is materialized in this view, which
186+ enables constant propagation, auto-vectorization, and loop unrolling.
187+
188+ An improved version of the example using such a view is shown below:
189+
190+ .. code-block:: cpp
191+
192+    void fill(nb::ndarray<float, nb::ndim<2>, nb::c_contig, nb::device::cpu> arg) {
193+        auto v = arg.view(); // <-- new!
194+
195+        for (size_t i = 0; i < v.shape(0); ++i) // Important: use 'v' instead of 'arg' everywhere in the loop
196+            for (size_t j = 0; j < v.shape(1); ++j)
197+                v(i, j) = /* ... */;
198+    }
199+
200+ Note that the view performs no reference counting. It must not outlive the
201+ array from which it was created.
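
The idea behind such a view can be sketched in plain C++ without nanobind. The types ``Descriptor`` and ``View2D`` and the function ``fill_with_view`` below are hypothetical stand-ins invented for illustration; they are not nanobind's actual implementation. The sketch assumes a C-contiguous 2D ``float`` array and copies everything needed for indexing into a small struct that the compiler can keep in registers:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an array descriptor that is reached through
// pointers (loosely modeled on a DLPack-style tensor, not nanobind's types).
struct Descriptor {
    float *data;
    const int64_t *shape;    // shape[0], shape[1]
    const int64_t *strides;  // strides, counted in elements
};

// Hypothetical 2D view: the data pointer and shape are copied into plain
// members, so indexing no longer goes through the descriptor.
struct View2D {
    float *data;
    size_t rows, cols;

    explicit View2D(const Descriptor &d)
        : data(d.data), rows((size_t) d.shape[0]), cols((size_t) d.shape[1]) {
        // This sketch only handles C-contiguous storage
        assert(d.strides[1] == 1 && d.strides[0] == d.shape[1]);
    }

    size_t shape(size_t i) const { return i == 0 ? rows : cols; }

    // C-contiguous layout: the innermost stride is known to be 1
    float &operator()(size_t i, size_t j) const { return data[i * cols + j]; }
};

// Writes the value i * 10 + j into entry (i, j), using only the view
inline void fill_with_view(const Descriptor &d) {
    View2D v(d);
    for (size_t i = 0; i < v.shape(0); ++i)
        for (size_t j = 0; j < v.shape(1); ++j)
            v(i, j) = float(i * 10 + j);
}
```

Like the real view, ``View2D`` holds a raw pointer and no reference count, so it is only valid while the underlying buffer is alive.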
202+
203+ When using OpenMP to parallelize expensive array operations, pass the
204+ ``firstprivate(view_1, view_2, ...)`` clause so that each worker thread can
205+ copy the view into its register file.
206+
207+ .. code-block:: cpp
208+
209+    auto v = array.view();
210+    #pragma omp parallel for schedule(static) firstprivate(v)
211+    for (...) { /* parallel loop */ }
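
A complete loop using this clause might look like the following sketch. It does not depend on nanobind: ``View1D`` is a hypothetical, trivially copyable stand-in for the object returned by ``view()``, and ``scale`` is an illustrative name. When compiled without OpenMP support, the pragma is ignored and the loop runs serially with the same result:

```cpp
#include <cstddef>

// Hypothetical trivially copyable 1D view (stand-in for a nanobind view)
struct View1D {
    float *data;
    size_t size;
    float &operator()(size_t i) const { return data[i]; }
};

// Multiplies every entry by 'factor'. Each worker thread receives its own
// copy of 'v' through the firstprivate clause.
inline void scale(View1D v, float factor) {
    // A signed loop index keeps the loop valid for older OpenMP versions
    const long n = (long) v.size;

    #pragma omp parallel for schedule(static) firstprivate(v)
    for (long i = 0; i < n; ++i)
        v((size_t) i) *= factor;
}
```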
212+
213+ .. _ndarray-runtime-specialization:
214+
215+ Specializing views at runtime
216+ -----------------------------
217+
218+ As mentioned earlier, element access via ``operator()`` only works when both
219+ the array's scalar type and its dimension are specified within the type (i.e.,
220+ when they are known at compile time); the same is also true for array views.
221+ However, it is sometimes useful for a function to accept arrays of different
222+ types.
223+
224+ You may use the :cpp:func:`ndarray\<...\>::view() <ndarray::view>` method to
225+ create *specialized* views if a run-time check determines that it is safe to
226+ do so. For example, the function below accepts contiguous CPU arrays and
227+ performs a loop over a specialized 2D ``float`` view when the array is of
228+ this type.
229+
230+ .. code-block:: cpp
231+
232+    void fill(nb::ndarray<nb::c_contig, nb::device::cpu> arg) {
233+        if (arg.dtype() == nb::dtype<float>() && arg.ndim() == 2) {
234+            auto v = arg.view<float, nb::ndim<2>>(); // <-- new!
235+
236+            for (size_t i = 0; i < v.shape(0); ++i) {
237+                for (size_t j = 0; j < v.shape(1); ++j) {
238+                    v(i, j) = /* ... */;
239+                }
240+            }
241+        } else { /* ... */ }
242+    }
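
The same pattern (check at run time, then enter code that is specialized at compile time) can be extended to several scalar types. The sketch below illustrates this in plain C++ without nanobind; ``AnyArray``, ``DType``, ``dtype_of``, ``fill_if_2d`` and ``fill`` are hypothetical names invented for this example and are not part of the nanobind API:

```cpp
#include <cstddef>
#include <cstdint>

enum class DType { Float32, Int32 };

// Hypothetical type-erased array: scalar type and rank are run-time values
struct AnyArray {
    void *data;
    DType dtype;
    size_t ndim;
    size_t shape[2];
};

// Maps a C++ scalar type to its run-time tag
template <typename T> struct dtype_of;
template <> struct dtype_of<float>   { static constexpr DType value = DType::Float32; };
template <> struct dtype_of<int32_t> { static constexpr DType value = DType::Int32; };

// Fills the array with 'value' if it is a 2D array of scalar type T.
// Returns false (and leaves the data untouched) otherwise.
template <typename T>
bool fill_if_2d(const AnyArray &a, T value) {
    if (a.dtype != dtype_of<T>::value || a.ndim != 2)
        return false;

    // From here on, the element type and rank are compile-time constants
    T *p = static_cast<T *>(a.data);
    const size_t rows = a.shape[0], cols = a.shape[1];
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            p[i * cols + j] = value;
    return true;
}

// Tries each supported specialization in turn
inline bool fill(const AnyArray &a) {
    return fill_if_2d<float>(a, 1.5f) || fill_if_2d<int32_t>(a, 7);
}
```

Each specialization is compiled separately, so the inner loops can be optimized for a fixed element type even though the caller only knows the type at run time.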
243+
156244 Constraints in type signatures
157245------------------------------
158246
@@ -364,6 +452,30 @@ interpreted as follows:
364452- :cpp:enumerator:`rv_policy::move` is unsupported and demoted to
365453 :cpp:enumerator:`rv_policy::copy`.
366454
455+ .. _ndarray_nonstandard_arithmetic:
456+
457+ Nonstandard arithmetic types
458+ ----------------------------
459+
460+ Low- or extended-precision arithmetic types (e.g., ``int128``, ``float16``,
461+ ``bfloat``) are sometimes used but don't have standardized C++ equivalents. If
462+ you wish to exchange arrays based on such types, you must register a
463+ specialization of ``nanobind::ndarray_traits`` to inform nanobind about them.
464+
465+ For example, the following snippet makes ``__fp16`` (half-precision type on
466+ ``aarch64``) available:
467+
468+ .. code-block:: cpp
469+
470+    namespace nanobind {
471+        template <> struct ndarray_traits<__fp16> {
472+            static constexpr bool is_float = true;
473+            static constexpr bool is_bool = false;
474+            static constexpr bool is_int = false;
475+            static constexpr bool is_signed = true;
476+        };
477+    }
478+
367479 Limitations
368480-----------
369481
@@ -383,30 +495,9 @@ internal representations (*dtypes*), including
383495nanobind's :cpp:class:`nb::ndarray\<...\> <ndarray>` is based on the `DLPack
384496<https://github.com/dmlc/dlpack>`__ array exchange protocol, which causes it to
385497be more restrictive. Presently supported dtypes include signed/unsigned
386- integers, floating point values, and boolean values.
498+ integers, floating point values, and boolean values. Some :ref:`nonstandard
499+ arithmetic types <ndarray_nonstandard_arithmetic>` can be supported as well.
387500
388501Nanobind can receive and return read-only arrays via the buffer protocol used
389502to exchange data with NumPy. The DLPack interface currently ignores this
390503annotation.
391-
392- Supporting nonstandard arithmetic types
393- ---------------------------------------
394-
395- Low or extended-precision arithmetic types (e.g., ``int128 ``, ``float16 ``,
396- ``bfloat ``) are sometimes used but don't have standardized C++ equivalents. If
397- you wish to exchange arrays based on such types, you must register a partial
398- overload of ``nanobind::ndarray_traits `` to inform nanobind about it.
399-
400- For example, the following snippet makes ``__fp16 `` (half-precision type on
401- ``aarch64 ``) available:
402-
403- .. code-block :: cpp
404-
405- namespace nanobind {
406- template <> struct ndarray_traits<__fp16> {
407- static constexpr bool is_float = true;
408- static constexpr bool is_bool = false;
409- static constexpr bool is_int = false;
410- static constexpr bool is_signed = true;
411- };
412- };