Conversation

@Emvlt (Contributor) commented May 26, 2025

The aim of this pull request is to make ODL multi-backend through the Python Array API.

@Emvlt Emvlt force-pushed the python_array_api_support branch from ab70b6f to c698bfa Compare May 26, 2025 15:09
@leftaroundabout leftaroundabout marked this pull request as draft May 27, 2025 13:21
@leftaroundabout leftaroundabout self-assigned this May 27, 2025
@leftaroundabout (Contributor)

Would it be a lot of work to put the weighting hierarchy in its own PR? I think there are a bunch of questions to be discussed there, and it should be checked thoroughly whether this really works, independent of the Python Array API generalizations.

What certainly needs clarification (if not changes) is the point about the methods being "mutually exclusive". It makes some sense from the definition side (we don't want conflicting semantics between e.g. distance and norm). But surely, from the use side we should then have all of them available, as far as mathematically possible? In the most typical use case (say, $L^2$) they can all be derived from the inner product, so it should be sufficient to provide only that, or alternatively the weights (which can then be used to implement first the inner product via the Array API and then everything else from that).
But then there are also cases (Banach spaces) where there really is no inner product, and we'd need either weights and an exponent, or a custom norm, which also gives rise to a distance. In principle we could have only a distance and nothing else, i.e. a metric space, but I'm not sure if that would ever crop up, given that the TensorSpace class already requires a vector-space structure.
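The derivation chain mentioned above can be sketched as follows (a minimal standalone sketch with hypothetical helper names, not the actual ODL Weighting API): given only a weighted inner product, the norm and the distance follow from it.

```python
import numpy as np

def inner(x1, x2, w):
    # Weighted inner product <x1, x2>_w = sum(conj(x1) * w * x2);
    # np.vdot conjugates its first argument.
    return np.vdot(x1, w * x2)

def norm(x, w):
    # ||x|| = sqrt(<x, x>), which is real by construction.
    return float(np.sqrt(inner(x, x, w).real))

def dist(x1, x2, w):
    # d(x, y) = ||x - y||.
    return norm(x1 - x2, w)
```

With unit weights this reduces to the ordinary Euclidean structure, which is why providing only the inner product (or only the weights) would suffice in the $L^2$ case.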

@Emvlt (Contributor, author) commented May 28, 2025

About the suggestion that the weighting refactoring should be in its own PR: I chose to add it here because, for me, the Python Array API allows decoupling the backend classes (which will only add a few attributes to the Weighting) from the abstract class. As I am doing this for the TensorSpace and TensorSpaceElement classes, I thought it would fit :)
As for the behaviour, I am not adding anything: the classes are refactored but the functionality remains unchanged. I'm happy to open the discussion on the Banach spaces and agree to create a separate PR for that :)

@leftaroundabout leftaroundabout marked this pull request as ready for review May 28, 2025 13:16
@leftaroundabout (Contributor)

I did not mean to click "ready for review"

@leftaroundabout leftaroundabout marked this pull request as draft May 28, 2025 13:18
@leftaroundabout (Contributor)

the classes are refactored but the functionality remains unchanged

Well, for one thing, I don't think the current state actually works correctly for a simple inner-product space. In the inner branch of the initialization, you're only overriding the __inner attribute, but __norm stays at its default value. Thus calling norm on that weighting will go through the default unit weight and array-norm, which will in general be inconsistent with the custom inner product.
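A minimal self-contained illustration of this failure mode (a toy class written for this example, not the actual ODL Weighting code): only the inner product is overridden, while the norm keeps its unweighted default, so the two disagree.

```python
import numpy as np

class ToyWeighting:
    def __init__(self, inner=None):
        # Only the inner product is overridable; the norm is not.
        self.__inner = inner

    def inner(self, x1, x2):
        if self.__inner is not None:
            return self.__inner(x1, x2)
        return np.vdot(x1, x2)

    def norm(self, x):
        # Default path: ignores any custom inner product.
        return float(np.linalg.norm(x))

w = np.array([4.0, 4.0])
weighting = ToyWeighting(inner=lambda a, b: np.vdot(a, w * b))
x = np.array([1.0, 1.0])
# sqrt(inner(x, x)) is sqrt(8), but norm(x) is sqrt(2): inconsistent.
```

A consistent implementation would derive the norm from the custom inner product whenever one is supplied, instead of falling back to the unit-weighted default.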

@leftaroundabout (Contributor)

Pretty sure all that could be fixed, but the thing is, it's not so clear-cut to say the functionality remains unchanged (and whether that even makes sense) given that the decision logic is set up in a different way now, with a dict-of-methods instead of inheritance. And this would be best investigated in a separate PR.

@Emvlt (Contributor, author) commented May 28, 2025

Okay, got you :) I should have checked the behaviour better.

However, I don't really understand the point about splitting the PRs. I looked at your PyTorch backend PR: the commits are mixed and I think it's still readable. I am also not sure that you ran the tests that would have spotted the errors that we all make while coding. I told you this is a WIP, you told me to make a PR, and now you say that it's not thoroughly checked.

I guess we should align in terms of how we push changes to this repo. How about discussing live on Friday?

@leftaroundabout (Contributor)

No critique meant. We're moving in the right direction, I just keep underestimating the amount of individual changes and possible implications. The more we separate concerns, the better we can ensure that each change is safe and won't cause more problems further down the road than it fixes.

if is_real_dtype(x2.dtype):
return np.vecdot(x1.ravel(), x2.ravel())
else:
# This could also be done with `np.vdot`, which has complex conjugation
Contributor:
This comment is now out of date. Fixed in d9575e8
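To make the conjugation remark in the code comment concrete (a standalone NumPy illustration, not code from the PR): `np.vdot` flattens its inputs and conjugates its first argument, i.e. it computes the complex inner product directly.

```python
import numpy as np

x1 = np.array([1 + 2j, 3 - 1j])
x2 = np.array([2 - 1j, 1 + 4j])

# np.vdot conjugates the first argument: sum(conj(x1) * x2).
assert np.isclose(np.vdot(x1, x2), np.sum(np.conj(x1) * x2))
```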

assert all_equal(x, ['1', '2', 'inf'])
with pytest.raises(AssertionError):
x = vector(inp)
# assert isinstance(x, NumpyTensor)
Contributor:
I don't think it makes sense for us to support string dtypes at all. Leaving those commented-out assertions makes it unclear what the intention is.

TENSOR_SPACE_IMPLS = {
'numpy': NumpyTensorSpace
}
AVAILABLE_DEVICES = {
Contributor:
This shouldn't really be needed now that available_devices is in ArrayBackend, right?

I think it would be better if everything that needs to know what devices are available for an impl goes through the corresponding ArrayBackend object instead of fiddling with another global dictionary (which would have to be kept in sync when new backends are added).

Contributor Author:
I agree that the devices must only be accessed through the backend. I used this variable to build the IMPL_DEVICE_PAIRS fixture as we load the backends, but that's not clear, actually. I will move that into the pytest_config instead :)

is_complex_floating_dtype, is_numeric_dtype, is_real_dtype,
is_real_floating_dtype, is_string, normalized_axes_tuple,
is_complex_dtype, is_numeric_dtype, is_real_dtype,
is_floating_dtype, is_string, normalized_axes_tuple,
Contributor:
I'm not sure I like this change of naming. After all, complex numbers (at least the standard complex64 and complex128) are also based on floating point, so it's somewhat misleading to exclude them in something called is_floating_dtype.

Why not just leave those names as they are?

Contributor Author:
Two things to make you reconsider :-)

  • As per NumPy's documentation, "There are 5 basic numerical types representing booleans (bool), integers (int), unsigned integers (uint), floating point (float) and complex." See https://numpy.org/doc/stable/user/basics.types.html under Numerical data types.
  • Removing the check for complex inside is_floating_dtype did not affect the tests.

@leftaroundabout (Contributor) commented Jul 10, 2025
as per NumPy's documentation, "There are 5 basic numerical types..."

But those are concrete fixed types (up to bit width). The is_XYZ_dtype predicates don't refer to one particular type-flavour; each discerns a whole bunch of types. And in the same way that both int and float are numeric types (meaning, for practical purposes, that they support +, * and -), both float and complex are floating types, in the sense that they support /, sqrt, etc. So it might make sense to also have an is_floating_dtype which includes both float and complex, as there are many things that should work for both of those. But at any rate we should still have is_real_floating_dtype that specifically covers only float32 and float64.
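The proposed distinction could be sketched like this (illustrative implementations based on NumPy dtype kinds, not the actual ODL helpers):

```python
import numpy as np

def is_floating_dtype(dtype):
    # Real or complex floating point: both support / and sqrt.
    return np.dtype(dtype).kind in 'fc'

def is_real_floating_dtype(dtype):
    # Only real floats such as float32 and float64.
    return np.dtype(dtype).kind == 'f'
```

Under these definitions complex128 is a floating dtype but not a real floating dtype, matching the naming argued for above.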

Contributor Author:
Okay with that!

Emvlt added a commit to Emvlt/odl that referenced this pull request Jul 7, 2025
Making sure that we are consistent with the use of the ArrayBackend class
@leftaroundabout (Contributor)

  1. Deprecating the test that relied on NumPy mechanisms to perform arithmetic operations between product spaces and lists. In my opinion, this should not be supported. It seems that this behaviour is relied on nowhere in the codebase. What does @leftaroundabout think about it?

I suppose you mean test_unary_ops and test_operators? (BTW commenting out a test case isn't really "deprecating" it)

Well, we certainly want -x and x + y and so on to work when x and y are elements of a pspace. This has nothing to do with NumPy, and it should be tested for both NumPy and PyTorch.

@Emvlt (Contributor, author) commented Jul 10, 2025

Sure, deprecating is not the right word, but you know what I mean.

I meant "perform arithmetic operations between product spaces and lists", so not when x and y are elements of a pspace.

@leftaroundabout (Contributor)

perform arithmetic operations between product spaces and lists so not when x and y are elements of a pspace.

Ah, like x + [4,5,6] for x∈ℝ³×ℝ³×ℝ³? Where exactly is/was that tested?

I'm not sure if it should be supported. I tend to say no, it would be better to require it to be written x + my_pspace.element([4,5,6]).

from .utils import get_array_and_backend
from numbers import Number
import numpy as np

Contributor:
Do we really need this module? What is the intended use case? What are the semantics of the functions, and are they sensible?

I feel like there is kind of a wheel that keeps being reinvented in ODL, first with the various NumPy-based tricks, then with ArrayOnBackendManager, then access to the Array API, finally the ArrayBackend class.

As soon as you have an array_namespace, you can just look up the functions from that. That's certainly not something the user should need to do for basic elementwise stuff etc., but on the other hand functions such as empty_like are pretty low-level (in the ODL perspective). When on that level, how much do we really win by using functions from yet another module, with yet another semantics to get used to, as opposed to looking into the appropriate array_namespace directly at the use site?

Contributor Author:
Let's consider x. We don't know if it's an np.ndarray, a torch.Tensor or a LinearSpaceElement.
We need to have the following machinery:

import odl

x_arr, backend = get_array_and_backend(x)
zeros_like = backend.array_namespace.zeros_like(x_arr)

With the array_creation module, we can instead have

import odl

zeros_like = odl.zeros_like(x)

I think that the get_array_and_backend mechanism is a bit verbose for the user (but perfectly fine for us) and, most importantly, does not provide the documentation of the function in the IDE. I know you are not a user of such features, but I personally find it more user-friendly.

In my opinion, our users overwhelmingly think in terms of arrays (just from personal experience plus seeing the commits of non-core developers) and will expect ODL to be lenient when it comes to the functions they are used to.
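The dispatch behind such a convenience function could look roughly like this (a hypothetical sketch of an odl.zeros_like-style wrapper, not the actual array_creation module; the lookup only covers objects exposing the Array API protocol, with NumPy as fallback):

```python
import numpy as np

def zeros_like(x):
    # Look up the input's array namespace (Array API protocol) and
    # delegate to it; fall back to NumPy for ndarrays on versions
    # that predate __array_namespace__.
    xp = x.__array_namespace__() if hasattr(x, '__array_namespace__') else np
    return xp.zeros_like(x)
```

The point of the wrapper is exactly the one made above: a single documented entry point, while the backend-specific function is still the one doing the work.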

@Emvlt (Contributor, author) commented Jul 11, 2025

perform arithmetic operations between product spaces and lists so not when x and y are elements of a pspace.

Ah, like x + [4,5,6] for x∈ℝ³×ℝ³×ℝ³? Where exactly is/was that tested?

I'm not sure if it should be supported. I tend to say no, it would be better to require it to be written x + my_pspace.element([4,5,6]).
See commit 2e59ad3 :)

@leftaroundabout (Contributor)

See commit 2e59ad3 :)

Ah, now I get it! Yes, y_arr = [op(x_) for x_ in x_arr] is much better than the old version.
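For reference, the componentwise pattern being praised here looks like this in isolation (a standalone sketch with illustrative names; op stands for any unary array operation):

```python
import numpy as np

# Apply a unary operation to each component array of a product-space
# element, instead of relying on NumPy coercing the list of arrays.
op = np.negative
x_arr = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
y_arr = [op(x_) for x_ in x_arr]
```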

Emvlt added 5 commits July 16, 2025 11:57
…rward and backward calls. I found that having two functions to maintain was more error-prone than having just one function with if/else logic. Also, I made sure that the CPU calls are compatible with PyTorch by doing explicit CPU conversion.
leftaroundabout and others added 30 commits October 20, 2025 13:55
Instead of having an np.float64 for which isinstance(x, float) evaluates to True, we explicitly convert the step to a float
…calar.

In numpy-2.0, indexing into an array does not give a plain Python number but instead e.g.
`np.float64`, which is however still an `isinstance` of `float`. This situation is encountered
in some of the ODL solvers.
Changed after directory-structure reorganization.
The only real problem here was that `order` is not supported any more (as it
was NumPy-specific, not available in the Array API).
…d dtype.

This was problematic particularly for nested product spaces, and caused many
failures in the large-scale tests.
…lement of the desired space.

This avoids some complications / copying and also ensures a no-op element-generation retains the identity.
Change of the identity caused one of the large-scale tests to fail.
Specifically the non-support of calling NumPy ufuncs on ODL objects.
… different PyTorch versions.

E.g. 2.7 lacks the `device` and `copy` arguments completely, whereas 2.9 refuses to
handle inputs that do not live on the current CUDA device.

The version distinction is an ugly hack, but at least it does not rely on exception
catching (which is even more unreliable).
It is actually quite worrying that this test _succeeded_ before f67f99b. It seems
like the way DLPack transfer was implemented then caused some elements to come
out as NumPy arrays despite `pytorch` being selected as the `impl`.
This version does not use DLPack at all but only handles the relevant NumPy and PyTorch
cases manually.
This is slow, but that is already the case for other scenarios covered by the function.
This function is already very forgiving with respect to different types of
both input arguments, but there were still some corner cases where it errored.