-
Notifications
You must be signed in to change notification settings - Fork 1.5k
[ty] Synthesize precise __getitem__
overloads for tuple subclasses
#19493
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
aa99e4f
to
1d950c9
Compare
1d950c9
to
1ad721a
Compare
Diagnostic diff on typing conformance testsChanges were detected when running ty on typing conformance tests--- old-output.txt 2025-07-30 11:24:12.533913338 +0000
+++ new-output.txt 2025-07-30 11:24:12.595913773 +0000
@@ -87,6 +87,7 @@
aliases_variance.py:18:24: error[non-subscriptable] Cannot subscript object of type `<class 'ClassA[T_co]'>` with no `__class_getitem__` method
aliases_variance.py:28:16: error[non-subscriptable] Cannot subscript object of type `<class 'ClassA[T_co]'>` with no `__class_getitem__` method
aliases_variance.py:44:16: error[non-subscriptable] Cannot subscript object of type `<class 'ClassB[T_co, T_contra]'>` with no `__class_getitem__` method
+annotations_coroutines.py:27:5: error[type-assertion-failure] Argument does not have asserted type `str`
annotations_forward_refs.py:22:7: error[unresolved-reference] Name `ClassA` used when not defined
annotations_forward_refs.py:23:12: error[unresolved-reference] Name `ClassA` used when not defined
annotations_forward_refs.py:49:10: error[invalid-type-form] Variable of type `Literal[1]` is not allowed in a type expression
@@ -101,6 +102,8 @@
annotations_forward_refs.py:96:1: error[type-assertion-failure] Argument does not have asserted type `int`
annotations_generators.py:86:21: error[invalid-return-type] Return type does not match returned value: expected `int`, found `types.GeneratorType`
annotations_generators.py:91:27: error[invalid-return-type] Return type does not match returned value: expected `int`, found `types.AsyncGeneratorType`
+annotations_generators.py:167:5: error[type-assertion-failure] Argument does not have asserted type `AsyncGenerator[str, None]`
+annotations_generators.py:174:5: error[type-assertion-failure] Argument does not have asserted type `AsyncGenerator[str, None]`
annotations_generators.py:193:1: error[type-assertion-failure] Argument does not have asserted type `() -> AsyncIterator[int]`
annotations_methods.py:31:1: error[type-assertion-failure] Argument does not have asserted type `A`
annotations_methods.py:36:1: error[type-assertion-failure] Argument does not have asserted type `B`
@@ -889,4 +892,4 @@
tuples_type_form.py:36:1: error[invalid-assignment] Object of type `tuple[Literal[1], Literal[2], Literal[3], Literal[""]]` is not assignable to `tuple[int, ...]`
typeddicts_operations.py:60:1: error[type-assertion-failure] Argument does not have asserted type `str | None`
typeddicts_type_consistency.py:101:1: error[invalid-assignment] Object of type `Unknown | None` is not assignable to `str`
-Found 890 diagnostics
+Found 893 diagnostics |
e230fc4
to
e9b6681
Compare
e9b6681
to
7961ecd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much.
This is extremely cool. It makes me a little sad, because the protocol matching seems to be the only effect of this once we add the special-casing logic in infer_suscript_expression_types
for tuple subclasses. And I'm not really convinced that this is a real world concern?
If it is a real world concern, then why do we do this for tuple sublcasses only? Shouldn't the same logic apply to normal tuples as well?
Precise types for index operations are also inferred for tuple subclasses: | ||
|
||
```py | ||
class HeterogeneousSubclass(tuple[int, str, int, bytes]): ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, and in the Rust source code, I would find it much easier to have type names that I could easily associated with the index at which they appear.
Also, having worked on slicing/index math before, I know that it's easy to run into integer overflows and other fun issues, so I would appreciate if we could also add tests for lengths 0-2.
So I would probably start with a set of tests like:
class I0: ...
class I1: ...
class I2: ...
class HeterogeneousSubclass0(tuple[()]): ...
# revealed: Overload[(self, index: SupportsIndex, /) -> Never, (self, index: slice[Any, Any, Any], /) -> tuple[()]]
reveal_type(HeterogeneousSubclass0.__getitem__)
class HeterogeneousSubclass1(tuple[I0]): ...
# revealed: Overload[(self, index: SupportsIndex, /) -> I0, (self, index: slice[Any, Any, Any], /) -> tuple[I0, ...]]
reveal_type(HeterogeneousSubclass1.__getitem__)
class HeterogeneousSubclass2(tuple[I0, I1]): ...
# revealed: Overload[(self, index: Literal[-2, 0], /) -> I0, (self, index: Literal[-1, 1], /) -> I1, (self, index: SupportsIndex, /) -> I0 | I1, (self, index: slice[Any, Any, Any], /) -> tuple[I0 | I1, ...]]
reveal_type(HeterogeneousSubclass2.__getitem__)
class HeterogeneousSubclass3(tuple[I0, I1, I2]): ...
# revealed: Overload[(self, index: Literal[-3, 0], /) -> I0, (self, index: Literal[-2, 1], /) -> I1, (self, index: Literal[-1, 2], /) -> I2, (self, index: SupportsIndex, /) -> I0 | I1 | I2, (self, index: slice[Any, Any, Any], /) -> tuple[I0 | I1 | I2, ...]]
reveal_type(HeterogeneousSubclass3.__getitem__)
In fact, it's interesting to see the differences between HeterogeneousSubclass0
and HeterogeneousSubclass1
. The first one has an overload that returns Never
for all indices... which is correct. The second one has a fallback overload that returns I0
for all indices.. which is also fine, given that we'll add a index-out-of-bounds error elsewhere(?).
When I read this first test here with tuple[int, str, int, bytes]
I was first slightly annoyed by the fact that it used int
twice, until I realized that this is of course intended. So I think I would appreciate a short explanation:
class A: ...
class B: ...
class C: ...
# Note that the first and third elements have the same type:
class HeterogeneousSubclass(tuple[A, B, A, C]): ...
# revealed: Overload[(self, index: Literal[-4, -2, 0, 2], /) -> A, (self, index: Literal[-3, 1], /) -> B, (self, index: Literal[-1, 3], /) -> C, (self, index: SupportsIndex, /) -> A | B | C, (self, index: slice[Any, Any, Any], /) -> tuple[A | B | C, ...]]
reveal_type(HeterogeneousSubclass.__getitem__)
|
||
class MixedSubclass(tuple[Exception, *tuple[str, ...], int, bytes, int, range]): ... | ||
|
||
# revealed: Overload[(self, index: Literal[-3], /) -> bytes, (self, index: Literal[4], /) -> str | int | bytes | range, (self, index: Literal[2, 3], /) -> str | int | bytes, (self, index: Literal[-1], /) -> range, (self, index: Literal[1], /) -> str | int, (self, index: Literal[-5], /) -> str | Exception, (self, index: Literal[-4, -2], /) -> int, (self, index: Literal[0], /) -> Exception, (self, index: SupportsIndex, /) -> Exception | str | int | bytes | range, (self, index: slice[Any, Any, Any], /) -> tuple[Exception | str | int | bytes | range, ...]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that we "only" generate O(tuple_length) overloads here. But I wonder if this could still lead to performance problems for large lengths, given that overload resolution will have to work with this huge set of overloads? I guess it's fine and we probably don't even need a benchmark for this, as tuple subclasses of extremely large tuples are hopefully not existent?
import stat | ||
|
||
reveal_type(os.stat("my_file.txt")[stat.ST_MODE]) # revealed: int | ||
# revealed: Overload[(self, index: Literal[-10, -9, -8, -7, -6, -5, -4, 0, 1, 2, 3, 4, 5, 6], /) -> int, (self, index: SupportsIndex, /) -> int | float, (self, index: slice[Any, Any, Any], /) -> tuple[int | float, ...]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stat_result
is based on tuple[int, int, int, int, int, int, int, float, float, float]
, so why don't we see the special overload for float
here? Because float
splits into int | float
? And int | float
is also equal to the union of all element types? Makes sense... but maybe worth a comment? Or is there another stdlib API that we could use as a more interesting example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think stat_result
is a good one to test here, as there are several hits in the mypy_primer diff showing that people often index into it with an index <=6 and expect the type checker to understand that the result is an int
. But I'll add a comment -- and I can add some tests with "more heterogeneous" stdlib APIs too!
I should have been clearer in my PR description. While the original motivation for trying this out was to support tuple subclasses, we synthesize these overloads for the tuple type as well (not just subclasses of the tuple type), so I should probably add tests for these specifically, to ensure they don't unexpectedly break when we're further along in our protocols implementation. |
Co-authored-by: David Peter <sharkdp@users.noreply.github.com>
Yes, I was in two minds after finishing this whether it was worth the added complexity or not. I'm still not totally sure that it is. How often do people try to use tuples (or tuple subtypes) as subtypes of protocols? I'm not sure -- though I also think I'd find it surprising if I tried this as a user, and found that it didn't work! As a general principle, I think we should try to implement as much of our per-type special casing as possible via synthesized methods on the class object (where it would be sensible to do so) rather than in places like It may also still be possible to reduce the special casing we have in I chatted with Carl on Friday and he was generally supportive of the idea of something like this PR, despite the caveats. That said, I don't think he's taken a look at the actual PR branch. Overall I still weakly lean towards landing this, especially since the code it's adding is all very localised (it'll be easy to rip it out later if we end up deciding it's really not that useful after all). |
23d4746
to
0c404f8
Compare
I strongly agree with this |
Summary
This PR adds synthesized
__getitem__
overloads for tuple subclasses, such that forf
in the following example, we can infer thatf[0]
evaluates toint
andf[1]
evaluates tostr
:It was initially my hope when embarking on this PR that we would be able to use these synthesized overloads to fully get rid of the special casing we have for tuples in
infer_subscript_expression_types
. Over the course of writing the PR, I realised that this would not be possible, for the following reasons:The special casing for slice literals is too complicated to be reasonably implemented via synthesized overloads.
The synthesized overloads being added for int literals in this PR are already (I believe) the most complicated synthesized functions we have anywhere in our codebase so far. The synthesized overloads required to support slice literals would be far more complicated. It might be theoretically possible to generate them, but I think the inherent complexity makes this realistically untenable. It could also cause performance problems to synthesize that many overloads.
The index-out-of-bounds error can't be implemented via synthesized overloads.
I'd like to explore generalising this diagnostic so that we don't just emit it for specific
Type
variants that we know to have fixed lengths, but for any type whereType::len()
returnsSome()
. That's for another PR, however.For a tuple type like
tuple[int, *tuple[str, ...], bytes]
, it's impossible to express "if it's subscripted with any literal integer higher than 1, you should inferstr | bytes
rather thanint | str | bytes
using synthesized overloads.This is currently implemented in our tuple special-casing, and it would be a shame to introduce a regression on this.
Because of these issues, I've come to the conclusion that we will not be able to get rid of the hardcoded special casing for tuples in
TypeInferenceBuilder::infer_suscript_expression_types
, and that we will probably have to extend it so that it also applies to tuple subclasses. Nonetheless, I'm opening this PR anyway, because inferring precise signatures for__getitem__
attributes on specialised tuples has advantages even if we don't end up using these signatures directly when inferring the types of subscript expressions against tuples. A good example of why is protocol assignability: it would be ideal ifBar
is understood by ty as a subtype ofProto
in the following example:We currently say that
Bar
is indeed a subtype ofProto
, but (if we do not land something similar to this PR), that will no longer be the case after fixing astral-sh/ty#889. After fixing that issue, we will strictly validate the signature of a class's method against the signature of a protocol it claims to be an instance of. Without this PR, we would look up the__getitem__
signature ontuple[int]
and fallback to the generic__getitem__
signature in typeshed, which would lead us to incorrectly infer thatBar
is not a subtype ofProto
. With this PR, however, we should have the necessary pieces in place that we continue to considerBar
a subtype ofProto
even after #889 has been fixed, because the lookup of__getitem__
on theBar
class object would return the precise synthesized overloads being added here.Test Plan
Mdtests. The ecosystem hits also LGTM. They use
os.stat()
andpwd.getpwuid()
, both of which return instances of tuple subclasses.