-
Notifications
You must be signed in to change notification settings - Fork 1.6k
[red-knot] Decompose bool to Literal[True, False] in unions and intersections
#15738
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
a347b0d to
ad54eba
Compare
8b6a5fd to
b32181d
Compare
This comment was marked as resolved.
This comment was marked as resolved.
b32181d to
0ff14bf
Compare
| finally: | ||
| # TODO: should be `Literal[1] | str | bytes | bool | memoryview | float | range | slice` | ||
| reveal_type(x) # revealed: bool | float | slice | ||
| reveal_type(x) # revealed: bool | slice | float |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The order here changes because I'm using swap_remove in Type::display to ensure that Literal[True, False] is displayed as bool to users. But maybe it's okay to use remove() in Type::display() rather than swap_remove? remove() is O(n) rather than O(1), but displaying a type shouldn't really be performance-sensitive: it's usually only done as part of printing a diagnostic to the terminal
| class P: | ||
| def __lt__(self, other: "P") -> bool: | ||
| return True | ||
|
|
||
| def __le__(self, other: "P") -> bool: | ||
| return True | ||
|
|
||
| def __gt__(self, other: "P") -> bool: | ||
| return True | ||
|
|
||
| def __ge__(self, other: "P") -> bool: | ||
| return True | ||
|
|
||
| class Q(P): ... | ||
|
|
||
| def _(x: P, y: Q): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this because it doesn't look like we've implemented comparisons between unions inside tuples yet, and bool is now internally represented as a union, which meant that many reveal_type assertions in this test became TODOs rather than bools. Testing whether we knew how to compare a bool with a bool didn't seem to me to be the point of the test.
| KnownClass::Bool.to_instance(db).is_subtype_of(db, target) | ||
| KnownClass::Int.to_instance(db).is_subtype_of(db, target) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a stack overflow for a looong time here before I realised I had to change this... was tearing my hair out 🙃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you say more why this is necessary? And maybe add a comment (unless I'm missing something obvious)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first step in Type::is_subtype_of() is now (with this PR) to normalize bool to Literal[True] | Literal[False] for both self and target. So if self is bool, self.is_subtype_of(target) delegates to whether Literal[True] | Literal[False] is a subtype of target. That takes us to the Type::Union branch: a union type is a subtype of target iff all its members are a subtype of target. So then we try the first member of the union Literal[True] | Literal[False], and ask the question: "Is Literal[True] a subtype of target? If target is not equivalent to Literal[True], that takes us to this branch, and this branch says "Literal[True] is a subtype of target iff bool is a subtype of target". So then we ask the question "Is bool a subtype of target?", which is the very first question we started off with. And bool is normalized to Literal[True] | Literal[False]... the cycle repeats indefinitely, resulting in infinite recursion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
I still don't understand why it's correct though? There are types which are supertypes of bool, but not of int. bool | None or ~Literal[2], for example. So why is it correct to replace the bool <: target check with a int <: target check? Because all special cases have been handled above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't understand why it's correct though?
It might not be -- I have been known to make mistakes 😜
There are types which are supertypes of
bool, but not ofint.bool | Noneor~Literal[2], for example. So why is it correct to replace thebool <: targetcheck with aint <: targetcheck? Because all special cases have been handled above?
Yeah, I think our set-theoretic types such as unions and intersections should be handled in the branches above? I just added more test cases demonstrating this.
(I will add a comment to the code reflecting this conversation, but first I'm experimenting with alternative ways of structuring the code that might make this clearer in the first place!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(#15773 is the "alternative way of structuring the code", for anybody else reading this thread)
0ff14bf to
6991676
Compare
One solution to this might be to add a #[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct NormalizedType<'db>(Type<'db>);
impl<'db> NormalizedType<'db> {
/// Return true if this type is a [subtype of] type `target`.
pub(crate) fn is_subtype_of(self, db: &'db dyn Db, target: Self) -> bool {}
/// Return true if this type is [assignable to] type `target`.
pub(crate) fn is_assignable_to(self, db: &'db dyn Db, target: Self) -> bool {}
/// Return true if this type is [equivalent to] type `other`.
pub(crate) fn is_equivalent_to(self, db: &'db dyn Db, other: Self) -> bool {}
/// Returns true if this type and `other` are gradual equivalent.
pub(crate) fn is_gradual_equivalent_to(self, db: &'db dyn Db, other: Self) -> bool {}
/// Return true if this type and `other` have no common elements.
pub(crate) fn is_disjoint_from(self, db: &'db dyn Db, other: Self) -> bool {}
}
impl<'db> Deref for NormalizedType<'db> {
type Target = Type<'db>;
fn deref(&self) -> &Self::Target {
&self.0
}
}
impl<'db> From<NormalizedType<'db>> for Type<'db> {
fn from(value: NormalizedType<'db>) -> Self {
value.0
}
}This is quite attractive in some ways. It would be a very big change to make, though, and it would make our API more awkward to use in several places. I'm not sure it would be worth it. |
126aaaf to
7318ff9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I suggested normalizing bool to Literal[True, False], and I appreciate you exploring what that would look like! But having seen what it would look like, it doesn't feel to me like the right direction.
The cases where this simplification applies are very narrow: unions of bool (or an enum, in future -- I'm going to call these sealed types) with a larger type that only partially overlaps with bool (or the enum). Since bool and enums are both very narrow types, with a small/fixed set of singleton inhabitants, there are very few larger types that will partially overlap. Some types (int or object) will fully overlap (they are super-types), but I think in fact AlwaysTruthy and AlwaysFalsy are the only partially overlapping types?
So while decomposing is attractive in its conceptual simplicity (and in its code simplicity!), it means lots of operations on bools in unions become more expensive, because we have to check against two types instead of one (and this will be even worse with enums, possibly making the approach not even workable for them at all.) And then in the effort to limit some of that inefficiency (by decomposing only in unions/intersections), we lose a fair bit of the conceptual and code simplicity. So we are kind of letting the tail wag the dog, by paying extra perf cost for all uses of bool intersections, in order to support the less-common case where Always{Truthy,Falsy} is also in that union.
I think we do want to normalize these cases (by which I mean, where a sealed type partially overlaps with another type in a union) such that we decompose the sealed type and eliminate the overlap (that is, AlwaysFalse | Literal[True] instead of AlwaysFalse | bool). This seems better than trying to encode the equivalence in is_equivalent_to.
But given that the cases where this applies are so narrow, and we already compare every new union element to every existing union element, I think we can do this as (yet another) special case in UnionBuilder, and that will perform better than always decomposing.
The rule would be that whenever we see bool in a union with a type that it is not disjoint from but also not a subtype of (I phrase it in this general way so that we also handle intersections containing Always{Truthy,False} being in a union with bool), we replace bool in that union with its literal inhabitant that is disjoint from the other type.
WDYT?
I think this could work! I tried something like what you outline above initially, and struggled to get it to work, but I think at that stage I still didn't understand the full extent of the problem here. Revisiting it might yield different results now that I am Older and Wiser.
I agree that we need some kind of normalization here rather than attempting some ugly (and expensive) special casing in |
|
I think there might be a more general problem here where we should understand the equivalence I understand that there is a desire to always find a canonical representation, but I'm afraid this might be too much to ask for in a set-theoretic type system? Enforcing non-overlapping union elements can be achieved, but there are many ways to do that — the two variants It is obviously desirable to have a "simple" and unique representation when we present types to users, at least in common cases. But as a general property, I'm not sure if that is achievable. |
We've had some PRs that move us in this direction, but I think this is the key question we need to answer — do we want
...because if so, you would typically avoid that combinatorial explosion by ensuring that the primitives in the normal form are all disjoint. In this case, it's |
I was mostly thinking about non-primitives, such as unrelated classes class C1: ...
class C2: ...
...
class Cn: ...How would you bring the union |
|
Yes, (one) problem with trying to avoid all overlaps is that a typical nominal instance type (perhaps the most common type) is not disjoint from another unrelated one, thanks to subclassing and multiple inheritance. So avoiding all overlaps in unions seems impractical. I agree that I'm not sure it's feasible in general to have a single normalized representation of every type. (It is important that we maintain DNF in the sense that we always normalize to a union of intersections, never a more complex tree of unions and intersections. But that doesn't necessarily imply a fully normalized form in terms of the possible combinations of elements.) My suggestion above was much more narrow and targeted: that we'd avoid overlaps with sealed types and prefer decomposing the sealed type instead. I think this is more tractable. But Alex makes a good point that this might be the wrong direction; perhaps we should prefer the overlap and use the full sealed type whenever it is correct to do so.
To me this looks like a case where we'd ideally prefer to achieve this understanding via an eager simplification of |
|
I just realized from looking at the overloads spec PR that the pattern you started introducing in this PR as |
|
Just checking in. Is this something that we still plan on merging/changing in the future? |
I've deprioritised this issue for now as it's not urgent, but it's still something we'll need to fix at some point, and I still think something like this could be a viable fix. I'd rather not close this just yet. |
Summary
Our union builder currently has a problem due to a bad interaction of several features we've implemented:
bool ≡ Literal[True, False](and therefore that for anyT,bool | T ≡ Literal[True, False] | T; therefore, whenever it seesLiteral[True, False]in a union, it eagerly simplifies this union to the simpler typebool.Uto the unionS | Tis a no-op ifUis a subtype of eitherSorTLiteral[True]is a subtype ofAlwaysTruthyand~AlwaysFalsyLiteral[Falsy]is a subtype ofAlwaysFalsyand~AlwaysTruthyboolis neither a subtype ofAlwaysTruthynor a subtype ofAlwaysFalsyPutting all these features together means that the union builder can produce different unions depending on the order in which elements are inserted into the union. Consider a union consisting of
Literal[True],Literal[False]andAlwaysTruthy. If they're inserted in that order, then the unions is built like this:But if they're inserted in a slightly different order, a different union is constructed entirely, which red-knot is not capable of understanding to be equivalent to the one above:
There are the following set of equivalent type pairs that can currently be constructed under our current model but are not understood to be equivalent by red-knot:
AlwaysTruthy | bool ≡ AlwaysTruthy | Literal[False]AlwaysFalsy | bool ≡ AlwaysFalsy | Literal[True]~AlwaysTruthy | bool ≡ ~AlwaysTruthy | Literal[True]~AlwaysFalsy | bool ≡ ~AlwaysFalsy | Literal[False]This PR refactors the union builder to ensure that unions are never constructed that contain
boolas an element; instead,boolis always decomposed toLiteral[True, False]. The same is done for our intersection builder. Doing this makes it much easier to ensure that unions such as the ones above always have the same set of elements no matter which order they are inserted into the union; a lot of complexity is removed frombuilder.rsin this PR. However, it has the (significant) drawback that in various type-relational methods such asType::is_equivalent_to(),Type::is_subtype_of()andType::is_assignable_to(), we have to remember to normalizeboolintoLiteral[True, False]before comparing the type with any other type, since the other type might be a union, and we must ensure thatLiteral[True, False]is understood to be equivalent tobool.Since it would be confusing for users if
boolwas displayed asLiteral[True, False], some logic is added toType::displayso thatLiteral[True, False]is replaced withboolin unions before unions are printed to the terminal in diagnostics.This PR fixes astral-sh/ty#216, and allows us to promote two flaky property tests to stable. However, it currently shows up as having a 1% performance regression on both red-knot benchmarks on codspeed: https://codspeed.io/astral-sh/ruff/branches/alex%2Ftruthy-unions-5.
What about literal strings?
As well as the above invariants for unions containing
AlwaysTruthy/AlwaysFalsyandbool, there are also equivalences forLiteralStringthat we do not yet understand. The fact that we do not understand these is at least partly responsible for the flakiness of several other property tests that are not marked as stable:AlwaysTruthy | LiteralString ≡ AlwaysTruthy | Literal[""]AlwaysFalsy | LiteralString ≡ AlwaysFalsy | LiteralString & ~Literal[""]~AlwaysTruthy | LiteralString ≡ ~AlwaysTruthy | Literal[""]~AlwaysFalsy | LiteralString ≡ ~AlwaysFalsy | LiteralString & ~Literal[""]I'm not sure if this problem is solvable in the same way without adding a new
Type::TruthyLiteralStringvariant. Anyway, this PR does not attempt to solve this problem. Instead, some failing tests are added with a TODO.Test Plan
mainQUICKCHECK_TESTS=1000000 cargo test --release -p red_knot_python_semantic -- --ignored types::property_tests::stableQUICKCHECK_TESTS=5000000 cargo test --release -p red_knot_python_semantic -- --ignored types::property_tests::stable::union_equivalence_not_order_dependentQUICKCHECK_TESTS=3000000 cargo test --release -p red_knot_python_semantic -- --ignored types::property_tests::stable::intersection_equivalence_not_order_dependent