-
Notifications
You must be signed in to change notification settings - Fork 151
Add SmallTag
type for more compact Dual
types
#748
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
SmallTag
type for more compute Dual
typesSmallTag
type for more compact Dual
types
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #748 +/- ##
==========================================
- Coverage 89.33% 86.86% -2.48%
==========================================
Files 11 11
Lines 1013 1043 +30
==========================================
+ Hits 905 906 +1
- Misses 108 137 +29 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
We could also revive #572, closed it was thought that SciML printing problems were solved in Base. Can you comment on the comparison? |
Base has type-folding now that helps many stack traces, but you still sometimes need a
The main difference is that debugging perturbation issues without the actual function / array types can be very confusing, so it seems useful to keep the original functionality around. Other than that, the implementation should be mostly equivalent (the hash is based in many cases on the Also it's worth mentioning that |
98b57bb
to
d88bf58
Compare
Chatted on this with @oscardssmith and @topolarity . This is better than using the objectid because it is consistent with respect to precompilation, i.e. in a new session the objectid can change depending on the order that you evaluate functions, while this is consistent. That plus on v1.11 this is const eval-able, and thus using this form of a short tag is type-stable and makes it so you will precompile the autodiff calls if you autodiff w.r.t. the same function. With those two properties though, I would be highly in favor of defaulting to this on v1.11+, since it won't hurt performance or precompilation, and I cannot see a scenario where a normal user would favor the tags based on function types. It would be odd to have huge multi-screen types on v1.10 and then good stack traces on v1.11, but IMO that's not a reason to make it wait longer. I'd like to hear @KristofferC's opinion here. But either way, I think we'd want to set this up as well in ADTypes and DifferentiationInterface @gdalle |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really convinced that #724 is a proper fix for the issue it claims to solve - it is still possible for the counter to overlap, e.g., between multiple precompiled packages. I am happy to update the type exports while we're at it though |
I don't feel confident enough to evaluate that one, but I thought it looked related to what you did with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a full review but some comments:
-
If we want to keep two methods of tagging around, is this the right way to access them? It seems quite an obscure internal detail, which should never change any results. So perhaps need not be a new keyword on every user-facing function.
-
I wonder if the implementation can be much simpler, like just storing the integer as the tag, instead of adding more structs.
src/config.jl
Outdated
@inline function ≺(::Type{SmallTag{H1}}, ::Type{SmallTag{H2}}) where {H1,H2} | ||
tagcount(SmallTag{H1}) < tagcount(SmallTag{H2}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be obvious, but can you explain a bit why this is designed this way? Why calculate hash H
and store that in the type, but call the generated tagcount(SmallTag{H1})
to get a different integer whenever you use it. Why not store the integer from TAGCOUNT
as a type parameter directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was just to preserve the current ordering behavior for tags as closely as possible - the tag hashes are not sorted chronologically, so they would make a very different order
Also, am I correct that this doesn't make perturbation confusion impossible, only vanishingly unlikely? |
@gdalle that is correct. With a 64 bit hash, we have a 50% collision chance once you get to 4 billion different tags (that are used in the same operation), but as long as your number of tags is below millions (which should always be the case), the chance of confusion is incredibly unlikely |
IMO we should merge this. |
Would someone care to answer this, and related comments above?
|
If someone happens to be doing ForwardDiff of the same function against a few million different arguments at the same time, then there would be an extremely small probability of a perturbation confusion. |
No I don't think it's related. This PR just uses the current way for the tag incrementer, which could potentially have an issue with some odd multithreading cases, but that PR also isn't a proper fix. But it's pretty unrelated since the current code already has this issue, and so there's no difference in this respect between default and smalltag, just whomever does end up wanting to do something about the multithreading issue will need to hit these two functions. I don't think that's a major maintenance burden to stop a PR like this. |
This was heavily discussed with about 30 people at JuliaCon. Basically I made a "SmallTag triage" JuliaCon Hackathon discussion. Everyone was really excited that there was a fix that was already made and the question was "why is this not merged"? The general consensus about certain topics in here was:
So with this, I think we have answered @gdalle's questions (mostly seemed to just be questions rather than concerns) and @topolarity is making some of these final changes based on @mcabbott's requests, and then two JuliaDiff maintainers (@oscardssmith and I) are sitting together with @topolarity and the community and when it all seems good plan to merge. |
The three of us agreed that using a tag type is more clear and less likely to run into a potential issue. One potential issue is someone putting a type parameter in the wrong spot, so trying to specify a chunksize=1 and instead accidentally setting a tag to 1, and then opening themselves up to perturbation confusion. So using a type and UInt just is a nice safety measure. We note that the exact tag was considered by everyone to just be an implementation detail and not public API anyone should be relying on, so it wouldn't be a breaking change to change this to just an Int in the future. That could be a discussion we can continue, but it doesn't seem like there was much support for that idea. |
@topolarity noted that v1.10 cannot compute a stable hash at compile time, and this means that these hashes cannot be inferred. Because of this, it would not make a good default on v1.10. It should be considered to be removed after the next LTS bump, but as mentioned in the previous comment, the general Julia user did not believe that we should wait for another LTS in order to make this the default and so we would need to live with the two separate defaults for at least the short term future. |
The reason for the keyword argument is due to the potential of hash collision, this gives a method to flip to the version for which it's debuggable without that potential issue. Though as mentioned, it's vanishingly unlikely, but we should at least move with some care to start and give people an easy way to diagnose if it does cause an issue. |
Defaulting to this thing (which really still ought to be called Mainly targeting 1.11+ seems fine by me too. If there really has to be a user-facing keyword, it can be
Sounds OK? I don't really see the argument being made against this. There's lots arguing against opt-in via Preferences (which I follow) and some complaint that it's more work making the PR (but always better now than later). |
I don't care what the keyword is. IMO it would be fine if this wasn't changable. The argument against preferences is that the person doing the autodiff should have control rather than making it a global setting. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not approved without further discussion.
Can you explain what you mean, who are these multiple people? You are envisaging that libraries which call ForwardDiff may want to choose not to use the hash one on their internal calls, even if their end users do? But I don't see why. If this thing is safe enough to turn on for everyone, and I buy that it is, then why must we inflict clutter on them to make it easy to turn off? If it's not so safe and is an experimental feature, that's different. |
Changed to HashTag.
So you want it a bool? Let's just figure out what you want. This is going in circles. Should we just get on a call? I can now make it a bool if that makes you happy.
Its safe for everyone. I don't know what you're getting at. |
It's not going in circles. I'm against clutter, especially user-facing clutter. If this is safe for everyone, then I don't understand the desire for every user-facing function to gain a new keyword to control it, and every user-facing docstring to gain a description of this new keyword's states. Leaving an obscure way to change the internal behaviour back in place seems preferable to that. |
This was the other PR, which never merged because of relocatability issues and inference on v1.10, so that's a no.
This one is a no and was already removed.
This is what the PR doing. Is that fine? |
And I have also added a Preferences version. So from the list:
I am still confused which one you want, but we now have an implementation of all 4 versions. Please let me know which version you prefer and we can just take that commit. |
Firstly, stuff like
is not a great way to go about things. It puts unfair pressure on the maintainers and it is basically emotional manipulation to merge something because you don't want to be the party pooper against all those "30 people" in a room cherring in champagne about how great this PR is. Secondly, this PR is really hard to review, it has like a gazillion of Update commits etc. If you want to work on this on the hackathon I suggest first cleaning up the PR, make the commits nice etc and then we can take another shot at looking at this objectively. |
c62e14f
to
5ef9860
Compare
It was a moving target. First I was told that using the keyword argument
Which do you want? We will just do that and then squash. But if every single time the PR is updated it's changed to another then it's going to be a mess. So please let me know which you would be happy with and we will squash to that version. |
Bump @mcabbott is this the form you think is optimal? |
I do think it's worth discussing the pros and cons as @mcabbott suggested There's no need to rush this PR - IIUC we're on the same page that we probably want a change like this, so we just have to discuss and convince ourselves of the details (which seems fair to me) Tackling that:
My personal preference is a combination of (4) and (2), but (3) seems OK too I expect the majority of end users (>90%) will not change this setting, via Preferences or otherwise, so I agree we need good defaults. The hash is probably not a good default on 1.10, because it unfortunately makes the tag constructor type-unstable. However, it does seem like a good choice on 1.11+ My feelings are less strong about the user config piece (happy to defer to your judgment @mcabbott ), but I'm generally reluctant to add preferences for things that are likely to be "hidden" concerns for an end-user. In this case, a lot of indirect users of ForwardDiff exist that have no preference / understanding for the perturbation system / issue, so it seems strange to expect them to know that this preference exists and that they will set it. Moving it to preferences also makes it impossible for a package doing its own AD calculations to have an opinion about perturbation tracking (even if, e.g., that package hits a perturbation bug of some kind and would like to be able to code in a fix for its users) All of that said, it's true that we mostly expect these two options to have similar perturbation tracking capability. I can definitely see the argument that the preference really only affects display / diagnostics, which is arguably in the end user's "domain" Anyway I'm OK with either direction for config Hopefully that at least clarifies the original intent before this PR was hackathon'd |
Maybe the way to see it is that the kwarg was trying to simplify the existing "tag control" in the API. That way, a direct user of ForwardDiff only has to choose from a set of options, instead of constructing a tag themselves like they'd do now if they want to tweak tag behavior. If we'd prefer the original interface though that's fine - packages can still pass in the other tag type if they really want to control this aspect of tag behavior |
@mcabbott any comments? |
Bump |
merging by the end of the week sans further objections. |
I don't think this PR should be merged in its current state. Such a fundamental change should be documented properly, and currently this PR is missing updates of the documentation (as an example, the preference More generally, I'm actually not fully convinced that |
The only reason there hasn't been a conclusion here is that you have refused to allow one to occur. Chris, Cody and I are all fine with any of the 4 possible options, and have trying to get you to express which one if any you think is the right path forward, because we all agree that any variant of this is better than status quo. This PR has been in limbo for ~2 months waiting for consensus. I also don't think the risk of perturbation confusing is significant. Even with 16k different tags in the same calculation (which never happens), the perterbation confusion risk is <1e-10 (and for a still completely unreasonable 1024 tags, it's at <1e-13). Sure, shorter stacktraces would be nice, but those realistically can't happen until 1.13 at the earliest (6 months away), and from discussion on triage calls, it seems unlikely that this is even possible to fix fully at the language level in the first place. |
This is the first time @devmotion has chimed in in this thread. Let's please be kind to our maintainers. They just want to make the right decision here.
Like Oscar says, my hope is that from a statistical POV, 64-bit hash collisions are rare enough not to matter to users, but I understand wanting to make this opt-in until it's more mature. I am definitely OK with it not being the default - the original version of this PR left it in the
I don't really agree here. From my POV, there are only two ways to make stack traces smaller:
I think Julia needs to improve on both if stack traces are to be made manageable (a single large type can blow out a stack trace, as can many frames with very simple types). We already have a version of (1) in Base that works quite well, but it is disabled for e.g. CI workloads where you do not want to approximate types. In that kind of context, there is a direct trade-off between type complexity and stack trace verbosity, so I do think decreasing type complexity is important here. |
@devmotion Can you make it clear which of the four is your preference? I'd be happy to just default it to false here and enable it downstream if that makes it easier to agree to. I didn't document it because I assumed that this was a developer detail for testing long tag form. Since there are other PRs which were closed that were documenting the Dual number seeding and tagging systems which were closed due to not being public API, documenting this would be inconsistent with some past decisions here, but I'd be happy to write something up if that is the direction we wish to go nowadays. For the record, I still think the whole dual system should just be documented. And I think we should talk about the 99.999% scenario. Are there programs that have more than 3 concurrent dual numbers running in the same program? If there is, send me a link because they must be doing something cool 😁 (or most likely it's bad higher order and should use TaylorDiff). But the key though is that if that is the case, for all of the cases that aren't that, there's a "if you turned every atom in the universe into a computer and a version of this computation you would still find it to be extremely unlikely that any will hit a hash collision" argument (@oscardssmith can probably quantify it). |
Would people be okay with a 128 bit hash? Could you be clear what your probability threshold requirement is otherwise? |
Let's keep the right frame of reference here. This aims to make a different tradeoff for the printing of errors, shorter but less informative -- a cosmetic change. Yet it replaces deep internals which have worked just fine for a long time, adds complexity, and has the potential to introduce exciting new bugs. That means it needs careful review. But it also means it isn't a high priority for anyone involved. The safe path here is to do nothing. "limbo for ~2 months" is hardly the worst thing. I mean long stack traces have been a complaint for... almost a decade? It's not wrong to want to work on them but let's not manufacture fake urgency. This package is far from the only cause, and this PR will not fix all such complaints. IMO the goal should be to make things really easy and pleasant for (say) Kristoffer to review -- as suggested above. Like maybe the first message can expand to describe really clearly how this thing works, what exactly |
If the concern is that the code is not mature (seems fair to me - it's brand new), can we put it behind a feature flag of your preference so that the SciML ecosystem can help provide us some testing in the wild? The |
IMO this is completely backwards. It's completely impossible to have a perfect clean mergable PR if the interface is up in the air. Chris and I are both maintainers on this repo, and are fully capable of reviewing/merging here and we are both more than happy to do that if you are short on time for a comprehensive review. As such, the only thing holding this up is consensus on the UI of this PR. As such, I propose we remove the preference mechanism here for maximal simplicity (since that seems to be one of your primary goals here). That way, this PR will have no user facing changes (other than the shorter stacktraces if there is an error in unrelated code). Is that acceptable to you? If not, what version of this PR would be? I understand this PR is not a priority for you, but as someone who spends ~hours a day looking at the stacktraces in question, this is a high priority for me. |
3fc32e6
to
5c8a02c
Compare
This is an alternative to `Tag` that provides largely the same functionality, but carries around only the hash of the function / array types instead of the full types themselves. This can make these types much less bulky to print and easier to visually scan for.
As requested by @oscardssmith.
5c8a02c
to
6da76e6
Compare
This is largely to improve printing and reduce visual noise for packages with large function types (e.g. SciML)
Compare
tag = :small
:to
tag = :default
: