Propagate to exact types in struct-utils.h #7889

tlively · 2025-09-06T07:28:30Z

Key collected struct information on both heap type and exactness,
allowing queries for exact types to return more precise results than
queries on the corresponding inexact types.

Use this to fix a bug in CFP where it failed to take into account
exactness and would unnecessarily and incorrectly emit a select between
two values of different types where a single exact type was expected.

Also update GTO to propagate to exact types even though it does not take
advantage of them. This is necessary because the FieldScanner now
collects information for exact and inexact types separately and they
need to be combined.
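A minimal sketch of what keying on both heap type and exactness buys, assuming toy stand-ins: `HeapTypeId`, `FieldInfo`, `InfoMap`, and `buildExample` are hypothetical names for illustration and are not Binaryen's real `HeapType` or `StructValuesMap`.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration; Binaryen's real types differ.
using HeapTypeId = std::string;
enum Exactness { Inexact, Exact };

// Toy per-field info: tracks whether exactly one constant value was written.
struct FieldInfo {
  bool hasValue = false;
  bool conflicting = false;
  int value = 0;

  // Record one more observed write.
  void note(int v) {
    if (!hasValue) {
      hasValue = true;
      value = v;
    } else if (value != v) {
      conflicting = true;
    }
  }
  bool isConstant() const { return hasValue && !conflicting; }
};

// Keyed on both the heap type and its exactness.
using InfoMap = std::map<std::pair<HeapTypeId, Exactness>, FieldInfo>;

// The exact entry for $A saw only the value 1, while the inexact entry also
// accounts for a subtype that wrote 2.
inline InfoMap buildExample() {
  InfoMap infos;
  infos[{"$A", Exact}].note(1);
  infos[{"$A", Inexact}].note(1);
  infos[{"$A", Inexact}].note(2);
  return infos;
}
```

In this toy setup a query on the exact key still sees a single constant, while the inexact key for the same heap type does not.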

kripken · 2025-09-06T14:25:55Z

src/passes/ConstantFieldPropagation.cpp

      }

-      auto iter = rawNewInfos.find(type);
+      auto iter = rawNewInfos.find({type, Exact});


rawNewInfos is always exact, though. This is what my last revision in the other PR uses directly.

They are always exact because they are info from struct.new, which always has an exact type.

To be clear, I think the idea in this PR is good! I'll read the code in detail on Monday.

But my point is that CFP already has exact info, in the form of rawNewInfos. We can probably remove that entirely, in this PR. And for the immediate fix, the latest version of my other PR is just a few lines, basically simply using rawNewInfos in a second place.

I'll have follow-up PRs removing rawNewInfos and making a few other improvements to CFP.

kripken · 2025-09-08T16:17:24Z

src/ir/struct-utils.h

 // depends on the underlying T defining a combine() method.
 template<typename T>
-struct StructValuesMap : public std::unordered_map<HeapType, StructValues<T>> {
+struct StructValuesMap


Worth explaining in the comment what the meaning of the mapping is, that is, what do the exact and inexact entries mean.

kripken · 2025-09-08T16:18:23Z

src/ir/struct-utils.h


+  StructValues<T>& operator[](HeapType type) {
+    return (*this)[{type, Inexact}];
+  }


Should we disallow this operator? Or, are we not risking users getting the less-precise version by default?

The idea here was to allow other users of these utilities to keep using them as-is without changes. Otherwise there would be a lot of mechanical changes here just adding Inexact in a bunch of places in various passes.
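A rough sketch of the trade-off being discussed, assuming a toy map rather than the real `StructValuesMap`: `HeapTypeId`, `Key`, `CountMap`, and `entriesAreIndependent` are hypothetical names. The convenience subscript silently selects the inexact entry, which is the behavior the question above is worried about.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical stand-ins for illustration; not Binaryen's real types.
using HeapTypeId = int;
enum Exactness { Inexact, Exact };
using Key = std::pair<HeapTypeId, Exactness>;

// Toy map from (type, exactness) to the number of writes observed.
struct CountMap : std::map<Key, int> {
  using Base = std::map<Key, int>;
  // Keep the explicit (type, exactness) subscript available.
  using Base::operator[];
  // Convenience subscript: callers that ignore exactness get the inexact
  // entry, so existing call sites compile unchanged.
  int& operator[](HeapTypeId type) {
    return Base::operator[](Key(type, Inexact));
  }
};

// Returns true if the exact and inexact entries for one type stay separate.
inline bool entriesAreIndependent() {
  CountMap counts;
  counts[Key(7, Exact)] = 1; // explicit exactness
  counts[7] = 5;             // implicitly the inexact entry
  return counts[Key(7, Exact)] == 1 && counts[Key(7, Inexact)] == 5;
}
```

Existing callers keep working, but nothing at the call site signals that a more precise exact entry may exist.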

kripken · 2025-09-08T16:19:09Z

src/ir/struct-utils.h

    auto& fields = heapType.getStruct().fields;
-    auto& infos = functionNewInfos[this->getFunction()][heapType];
+    auto ht = std::make_pair(heapType, Exact);
+    auto& infos = functionNewInfos[this->getFunction()][ht];


I wonder if we can remove functionNewInfos entirely?

Yes, probably. I can look at this in a follow-up.

CFP takes advantage of exact type information, but it currently does so only for immutable fields. It is also unnecessarily conservative about how it propagates type information so that sets to a type inhibit optimizations of its sibling types, even though those sets cannot possibly affect the siblings. Add tests for these cases to demonstrate the benefit of follow-on PRs that will fix these issues.

kripken · 2025-09-09T14:48:23Z

src/ir/struct-utils.h

  // account fields.
  void propagateToSuperTypes(StructValuesMap<T>& infos) {
-    propagate(infos, false, true);
+    propagate(infos, false, true, true);


Why is this defaulting to exact while the others have new variants for that, and default to inexact?

Propagating up never propagates to exact types, so its behavior does not depend on the includeExact flag. Even though they would have identical behavior, I think true is a little more honest than false because it can still propagate from exact types.

> Propagating up never propagates to exact types,

But propagating down does? Why is there this asymmetry?

(What I feel might be happening is that this class left "reasoning" to the users, that is, it provided low-level "propagate here or there", and the user was responsible for picking which made sense for their use case. It's not obvious to me why this is asymmetrical so I wonder if this is anticipating certain use cases. If so that might be fine but could be documented perhaps.)

Because exact types are subtypes of their inexact counterparts, but have no subtypes themselves no matter how many subtypes their inexact counterparts have. This is entirely due to the structure of the type relationships, not for any particular use cases.

Oh, I see, so includeExact means literally "include Exact when looking at subtypes", not "take exactness into account when propagating"?

Yep, pretty much.

Sounds good, perhaps a comment with that? The wrong meaning was my initial impression.
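The asymmetry can be sketched by listing the keys that a downward step flows into. This is only an illustration under assumed names (`HeapTypeId`, `Key`, `SubTypes`, and `immediateSubKeys` are hypothetical, and the hierarchy is a plain map instead of Binaryen's `SubTypes` utility).

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; not Binaryen's real types.
using HeapTypeId = std::string;
enum Exactness { Inexact, Exact };
using Key = std::pair<HeapTypeId, Exactness>;

// Hypothetical hierarchy description: type -> immediate declared subtypes.
using SubTypes = std::map<HeapTypeId, std::vector<HeapTypeId>>;

// Keys that information flows into when propagating down from `key`.
inline std::vector<Key> immediateSubKeys(const SubTypes& subTypes,
                                         const Key& key,
                                         bool includeExact) {
  std::vector<Key> subs;
  if (key.second == Exact) {
    // An exact type has no subtypes, however many its inexact counterpart has.
    return subs;
  }
  if (includeExact) {
    // The exact version of a type is a subtype of the inexact version.
    subs.emplace_back(key.first, Exact);
  }
  auto it = subTypes.find(key.first);
  if (it != subTypes.end()) {
    for (const auto& sub : it->second) {
      subs.emplace_back(sub, Inexact);
    }
  }
  return subs;
}
```

Read this way, `includeExact` only controls whether the exact key is listed among the subtypes of the inexact key. Going upward, the targets are always inexact keys, so the flag has nothing to toggle there.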

Co-authored-by: Alon Zakai <azakai@google.com>

kripken · 2025-09-09T16:23:08Z

src/ir/struct-utils.h

+        std::vector<std::pair<HeapType, Exactness>> subs;
+        if (includeExact) {
+          subs.emplace_back(type, Exact);
+        }
        for (auto subType : subTypes.getImmediateSubTypes(type)) {
-          auto& subInfos = combinedInfos[subType];
+          subs.emplace_back(subType, Inexact);
+        }


Suggested change

-        std::vector<std::pair<HeapType, Exactness>> subs;
-        if (includeExact) {
-          subs.emplace_back(type, Exact);
-        }
-        for (auto subType : subTypes.getImmediateSubTypes(type)) {
-          auto& subInfos = combinedInfos[subType];
-          subs.emplace_back(subType, Inexact);
-        }
+        auto subs = subTypes.getImmediateSubTypes(type);
+        if (includeExact) {
+          subs.emplace_back(type, Exact);
+        }

Though even this is not ideal, as it always copies the output of subTypes.getImmediateSubTypes... but maybe that's ok?

I can also factor the loop body into a lambda to avoid any copying at all.
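One way the lambda idea might look, as a sketch only: the loop body becomes a callable that is invoked once for the exact key and once per immediate subtype, so no temporary vector is built. `HeapTypeId`, `Key`, `SubTypes`, `InfoMap`, `propagateDownOnce`, and `handleSub` are hypothetical names, and the info is a toy bit set combined with bitwise OR.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; not Binaryen's real types.
using HeapTypeId = std::string;
enum Exactness { Inexact, Exact };
using Key = std::pair<HeapTypeId, Exactness>;
using SubTypes = std::map<HeapTypeId, std::vector<HeapTypeId>>;
// Toy info: a bit set, combined with bitwise OR.
using InfoMap = std::map<Key, unsigned>;

// One downward step from inexact `type`, visiting targets in place instead of
// first collecting them into a temporary vector.
inline void propagateDownOnce(const SubTypes& subTypes,
                              InfoMap& infos,
                              const HeapTypeId& type,
                              bool includeExact) {
  const unsigned parent = infos[Key(type, Inexact)];
  // The shared loop body: combine the parent's info into one target entry.
  auto handleSub = [&](const HeapTypeId& sub, Exactness exactness) {
    infos[Key(sub, exactness)] |= parent;
  };
  if (includeExact) {
    handleSub(type, Exact);
  }
  auto it = subTypes.find(type);
  if (it == subTypes.end()) {
    return;
  }
  for (const auto& sub : it->second) {
    handleSub(sub, Inexact);
  }
}
```

The subtype list is only iterated by reference here, which is the copy the earlier comment was concerned about.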

kripken · 2025-09-09T16:26:47Z

src/passes/ConstantFieldPropagation.cpp

    StructUtils::TypeHierarchyPropagator<StructUtils::CombinableBool>
      boolPropagator(subTypes);
-    boolPropagator.propagateToSubTypes(combinedCopyInfos);
+    boolPropagator.propagateToSubTypesWithExact(combinedCopyInfos);


The comment above explains in detail why we propagate as we do, and should be updated for exactness.

Perhaps something like "when we read from an exact reference, the value at its field may be influenced by writes to non-exact super- and subtypes."

Let's address this in follow-on PRs that remove the false distinction between values written by news and values written by sets. The explanation of the analysis can be made much simpler and more rigorous once we simplify the analysis itself.

kripken · 2025-09-10T00:02:30Z

src/passes/GlobalTypeOptimization.cpp

+    propagator.propagateToSuperAndSubTypesWithExact(dataFromSubsAndSupersMap);
    auto dataFromSupersMap = std::move(combinedSetGetInfos);
-    propagator.propagateToSubTypes(dataFromSupersMap);
+    propagator.propagateToSubTypesWithExact(dataFromSupersMap);


Same issue as before - why do we want exactness here? On the one hand it seems obvious we do, but it isn't the default - why not?

And, I see no test changes from this - if it doesn't help, why do it?

Since the FieldInfoScanner now writes the info it discovers at a mix of exact and inexact keys, we now need to propagate information to the exact keys as well to ensure all the data is properly combined.

Makes sense, but that sounds like it applies to all users of the utility? (why do it here and not in all others?)

kripken · 2025-09-10T00:04:10Z

src/passes/GlobalTypeOptimization.cpp

-      auto& dataFromSupers = dataFromSupersMap[type];
+      auto ht = std::make_pair(type, Exact);
+      auto& dataFromSubsAndSupers = dataFromSubsAndSupersMap[ht];
+      auto& dataFromSupers = dataFromSupersMap[ht];


I must say, I find it confusing to see "use an Exact reference, but look in data that was propagated from subs and supers". Those feel in contradiction. What's the right way to look at this?

Concretely, I'm not sure why this Exact should not be Inexact.

An (exact $Foo) is a subtype of $Foo and is propagated to and from just like any other subtype of $Foo. Exactness fits right into the existing super and subtype relationships, so there is no contradiction between talking about exactness and talking about super- and subtypes.

Concretely, this needs to be Exact because the inexact keys no longer have all the data in dataFromSupersMap. The field scanner now writes data to a mix of exact and inexact keys. This doesn't matter for dataFromSubsAndSupersMap because it propagates up and down simultaneously, ensuring the exact and inexact entries for a given heap type have the same data. But for dataFromSupersMap, the inexact entry has never been joined with the data in the exact entry since propagation has only gone down.
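The difference between the two maps can be reproduced on a toy hierarchy. This sketch assumes a simple two-pass stand-in for the real propagation, with hypothetical names throughout (`Hierarchy`, `InfoMap`, `combineInto`, `propagateDown`, `propagateUp`, `scannedExample`); the info is just a set of labels for the writes that may affect a field.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; not Binaryen's real types.
using HeapTypeId = std::string;
enum Exactness { Inexact, Exact };
using Key = std::pair<HeapTypeId, Exactness>;
using Info = std::set<std::string>;
using InfoMap = std::map<Key, Info>;
// (type, declared supertype) pairs listed supertype-first; "" means no super.
using Hierarchy = std::vector<std::pair<HeapTypeId, HeapTypeId>>;

inline void combineInto(Info& into, const Info& from) {
  into.insert(from.begin(), from.end());
}

// Down: inexact super -> inexact type, then inexact type -> exact type.
inline void propagateDown(const Hierarchy& h, InfoMap& infos) {
  for (const auto& entry : h) {
    if (!entry.second.empty()) {
      combineInto(infos[Key(entry.first, Inexact)],
                  infos[Key(entry.second, Inexact)]);
    }
    combineInto(infos[Key(entry.first, Exact)],
                infos[Key(entry.first, Inexact)]);
  }
}

// Up: exact type -> inexact type, then inexact type -> inexact super.
inline void propagateUp(const Hierarchy& h, InfoMap& infos) {
  for (auto it = h.rbegin(); it != h.rend(); ++it) {
    combineInto(infos[Key(it->first, Inexact)], infos[Key(it->first, Exact)]);
    if (!it->second.empty()) {
      combineInto(infos[Key(it->second, Inexact)],
                  infos[Key(it->first, Inexact)]);
    }
  }
}

// Scanner-like input: a struct.new of $B lands on the exact key, while a
// struct.set through an inexact $A reference lands on the inexact key.
inline InfoMap scannedExample() {
  InfoMap infos;
  infos[Key("$B", Exact)].insert("new $B");
  infos[Key("$A", Inexact)].insert("set $A");
  return infos;
}
```

With `$B` declared as a subtype of `$A`, running only `propagateDown` leaves the exact `$B` entry with both writes and the inexact `$B` entry with just the set. Running `propagateUp` followed by `propagateDown` makes the exact and inexact entries for each type equal.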

I see, thanks.

That makes sense, but this does seem more complicated than before. Continuing my comment from a moment ago, can we:

1. Make the propagation always go into exact types? As you say, they are just a normal part of subtyping. So there shouldn't be a need to oddly avoid propagating them.

2. That the inexact keys don't have all the data, and that the solution is to query with an exact ref, seems surprising, in the sense that I would expect the point of "querying with an exact ref" to be "get exact results". Can we perhaps add an API to query a heap type without specifying exactness, and that will get all the data (internally, using an exact reference if that makes sense in the concrete data structure, but my point is that the query is just trying to get all the propagated info for the heap type).

Another solution to limit the spread of this complexity would be to revert the shared changes and do something bespoke in CFP. I already have further changes to how copies are noted in the works, and that also only benefits CFP. I could also get some nice performance gains by doing a custom propagation algorithm for CFP, so maybe it makes sense not to try to share any of that with other passes.

Interesting, how would the custom propagation differ from the existing?

Though it seems CFP has fairly general needs, so I would hope we can keep sharing logic between it and other passes?

As we discussed offline, I will remove the complexity of trying to sometimes include exactness and sometimes not. We will always make exactness explicit for all users. As we also discussed, the problem with suggestion (2) is that the fully collected information for a particular heap type might be in its exact or inexact entry (or both) depending on which direction the information was propagated in.

tlively · 2025-09-12T21:50:54Z

@kripken, I've made the changes we discussed offline. PTAL!

kripken · 2025-09-12T22:35:09Z

src/passes/GlobalTypeOptimization.cpp

-      auto& dataFromSupers = dataFromSupersMap[type];
+      // Use the exact entry because information from the inexact entry will
+      // have been propagated down into it but not vice versa.
+      auto ht = std::make_pair(type, Exact);


The comment makes sense for dataFromSupersMap which was propagated using propagateToSubTypes - only down. But what about dataFromSubsAndSupersMap which was propagated both ways?

If it's propagated both ways, then the exact and inexact entries are the same. I'll add that to the comment.

kripken · 2025-09-12T22:40:42Z

src/passes/TypeRefining.cpp

        auto oldType = fields[i].type;
-        auto& info = finalInfos[type][i];
+        // Use inexact because exact info will have been propagated up to
+        // inexact entries but not necessarily vice versa.


We propagate in both directions though?

binaryen/src/passes/TypeRefining.cpp

Line 177 in 1d61606

propagator.propagateToSuperAndSubTypes(combinedSetGetInfos);

Oh, it looks like all our (3) propagations include supertypes. So we do always go up.

tlively requested a review from kripken September 6, 2025 07:28

tlively mentioned this pull request Sep 6, 2025

[Custom Descriptors] Fix CFP on an exact ref.get_desc #7886

Merged

kripken reviewed Sep 6, 2025

kripken reviewed Sep 8, 2025

tlively added 8 commits September 8, 2025 13:51

Merge branch 'main' into exact-cfp-fix

df76cfe

Merge branch 'main' into exact-cfp-fix

f35f60d

Merge branch 'cfp-missing-opts-tests' into exact-cfp-fix

edb1d95

udpate test

65eff0c

Merge branch 'main' into exact-cfp-fix

fe62381

fix

a310e7f

comment

c567e37

kripken reviewed Sep 9, 2025

Comment about convenience subscripting.

7d0445e

Co-authored-by: Alon Zakai <azakai@google.com>

kripken reviewed Sep 9, 2025

tlively added 3 commits September 9, 2025 09:34

lambda

89751b1

comment on exact propagation

49dcfc7

Merge branch 'main' into exact-cfp-fix

80a65e1

kripken reviewed Sep 10, 2025

tlively added 5 commits September 10, 2025 15:05

Merge branch 'main' into exact-cfp-fix

56720bc

update comment

6403073

Merge branch 'main' into exact-cfp-fix

04fe19a

Merge branch 'main' into exact-cfp-fix

13b7a0f

remove complexity of avoiding exactness

1d61606

kripken reviewed Sep 12, 2025

tlively changed the title ~~Propate to exact types in struct-utils.h~~ Propagate to exact types in struct-utils.h Sep 12, 2025

expand comment

fe63302

kripken reviewed Sep 12, 2025

kripken approved these changes Sep 12, 2025

tlively merged commit 52899d4 into main Sep 13, 2025
16 checks passed

tlively deleted the exact-cfp-fix branch September 13, 2025 03:01
