
Commit 68eff64

remove yeildop hack
1 parent c418d52 commit 68eff64


1 file changed (+36 -124 lines changed)


third_party/amd/lib/Analysis/RangeAnalysis.cpp

Lines changed: 36 additions & 124 deletions
@@ -21,29 +21,20 @@
 // 1.1) This pass is based on MLIR's dataflow framework. In hindsight, maybe it
 // is ill-fit for what we need.
 // 1.2) If I understand correctly, the MLIR's dataflow framework is a
-// combination
-// of traditional iterative dataflow analysis and Sparse Conditional
-// Constant propagation (SCCP).
+// combination of traditional iterative dataflow analysis and a mighty
+// Sparse Conditional Constant propagation (SCCP).
 // 1.3) Iterative dataflow analysis requires transfer function to be monotone.
 // However, not all value-ranges keep increasing when the analysis progress.
 // Consider the expression x - y, while x and y's value-range may keep
 // increasing, the difference between them does not necessarily keep
 // increasing as well.
-// 1.4) SCCP part is not necessary for this pass. We don't expect many dead
-// code at
-// the moment this analysis is invoked. The SCCP part only make the anlaysis
-// take longer time to converge, and it make more complicated to workaround
-// the framework's limitations.
-// 1.5) The MLIR dataflow framework does not understand SCF. On top of that it
-// provides little interfaces to customize it. So, we have to rely on hack
-// to sidestep these limitations.
-// 1.6 Maybe just walking the code top-dowm is suffice for range-analysis?
+// 1.4) The 1st C in SCCP, i.e. "conditional" part in SCCP part is unnecessary
+// for this pass, because we don't expect many dead code at the moment when
+// this analysis is invoked. Price for being "conditional" is less about
+// compile time but complexity (in terms of debugging and understanding).
+// 1.5 Maybe just walking the code top-dowm is suffice for range-analysis:
 // For loops, figuring out IVs' value-ranges before loops are entered, and
 // progress to loop-body, without visiting back-edge for non-SCF loops.
-// 1.7 As with SCCP which maintain two worklists, one for control-flow
-// dependence, one for data-flow dependence. The framework seems to maintain
-// a single unified worklist, with each item being a pair of
-// <particular-analysis, operation-to-be-analyzed>.
 //
 // 2: tl.assume statements
 // 2.1) A value may have multiple assume-operations (assume-ops for short)
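Note 1.5 suggests a top-down walk that fixes an induction variable's range before the loop body is visited, instead of iterating over the back-edge. The sketch below illustrates that idea under stated assumptions: a counted loop `for (iv = lb; iv < ub; iv += step)` with a positive step, and a toy `Interval` type in place of MLIR's `ConstantIntRanges`. `Interval` and `ivRange` are hypothetical names, not part of Triton or MLIR.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Toy closed interval [lo, hi]; hypothetical stand-in for ConstantIntRanges.
struct Interval {
  int64_t lo, hi;
};

// Range of `iv` in `for (iv = lb; iv < ub; iv += step)`, computed once from
// the ranges of lb and ub before the loop body is visited.
// Assumes step > 0 and that no arithmetic here overflows int64_t.
// Returns std::nullopt when the body can never execute.
std::optional<Interval> ivRange(Interval lb, Interval ub, int64_t step) {
  assert(step > 0 && "sketch only handles positive steps");
  int64_t lo = lb.lo;     // iv starts at lb, which is at least lb.lo
  int64_t hi = ub.hi - 1; // iv < ub <= ub.hi on every iteration
  if (lo > hi)
    return std::nullopt; // lb >= ub for every possible input
  if (lb.lo == lb.hi && ub.lo == ub.hi) {
    // Both bounds are constants, so the last value of iv is exact:
    // trips = ceil((ub - lb) / step), last = lb + (trips - 1) * step.
    int64_t trips = (ub.lo - lb.lo + step - 1) / step;
    hi = lb.lo + (trips - 1) * step;
  }
  return Interval{lo, hi};
}
```

For constant bounds `lb = 0`, `ub = 10`, `step = 3` the sketch yields `[0, 9]`; with non-constant bounds it falls back to the conservative `[lb.lo, ub.hi - 1]`.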
@@ -53,66 +44,16 @@
 // 2.3) The assumed value-range for source and result operands are inferred
 // right
 // before an operation is visited.
-// 2.4) For now, if a value a assumed value-range, we use assumed value-range.
-// We should use the intersection of assumed-value-range and inferred-value-
-// range. However, it is not always possible: iterative dataflow analysis
+// 2.4) For now, if a value has a assumed value-range, we use assumed
+// value-range and ignore its inferred value range. It would be nice to
+// use the intersection of assumed-value-range and inferred-value-range.
+// However, it is not always possible: iterative dataflow analysis
 // requires that the transfer function must be monotone; in general it's
 // dangerous to use both meet() and join() operations. In this pass,
 // intersecting inferred value-range with assumed-value-range still guarantee
 // its monotonicity. However, the underlying lattice's meet() operation is
 // a silent no-op.
 //
-// 3. SCF.
-// 3.1 As mentioned above, MLIR's dataflow framework does not understand SCF.
-// 3.2 For example, yield-op will not be visited by subclass's
-// visitOperation().
-// That is because the base-class think yield-op has zero result and take
-// for granted it has no value to analyze.
-// 3.3 The built-in SCCP part makes the visit order somewhat complicated.
-// Operations are not visited in forward order.
-// 3.4 This is an example explaining how to SCF is processed, and how we
-// workaround this problem.
-//
-// op0: cond = ...
-// x, y = scf.if cond {
-// // then-block
-// op1: a = ...
-// op2: yield a, b
-// } else {
-// // else-block
-// op3: d =
-// op4: yield c, d
-// }
-// op5: z = add x, y
-//
-// step 1: as mentioned in 1.7, multiple analyses comprise the framework with
-// an unified worklist. DCE kick in first, when it visit the scf.if, the
-// "cond" does not have lattice associated with it. So it initially
-// considered both then-block and else-block are dead.
-// step 2: after DCE going over all items in the worklist, range-analysis gets
-// the chance. op0 is visited, a non-bottom lattice is created for op0's LHS.
-// step 3. The baseclass (belong to framework) visits the scf.if
-// it calls this class's visitRegionSuccessors(). Basically,
-// visitRegionSuccessors() gives subclass a chance to prepare for RHS for
-// SCF operations. This class does nothing for scf.if.
-// step 3: The base-class returns once sub-class's visitRegionSuccessors()
-// returns. Therefor, this class (subclass)'s visitOperand() function is
-// *NOT* called with with scf.if.
-// step 4: The base-class tries to visit the sub-regions (i.e. then- and else-
-// blocks), only finds they are dead (due to step 1) and hence skip them.
-// step 5: after step 4, the lattice of x and y are in "bottom" state.
-// When op5 is visit, range-analysis find one of source operands is in
-// "bottom" state, and do not update z's state.
-// ...
-// next round starts.
-// step 5: DCE found "cond" has non-bottom state associated with it, and mark
-// then- and else-block "live" accordingly.
-// step 6: Range-analysis get a chance to visit the then- and else-block.
-// step 7: when op1 is visited. *HACK KICK IN*. Range-analysis found op1 is
-// used by yield-op, it then in turn updates x's state.
-// step 8: likewise, then op3's visited, y's state is updated as well.
-// step 9: finally, x and y has non-bottom state, when op5 is visited, z's
-// state is updated.
 
 #undef DEBUG_TYPE
 #define DEBUG_TYPE "tritonamdgpu-range-analysis"
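To make note 2.4 concrete, here is a toy comparison of the current policy (the assumed range wins outright) with the intersection the comment says would be nicer. `Interval`, `meet`, `pickRangeCurrent` and `pickRangeIntersect` are hypothetical names; falling back to the assumed range when the intersection is empty is an assumption of this sketch, not something the pass is known to do.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

struct Interval {
  int64_t lo, hi; // closed interval [lo, hi]
};

// Lattice meet (intersection); std::nullopt when the intervals are disjoint.
std::optional<Interval> meet(Interval a, Interval b) {
  assert(a.lo <= a.hi && b.lo <= b.hi && "intervals must be non-empty");
  int64_t lo = std::max(a.lo, b.lo);
  int64_t hi = std::min(a.hi, b.hi);
  if (lo > hi)
    return std::nullopt;
  return Interval{lo, hi};
}

// Policy described in note 2.4 today: an assumed range replaces the
// inferred one.
Interval pickRangeCurrent(std::optional<Interval> assumed, Interval inferred) {
  return assumed ? *assumed : inferred;
}

// The intersection the note would prefer. Falling back to the assumed range
// on an empty intersection is an assumption of this sketch.
Interval pickRangeIntersect(std::optional<Interval> assumed,
                            Interval inferred) {
  if (!assumed)
    return inferred;
  if (std::optional<Interval> both = meet(*assumed, inferred))
    return *both;
  return *assumed;
}
```

With an assumed range `[0, 100]` and an inferred range `[-50, 40]`, the current policy keeps `[0, 100]` while the intersection narrows it to `[0, 40]`.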
@@ -223,7 +164,7 @@ void inferResultRangesMaxNonNegSigned(Operation *op,
 }
 }
 
-// Given an assumption operaiton, try to derive the value range of the value
+// Given an assumption operation, try to derive the value range of the value
 // <anchor>'s value range at the somewhere in the block "useBlock".
 // Note that
 // - The value "anchor" is defined or referenced in the "useBlock"
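The comment above describes deriving a value's range from an assumption operation. As a rough sketch, assuming the assumption reduces to a signed comparison of an `i32` value against a constant, the implied range, and the intersection of several such assumptions on one value (note 2.1), could be computed as follows. The `Pred` enum, `rangeFromAssume` and `intersectAll` are hypothetical and ignore the block and dominance questions the real function has to answer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

struct Interval {
  int64_t lo, hi; // closed interval; lo > hi encodes "no value"
};

constexpr int64_t kI32Min = std::numeric_limits<int32_t>::min();
constexpr int64_t kI32Max = std::numeric_limits<int32_t>::max();

// Hypothetical subset of signed comparison predicates.
enum class Pred { SLT, SLE, SGT, SGE, EQ };

// Range of an i32 value x implied by the assumption `x <pred> c`.
Interval rangeFromAssume(Pred pred, int32_t c) {
  switch (pred) {
  case Pred::SLT:
    return {kI32Min, int64_t{c} - 1}; // empty if c == INT32_MIN
  case Pred::SLE:
    return {kI32Min, c};
  case Pred::SGT:
    return {int64_t{c} + 1, kI32Max}; // empty if c == INT32_MAX
  case Pred::SGE:
    return {c, kI32Max};
  case Pred::EQ:
    return {c, c};
  }
  assert(false && "unhandled predicate");
  return {kI32Min, kI32Max};
}

// Several assumptions on one value: intersect all implied ranges.
Interval intersectAll(const std::vector<Interval> &ranges) {
  Interval acc{kI32Min, kI32Max};
  for (const Interval &r : ranges) {
    acc.lo = std::max(acc.lo, r.lo);
    acc.hi = std::min(acc.hi, r.hi);
  }
  return acc; // acc.lo > acc.hi means the assumptions contradict each other
}
```

For example, `x < 128` together with `x >= 0` narrows `x` to `[0, 127]`.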
@@ -683,7 +624,7 @@ void TritonIntegerRangeAnalysis::defaultTransferFunc(
 }
 
 // step 4: Update the value range. Note that we are using `join` operation
-// which means `union`. Transfer funtion must be monotone! The resolver
+// which means `union`. Transfer function must be monotone! The resolver
 // would otherwise fall into infinite loop.
 ChangeResult changed = lattice->join(incomingRange_);
 LLVM_DEBUG({
@@ -718,12 +659,12 @@ TritonIntegerRangeAnalysis::rectifyInfferableRange(
 
 auto isPos = [](const ConstantIntRanges &range) {
 // Return true iff in both unsigned and signed representation, the most
-// siganificant bit is always 0.
+// significant bit is always 0.
 return range.umax().isNonNegative() && range.smax().isNonNegative() &&
 range.smin().isNonNegative();
 };
 
-// Not appliable to those bin-ops yielding unsigned int.
+// Not applicable to those bin-ops yielding unsigned int.
 if (!signedIntValues.count(op->getResult(0)))
 return std::nullopt;
 
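The `isPos` lambda accepts a range only if the most significant bit is 0 in both the unsigned and the signed view. A standalone 32-bit sketch of the same three checks, assuming a toy `Ranges32` struct in place of `ConstantIntRanges` (`APInt::isNonNegative()` is true exactly when the sign bit is clear):

```cpp
#include <cassert>
#include <cstdint>

// Toy 32-bit stand-in for ConstantIntRanges: one set of values seen as an
// unsigned range [umin, umax] and as a signed range [smin, smax].
struct Ranges32 {
  uint32_t umin, umax;
  int32_t smin, smax;
};

// Like APInt::isNonNegative(): true iff the most significant bit is 0.
bool msbClear(uint32_t v) { return (v >> 31) == 0; }

// Mirrors the isPos lambda's three checks.
bool isPos(const Ranges32 &r) {
  assert(r.umin <= r.umax && r.smin <= r.smax);
  return msbClear(r.umax) && r.smax >= 0 && r.smin >= 0;
}
```

A set such as `{-1, 0, ..., 5}` fails: its unsigned view reaches `0xFFFFFFFF`, and its signed minimum is negative.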
@@ -774,42 +715,6 @@ TritonIntegerRangeAnalysis::rectifyInfferableRange(
 return ConstantIntRanges::fromUnsigned(resultRange.umin(), umax);
 }
 
-void TritonIntegerRangeAnalysis::visitYieldHelper(Operation *op, Value value) {
-auto yieldOp = dyn_cast<scf::YieldOp>(op);
-LDBG("visit yieldOp: " << yieldOp);
-
-dataflow::IntegerValueRangeLattice *srcLattice = getLatticeElement(value);
-
-for (auto iter : llvm::enumerate(yieldOp->getOperands())) {
-if (iter.value() != value)
-continue;
-
-size_t idx = iter.index();
-Operation *parentOp = yieldOp->getParentOp();
-
-if (auto ifOp = dyn_cast<scf::IfOp>(parentOp)) {
-// Get the corresponding scf.if result and its lattice
-mlir::OpResult res = parentOp->getResult(idx);
-dataflow::IntegerValueRangeLattice *resLattice = getLatticeElement(res);
-auto changed = resLattice->join(*srcLattice);
-propagateIfChanged(resLattice, changed);
-
-LLVM_DEBUG({
-OpPrintingFlags flags;
-flags.skipRegions(true);
-DBGS() << ((changed == ChangeResult::Change)
-? ">yieldOp bring change: "
-: ">yieldOp bring no change:");
-res.printAsOperand(llvm::dbgs(), flags);
-llvm::dbgs() << ", resulting value-range: "
-<< resLattice->getValue().getValue()
-<< ", in value-range: "
-<< srcLattice->getValue().getValue() << "\n";
-});
-}
-}
-}
-
 LogicalResult TritonIntegerRangeAnalysis::visitOperation(
 Operation *op,
 ArrayRef<const dataflow::IntegerValueRangeLattice *> operands,
@@ -876,20 +781,6 @@ LogicalResult TritonIntegerRangeAnalysis::visitOperation(
 propagateIfChanged(lattice, changed);
 }
 
-// step 4: The dataflow framework does not understand SCF. It skip yieldOp
-// as it has no result. To workaround this problem, we visit all yieldOp
-// which depends on this operation.
-for (int resIdx = 0, resEnd = op->getNumResults(); resIdx < resEnd;
-++resIdx) {
-mlir::OpResult res = op->getResult(resIdx);
-
-for (mlir::OpOperand &use : res.getUses()) {
-mlir::Operation *depOp = use.getOwner();
-if (auto yield = dyn_cast<scf::YieldOp>(depOp))
-visitYieldHelper(yield, res);
-}
-}
-
 return visitResult;
 }
 
@@ -1045,6 +936,27 @@ void TritonIntegerRangeAnalysis::visitRegionSuccessors(
 assert(predecessors->allPredecessorsKnown() &&
 "unexpected unresolved region successors");
 
+// Note: It does not seems to be quite obvious; this loop could update SCF
+// operations' LHS. e.g. If the given "branch" argument is scf.if, and the
+// scf.if construct looks like following:
+// x = scf.if cond
+// m = ... // op_m
+// yield m
+// else
+// n = ... // op_n
+// yield n
+//
+// This loop tries to update lattice(x) = join(lattice(m), lattice(n),
+// proovided lattice(m) and lattice(n) are initialized.
+//
+// Note that the state of lattice(m) and lattice(n) was updated in the
+// "previous" round. In this "round", the scf.if is vsitied right now, and
+// it takes this moment to update its LHS.
+//
+// Alternatively, when we visit, say op_m, we notice its result is used by
+// a yieldOp, get the yieldOp's corresponding receiver, in this case x, and
+// update its state accordingly.
+//
 for (Operation *op : predecessors->getKnownPredecessors()) {
 std::optional<OperandRange> operands;
 if (op == branch) {
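The new note in `visitRegionSuccessors()` says the loop sets `lattice(x) = join(lattice(m), lattice(n))` once the yielded values' lattices are initialized. A toy model of that pull-style update, assuming `std::nullopt` stands for an uninitialized lattice (joining with it changes nothing) and using the hypothetical names `Lattice` and `joinYields`; it is not the framework's actual code path.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

struct Interval {
  int64_t lo, hi;
};

// Toy lattice element: std::nullopt plays the role of an uninitialized
// ("bottom") lattice that has not been visited yet.
using Lattice = std::optional<Interval>;

// Each inner vector holds the lattices of one yield's operands, e.g.
// {lattice(m)} for the then-block and {lattice(n)} for the else-block.
// Result i of the scf.if is the join of operand i of every yield,
// skipping operands whose lattice is still bottom.
std::vector<Lattice> joinYields(const std::vector<std::vector<Lattice>> &yields,
                                std::size_t numResults) {
  std::vector<Lattice> results(numResults);
  for (const auto &yield : yields) {
    assert(yield.size() == numResults && "yield arity must match scf.if");
    for (std::size_t i = 0; i < numResults; ++i) {
      if (!yield[i])
        continue; // not initialized yet; a later round will revisit it
      if (!results[i])
        results[i] = yield[i];
      else
        results[i] = Interval{std::min(results[i]->lo, yield[i]->lo),
                              std::max(results[i]->hi, yield[i]->hi)};
    }
  }
  return results;
}
```

With `lattice(m) = [0, 4]` and `lattice(n) = [10, 12]`, `x` becomes `[0, 12]`; if `lattice(n)` is still uninitialized, `x` holds only `[0, 4]` until a later round revisits the `scf.if`.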
