21 | 21 | // 1.1) This pass is based on MLIR's dataflow framework. In hindsight, maybe it
22 | 22 | // is ill-fit for what we need.
23 | 23 | // 1.2) If I understand correctly, the MLIR's dataflow framework is a
24 |    | -// combination
25 |    | -// of traditional iterative dataflow analysis and Sparse Conditional
26 |    | -// Constant propagation (SCCP).
   | 24 | +// combination of traditional iterative dataflow analysis and a mighty
   | 25 | +// Sparse Conditional Constant propagation (SCCP).
27 | 26 | // 1.3) Iterative dataflow analysis requires the transfer function to be monotone.
28 | 27 | // However, not all value-ranges keep increasing as the analysis progresses.
29 | 28 | // Consider the expression x - y: while x's and y's value-ranges may keep
30 | 29 | // increasing, the difference between them does not necessarily keep
31 | 30 | // increasing as well.
32 |    | -// 1.4) SCCP part is not necessary for this pass. We don't expect many dead
33 |    | -// code at
34 |    | -// the moment this analysis is invoked. The SCCP part only make the anlaysis
35 |    | -// take longer time to converge, and it make more complicated to workaround
36 |    | -// the framework's limitations.
37 |    | -// 1.5) The MLIR dataflow framework does not understand SCF. On top of that it
38 |    | -// provides little interfaces to customize it. So, we have to rely on hack
39 |    | -// to sidestep these limitations.
40 |    | -// 1.6 Maybe just walking the code top-dowm is suffice for range-analysis?
   | 31 | +// 1.4) The 1st C in SCCP, i.e. the "conditional" part, is unnecessary for
   | 32 | +// this pass, because we don't expect much dead code at the moment when
   | 33 | +// this analysis is invoked. The price for being "conditional" is less about
   | 34 | +// compile time than about complexity (in terms of debugging and understanding).
   | 35 | +// 1.5) Maybe just walking the code top-down suffices for range-analysis:
41 | 36 | // For loops, figure out the IVs' value-ranges before the loops are entered, and
42 | 37 | // progress to the loop-body without visiting back-edges for non-SCF loops.
43 |    | -// 1.7 As with SCCP which maintain two worklists, one for control-flow
44 |    | -// dependence, one for data-flow dependence. The framework seems to maintain
45 |    | -// a single unified worklist, with each item being a pair of
46 |    | -// <particular-analysis, operation-to-be-analyzed>.
47 | 38 | //
48 | 39 | // 2: tl.assume statements
49 | 40 | // 2.1) A value may have multiple assume-operations (assume-ops for short)
53 | 44 | // 2.3) The assumed value-ranges for source and result operands are inferred
54 | 45 | // right
55 | 46 | // before an operation is visited.
56 |    | -// 2.4) For now, if a value a assumed value-range, we use assumed value-range.
57 |    | -// We should use the intersection of assumed-value-range and inferred-value-
58 |    | -// range. However, it is not always possible: iterative dataflow analysis
   | 47 | +// 2.4) For now, if a value has an assumed value-range, we use the assumed
   | 48 | +// value-range and ignore its inferred value-range. It would be nice to
   | 49 | +// use the intersection of assumed-value-range and inferred-value-range.
   | 50 | +// However, it is not always possible: iterative dataflow analysis
59 | 51 | // requires that the transfer function be monotone; in general it's
60 | 52 | // dangerous to use both meet() and join() operations. In this pass,
61 | 53 | // intersecting inferred value-range with assumed-value-range still guarantees
62 | 54 | // monotonicity. However, the underlying lattice's meet() operation is
63 | 55 | // a silent no-op.
64 | 56 | //
65 |    | -// 3. SCF.
66 |    | -// 3.1 As mentioned above, MLIR's dataflow framework does not understand SCF.
67 |    | -// 3.2 For example, yield-op will not be visited by subclass's
68 |    | -// visitOperation().
69 |    | -// That is because the base-class think yield-op has zero result and take
70 |    | -// for granted it has no value to analyze.
71 |    | -// 3.3 The built-in SCCP part makes the visit order somewhat complicated.
72 |    | -// Operations are not visited in forward order.
73 |    | -// 3.4 This is an example explaining how to SCF is processed, and how we
74 |    | -// workaround this problem.
75 |    | -//
76 |    | -// op0: cond = ...
77 |    | -// x, y = scf.if cond {
78 |    | -//   // then-block
79 |    | -//   op1: a = ...
80 |    | -//   op2: yield a, b
81 |    | -// } else {
82 |    | -//   // else-block
83 |    | -//   op3: d =
84 |    | -//   op4: yield c, d
85 |    | -// }
86 |    | -// op5: z = add x, y
87 |    | -//
88 |    | -// step 1: as mentioned in 1.7, multiple analyses comprise the framework with
89 |    | -// an unified worklist. DCE kick in first, when it visit the scf.if, the
90 |    | -// "cond" does not have lattice associated with it. So it initially
91 |    | -// considered both then-block and else-block are dead.
92 |    | -// step 2: after DCE going over all items in the worklist, range-analysis gets
93 |    | -// the chance. op0 is visited, a non-bottom lattice is created for op0's LHS.
94 |    | -// step 3. The baseclass (belong to framework) visits the scf.if
95 |    | -// it calls this class's visitRegionSuccessors(). Basically,
96 |    | -// visitRegionSuccessors() gives subclass a chance to prepare for RHS for
97 |    | -// SCF operations. This class does nothing for scf.if.
98 |    | -// step 3: The base-class returns once sub-class's visitRegionSuccessors()
99 |    | -// returns. Therefor, this class (subclass)'s visitOperand() function is
100 |    | -// *NOT* called with with scf.if.
101 |    | -// step 4: The base-class tries to visit the sub-regions (i.e. then- and else-
102 |    | -// blocks), only finds they are dead (due to step 1) and hence skip them.
103 |    | -// step 5: after step 4, the lattice of x and y are in "bottom" state.
104 |    | -// When op5 is visit, range-analysis find one of source operands is in
105 |    | -// "bottom" state, and do not update z's state.
106 |    | -// ...
107 |    | -// next round starts.
108 |    | -// step 5: DCE found "cond" has non-bottom state associated with it, and mark
109 |    | -// then- and else-block "live" accordingly.
110 |    | -// step 6: Range-analysis get a chance to visit the then- and else-block.
111 |    | -// step 7: when op1 is visited. *HACK KICK IN*. Range-analysis found op1 is
112 |    | -// used by yield-op, it then in turn updates x's state.
113 |    | -// step 8: likewise, then op3's visited, y's state is updated as well.
114 |    | -// step 9: finally, x and y has non-bottom state, when op5 is visited, z's
115 |    | -// state is updated.
116 | 57 |
117 | 58 | #undef DEBUG_TYPE
118 | 59 | #define DEBUG_TYPE "tritonamdgpu-range-analysis"
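To make point 2.4 above concrete, the following is a minimal standalone sketch (hypothetical values and helper name, not part of this change) contrasting the intersection the comment would like to use with the union that the lattice join() actually computes, assuming the standard mlir::ConstantIntRanges API:

// Minimal sketch only; ranges and function names below are made up.
#include "mlir/Interfaces/InferIntRangeInterface.h"
#include "llvm/ADT/APInt.h"

using llvm::APInt;
using mlir::ConstantIntRanges;

// Manual signed intersection of two overlapping ranges.
static ConstantIntRanges intersectSigned(const ConstantIntRanges &a,
                                         const ConstantIntRanges &b) {
  return ConstantIntRanges::fromSigned(llvm::APIntOps::smax(a.smin(), b.smin()),
                                       llvm::APIntOps::smin(a.smax(), b.smax()));
}

void demoAssumedVsInferred() {
  // Inferred by the dataflow analysis: x in [0, 1024].
  auto inferred = ConstantIntRanges::fromSigned(APInt(32, 0), APInt(32, 1024));
  // Claimed by a tl.assume: x in [0, 255].
  auto assumed = ConstantIntRanges::fromSigned(APInt(32, 0), APInt(32, 255));

  // What 2.4 would ideally use: the intersection, i.e. [0, 255].
  ConstantIntRanges narrowed = intersectSigned(inferred, assumed);

  // What the monotone lattice join() computes: the union, i.e. [0, 1024].
  // join() only ever widens a range, which is what keeps the fixpoint
  // iteration terminating.
  ConstantIntRanges widened = inferred.rangeUnion(assumed);
  (void)narrowed;
  (void)widened;
}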
@@ -223,7 +164,7 @@ void inferResultRangesMaxNonNegSigned(Operation *op,
223 | 164 |   }
224 | 165 | }
225 | 166 |
226 |     | -// Given an assumption operaiton, try to derive the value range of the value
    | 167 | +// Given an assumption operation, try to derive the value range of the value
227 | 168 | // <anchor> somewhere in the block "useBlock".
228 | 169 | // Note that
229 | 170 | // - The value "anchor" is defined or referenced in the "useBlock"
@@ -683,7 +624,7 @@ void TritonIntegerRangeAnalysis::defaultTransferFunc(
683 | 624 |   }
684 | 625 |
685 | 626 |   // step 4: Update the value range. Note that we are using `join` operation
686 |     | -  // which means `union`. Transfer funtion must be monotone! The resolver
    | 627 | +  // which means `union`. Transfer function must be monotone! The resolver
687 | 628 |   // would otherwise fall into infinite loop.
688 | 629 |   ChangeResult changed = lattice->join(incomingRange_);
689 | 630 |   LLVM_DEBUG({
@@ -718,12 +659,12 @@ TritonIntegerRangeAnalysis::rectifyInfferableRange(
718 | 659 |
719 | 660 |   auto isPos = [](const ConstantIntRanges &range) {
720 | 661 |     // Return true iff in both unsigned and signed representation, the most
721 |     | -    // siganificant bit is always 0.
    | 662 | +    // significant bit is always 0.
722 | 663 |     return range.umax().isNonNegative() && range.smax().isNonNegative() &&
723 | 664 |            range.smin().isNonNegative();
724 | 665 |   };
725 | 666 |
726 |     | -  // Not appliable to those bin-ops yielding unsigned int.
    | 667 | +  // Not applicable to those bin-ops yielding unsigned int.
727 | 668 |   if (!signedIntValues.count(op->getResult(0)))
728 | 669 |     return std::nullopt;
729 | 670 |
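As a side note, isPos() above only accepts ranges whose sign bit is clear in both the unsigned and the signed views. A small hypothetical sketch (not part of this change, values made up) of what it accepts and rejects:

// Hypothetical sketch, not part of this patch.
#include "mlir/Interfaces/InferIntRangeInterface.h"
#include "llvm/ADT/APInt.h"

static bool isPosSketch(const mlir::ConstantIntRanges &range) {
  // Same predicate as the lambda above: the most significant bit must be 0
  // for the unsigned max as well as for the signed min/max.
  return range.umax().isNonNegative() && range.smax().isNonNegative() &&
         range.smin().isNonNegative();
}

void demoIsPos() {
  // 8-bit [0, 100]: 100 has the MSB clear, so the range is non-negative in
  // both the unsigned and the signed interpretation -> accepted.
  auto small = mlir::ConstantIntRanges::fromUnsigned(llvm::APInt(8, 0),
                                                     llvm::APInt(8, 100));
  // 8-bit [0, 200]: 200 (0xC8) has the MSB set, i.e. the value may be
  // negative when reinterpreted as signed -> rejected.
  auto wide = mlir::ConstantIntRanges::fromUnsigned(llvm::APInt(8, 0),
                                                    llvm::APInt(8, 200));
  bool accepted = isPosSketch(small); // true
  bool rejected = isPosSketch(wide);  // false
  (void)accepted;
  (void)rejected;
}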
@@ -774,42 +715,6 @@ TritonIntegerRangeAnalysis::rectifyInfferableRange(
774 | 715 |   return ConstantIntRanges::fromUnsigned(resultRange.umin(), umax);
775 | 716 | }
776 | 717 |
777 |     | -void TritonIntegerRangeAnalysis::visitYieldHelper(Operation *op, Value value) {
778 |     | -  auto yieldOp = dyn_cast<scf::YieldOp>(op);
779 |     | -  LDBG("visit yieldOp: " << yieldOp);
780 |     | -
781 |     | -  dataflow::IntegerValueRangeLattice *srcLattice = getLatticeElement(value);
782 |     | -
783 |     | -  for (auto iter : llvm::enumerate(yieldOp->getOperands())) {
784 |     | -    if (iter.value() != value)
785 |     | -      continue;
786 |     | -
787 |     | -    size_t idx = iter.index();
788 |     | -    Operation *parentOp = yieldOp->getParentOp();
789 |     | -
790 |     | -    if (auto ifOp = dyn_cast<scf::IfOp>(parentOp)) {
791 |     | -      // Get the corresponding scf.if result and its lattice
792 |     | -      mlir::OpResult res = parentOp->getResult(idx);
793 |     | -      dataflow::IntegerValueRangeLattice *resLattice = getLatticeElement(res);
794 |     | -      auto changed = resLattice->join(*srcLattice);
795 |     | -      propagateIfChanged(resLattice, changed);
796 |     | -
797 |     | -      LLVM_DEBUG({
798 |     | -        OpPrintingFlags flags;
799 |     | -        flags.skipRegions(true);
800 |     | -        DBGS() << ((changed == ChangeResult::Change)
801 |     | -                       ? ">yieldOp bring change: "
802 |     | -                       : ">yieldOp bring no change:");
803 |     | -        res.printAsOperand(llvm::dbgs(), flags);
804 |     | -        llvm::dbgs() << ", resulting value-range: "
805 |     | -                     << resLattice->getValue().getValue()
806 |     | -                     << ", in value-range: "
807 |     | -                     << srcLattice->getValue().getValue() << "\n";
808 |     | -      });
809 |     | -    }
810 |     | -  }
811 |     | -}
812 |     | -
813 | 718 | LogicalResult TritonIntegerRangeAnalysis::visitOperation(
814 | 719 |     Operation *op,
815 | 720 |     ArrayRef<const dataflow::IntegerValueRangeLattice *> operands,
@@ -876,20 +781,6 @@ LogicalResult TritonIntegerRangeAnalysis::visitOperation(
876 | 781 |     propagateIfChanged(lattice, changed);
877 | 782 |   }
878 | 783 |
879 |     | -  // step 4: The dataflow framework does not understand SCF. It skip yieldOp
880 |     | -  // as it has no result. To workaround this problem, we visit all yieldOp
881 |     | -  // which depends on this operation.
882 |     | -  for (int resIdx = 0, resEnd = op->getNumResults(); resIdx < resEnd;
883 |     | -       ++resIdx) {
884 |     | -    mlir::OpResult res = op->getResult(resIdx);
885 |     | -
886 |     | -    for (mlir::OpOperand &use : res.getUses()) {
887 |     | -      mlir::Operation *depOp = use.getOwner();
888 |     | -      if (auto yield = dyn_cast<scf::YieldOp>(depOp))
889 |     | -        visitYieldHelper(yield, res);
890 |     | -    }
891 |     | -  }
892 |     | -
893 | 784 |   return visitResult;
894 | 785 | }
895 | 786 |
@@ -1045,6 +936,27 @@ void TritonIntegerRangeAnalysis::visitRegionSuccessors(
1045 | 936 |   assert(predecessors->allPredecessorsKnown() &&
1046 | 937 |          "unexpected unresolved region successors");
1047 | 938 |
     | 939 | +  // Note: it may not be obvious, but this loop can update SCF
     | 940 | +  // operations' LHS. E.g. if the given "branch" argument is an scf.if, and the
     | 941 | +  // scf.if construct looks like the following:
     | 942 | +  //   x = scf.if cond
     | 943 | +  //     m = ...   // op_m
     | 944 | +  //     yield m
     | 945 | +  //   else
     | 946 | +  //     n = ...   // op_n
     | 947 | +  //     yield n
     | 948 | +  //
     | 949 | +  // This loop tries to update lattice(x) = join(lattice(m), lattice(n)),
     | 950 | +  // provided lattice(m) and lattice(n) are initialized.
     | 951 | +  //
     | 952 | +  // Note that the state of lattice(m) and lattice(n) was updated in the
     | 953 | +  // "previous" round. In this "round", the scf.if is visited right now, and
     | 954 | +  // it takes this moment to update its LHS.
     | 955 | +  //
     | 956 | +  // Alternatively, when we visit, say, op_m, we notice its result is used by
     | 957 | +  // a yieldOp, get the yieldOp's corresponding receiver, in this case x, and
     | 958 | +  // update its state accordingly.
     | 959 | +  //
1048 | 960 |   for (Operation *op : predecessors->getKnownPredecessors()) {
1049 | 961 |     std::optional<OperandRange> operands;
1050 | 962 |     if (op == branch) {
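For reference, a minimal hypothetical sketch (helper name and signature assumed, not the actual loop body above) of the lattice(x) = join(lattice(m), lattice(n)) update that the new comment describes, i.e. folding each yielded operand's lattice into the lattice of the matching scf.if result:

// Hypothetical helper, for illustration only; the real update happens in the
// loop shown in the hunk above.
void TritonIntegerRangeAnalysis::joinYieldedIntoResults(
    RegionBranchOpInterface branch,
    OperandRange yielded /* e.g. {m} or {n}, one region's yield operands */) {
  for (auto iter : llvm::enumerate(yielded)) {
    // Lattice of the scf.if result (x) that corresponds to this yield operand.
    dataflow::IntegerValueRangeLattice *resLattice =
        getLatticeElement(branch->getResult(iter.index()));
    // lattice(x) = join(lattice(x), lattice(m or n)); join only widens.
    ChangeResult changed = resLattice->join(*getLatticeElement(iter.value()));
    propagateIfChanged(resLattice, changed);
  }
}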