21 | 21 | // 1.1) This pass is based on MLIR's dataflow framework. In hindsight, maybe it
22 | 22 | // is ill-fit for what we need.
23 | 23 | // 1.2) If I understand correctly, the MLIR's dataflow framework is a
24 |    | -// combination
25 |    | -// of traditional iterative dataflow analysis and Sparse Conditional
26 |    | -// Constant propagation (SCCP).
   | 24 | +// combination of traditional iterative dataflow analysis and a mighty
   | 25 | +// Sparse Conditional Constant propagation (SCCP).
27 | 26 | // 1.3) Iterative dataflow analysis requires the transfer function to be monotone.
28 | 27 | // However, not all value-ranges keep increasing as the analysis progresses.
29 | 28 | // Consider the expression x - y: while x's and y's value-ranges may keep
30 | 29 | // increasing, the difference between them does not necessarily keep
31 | 30 | // increasing as well.
32 |    | -// 1.4) SCCP part is not necessary for this pass. We don't expect many dead
33 |    | -// code at
34 |    | -// the moment this analysis is invoked. The SCCP part only make the anlaysis
35 |    | -// take longer time to converge, and it make more complicated to workaround
36 |    | -// the framework's limitations.
37 |    | -// 1.5) The MLIR dataflow framework does not understand SCF. On top of that it
38 |    | -// provides little interfaces to customize it. So, we have to rely on hack
39 |    | -// to sidestep these limitations.
40 |    | -// 1.6 Maybe just walking the code top-dowm is suffice for range-analysis?
   | 31 | +// 1.4) The 1st C in SCCP, i.e. the "conditional" part, is unnecessary for
   | 32 | +// this pass, because we don't expect much dead code at the moment when
   | 33 | +// this analysis is invoked. The price for being "conditional" is less about
   | 34 | +// compile time than about complexity (in terms of debugging and understanding).
   | 35 | +// 1.5) Maybe just walking the code top-down suffices for range-analysis:
41 | 36 | // For loops, figure out the IVs' value-ranges before the loops are entered, and
42 | 37 | // progress to the loop-body without visiting back-edges for non-SCF loops.
43 |    | -// 1.7 As with SCCP which maintain two worklists, one for control-flow
44 |    | -// dependence, one for data-flow dependence. The framework seems to maintain
45 |    | -// a single unified worklist, with each item being a pair of
46 |    | -// <particular-analysis, operation-to-be-analyzed>.
47 | 38 | //
48 | 39 | // 2: tl.assume statements
49 | 40 | // 2.1) A value may have multiple assume-operations (assume-ops for short)
53 | 44 | // 2.3) The assumed value-ranges for source and result operands are inferred
54 | 45 | // right
55 | 46 | // before an operation is visited.
56 |    | -// 2.4) For now, if a value a assumed value-range, we use assumed value-range.
57 |    | -// We should use the intersection of assumed-value-range and inferred-value-
58 |    | -// range. However, it is not always possible: iterative dataflow analysis
   | 47 | +// 2.4) For now, if a value has an assumed value-range, we use the assumed
   | 48 | +// value-range and ignore its inferred value-range. It would be nice to
   | 49 | +// use the intersection of assumed-value-range and inferred-value-range.
   | 50 | +// However, it is not always possible: iterative dataflow analysis
59 | 51 | // requires that the transfer function be monotone; in general it's
60 | 52 | // dangerous to use both meet() and join() operations. In this pass,
61 | 53 | // intersecting inferred value-range with assumed-value-range still guarantees
62 | 54 | // monotonicity. However, the underlying lattice's meet() operation is
63 | 55 | // a silent no-op.
64 | 56 | //
65 |    | -// 3. SCF.
66 |    | -// 3.1 As mentioned above, MLIR's dataflow framework does not understand SCF.
67 |    | -// 3.2 For example, yield-op will not be visited by subclass's
68 |    | -// visitOperation().
69 |    | -// That is because the base-class think yield-op has zero result and take
70 |    | -// for granted it has no value to analyze.
71 |    | -// 3.3 The built-in SCCP part makes the visit order somewhat complicated.
72 |    | -// Operations are not visited in forward order.
73 |    | -// 3.4 This is an example explaining how to SCF is processed, and how we
74 |    | -// workaround this problem.
75 |    | -//
76 |    | -// op0: cond = ...
77 |    | -// x, y = scf.if cond {
78 |    | -//   // then-block
79 |    | -//   op1: a = ...
80 |    | -//   op2: yield a, b
81 |    | -// } else {
82 |    | -//   // else-block
83 |    | -//   op3: d =
84 |    | -//   op4: yield c, d
85 |    | -// }
86 |    | -// op5: z = add x, y
87 |    | -//
88 |    | -// step 1: as mentioned in 1.7, multiple analyses comprise the framework with
89 |    | -// an unified worklist. DCE kick in first, when it visit the scf.if, the
90 |    | -// "cond" does not have lattice associated with it. So it initially
91 |    | -// considered both then-block and else-block are dead.
92 |    | -// step 2: after DCE going over all items in the worklist, range-analysis gets
93 |    | -// the chance. op0 is visited, a non-bottom lattice is created for op0's LHS.
94 |    | -// step 3. The baseclass (belong to framework) visits the scf.if
95 |    | -// it calls this class's visitRegionSuccessors(). Basically,
96 |    | -// visitRegionSuccessors() gives subclass a chance to prepare for RHS for
97 |    | -// SCF operations. This class does nothing for scf.if.
98 |    | -// step 3: The base-class returns once sub-class's visitRegionSuccessors()
99 |    | -// returns. Therefor, this class (subclass)'s visitOperand() function is
100 |    | -// *NOT* called with with scf.if.
101 |    | -// step 4: The base-class tries to visit the sub-regions (i.e. then- and else-
102 |    | -// blocks), only finds they are dead (due to step 1) and hence skip them.
103 |    | -// step 5: after step 4, the lattice of x and y are in "bottom" state.
104 |    | -// When op5 is visit, range-analysis find one of source operands is in
105 |    | -// "bottom" state, and do not update z's state.
106 |    | -// ...
107 |    | -// next round starts.
108 |    | -// step 5: DCE found "cond" has non-bottom state associated with it, and mark
109 |    | -// then- and else-block "live" accordingly.
110 |    | -// step 6: Range-analysis get a chance to visit the then- and else-block.
111 |    | -// step 7: when op1 is visited. *HACK KICK IN*. Range-analysis found op1 is
112 |    | -// used by yield-op, it then in turn updates x's state.
113 |    | -// step 8: likewise, then op3's visited, y's state is updated as well.
114 |    | -// step 9: finally, x and y has non-bottom state, when op5 is visited, z's
115 |    | -// state is updated.
116 | 57 |
117 | 58 | #undef DEBUG_TYPE
118 | 59 | #define DEBUG_TYPE "tritonamdgpu-range-analysis"
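To make point 2.4 above concrete, the following is a minimal standalone sketch (hypothetical values and helper name, not part of this change) contrasting the intersection the comment would like to use with the union that the lattice join() actually computes, assuming the standard mlir::ConstantIntRanges API:

// Minimal sketch only; ranges and function names below are made up.
#include "mlir/Interfaces/InferIntRangeInterface.h"
#include "llvm/ADT/APInt.h"

using llvm::APInt;
using mlir::ConstantIntRanges;

// Manual signed intersection of two overlapping ranges.
static ConstantIntRanges intersectSigned(const ConstantIntRanges &a,
                                         const ConstantIntRanges &b) {
  return ConstantIntRanges::fromSigned(llvm::APIntOps::smax(a.smin(), b.smin()),
                                       llvm::APIntOps::smin(a.smax(), b.smax()));
}

void demoAssumedVsInferred() {
  // Inferred by the dataflow analysis: x in [0, 1024].
  auto inferred = ConstantIntRanges::fromSigned(APInt(32, 0), APInt(32, 1024));
  // Claimed by a tl.assume: x in [0, 255].
  auto assumed = ConstantIntRanges::fromSigned(APInt(32, 0), APInt(32, 255));

  // What 2.4 would ideally use: the intersection, i.e. [0, 255].
  ConstantIntRanges narrowed = intersectSigned(inferred, assumed);

  // What the monotone lattice join() computes: the union, i.e. [0, 1024].
  // join() only ever widens a range, which is what keeps the fixpoint
  // iteration terminating.
  ConstantIntRanges widened = inferred.rangeUnion(assumed);
  (void)narrowed;
  (void)widened;
}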
@@ -223,7 +164,7 @@ void inferResultRangesMaxNonNegSigned(Operation *op,
223 | 164 |   }
224 | 165 | }
225 | 166 |
226 |     | -// Given an assumption operaiton, try to derive the value range of the value
    | 167 | +// Given an assumption operation, try to derive the value range of the value
227 | 168 | // <anchor> somewhere in the block "useBlock".
228 | 169 | // Note that
229 | 170 | // - The value "anchor" is defined or referenced in the "useBlock"
@@ -683,7 +624,7 @@ void TritonIntegerRangeAnalysis::defaultTransferFunc(
683 | 624 |   }
684 | 625 |
685 | 626 |   // step 4: Update the value range. Note that we are using `join` operation
686 |     | -  // which means `union`. Transfer funtion must be monotone! The resolver
    | 627 | +  // which means `union`. Transfer function must be monotone! The resolver
687 | 628 |   // would otherwise fall into infinite loop.
688 | 629 |   ChangeResult changed = lattice->join(incomingRange_);
689 | 630 |   LLVM_DEBUG({
@@ -718,12 +659,12 @@ TritonIntegerRangeAnalysis::rectifyInfferableRange(
718 | 659 |
719 | 660 |   auto isPos = [](const ConstantIntRanges &range) {
720 | 661 |     // Return true iff in both unsigned and signed representation, the most
721 |     | -    // siganificant bit is always 0.
    | 662 | +    // significant bit is always 0.
722 | 663 |     return range.umax().isNonNegative() && range.smax().isNonNegative() &&
723 | 664 |            range.smin().isNonNegative();
724 | 665 |   };
725 | 666 |
726 |     | -  // Not appliable to those bin-ops yielding unsigned int.
    | 667 | +  // Not applicable to those bin-ops yielding unsigned int.
727 | 668 |   if (!signedIntValues.count(op->getResult(0)))
728 | 669 |     return std::nullopt;
729 | 670 |
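As a side note, isPos() above only accepts ranges whose sign bit is clear in both the unsigned and the signed views. A small hypothetical sketch (not part of this change, values made up) of what it accepts and rejects:

// Hypothetical sketch, not part of this patch.
#include "mlir/Interfaces/InferIntRangeInterface.h"
#include "llvm/ADT/APInt.h"

static bool isPosSketch(const mlir::ConstantIntRanges &range) {
  // Same predicate as the lambda above: the most significant bit must be 0
  // for the unsigned max as well as for the signed min/max.
  return range.umax().isNonNegative() && range.smax().isNonNegative() &&
         range.smin().isNonNegative();
}

void demoIsPos() {
  // 8-bit [0, 100]: 100 has the MSB clear, so the range is non-negative in
  // both the unsigned and the signed interpretation -> accepted.
  auto small = mlir::ConstantIntRanges::fromUnsigned(llvm::APInt(8, 0),
                                                     llvm::APInt(8, 100));
  // 8-bit [0, 200]: 200 (0xC8) has the MSB set, i.e. the value may be
  // negative when reinterpreted as signed -> rejected.
  auto wide = mlir::ConstantIntRanges::fromUnsigned(llvm::APInt(8, 0),
                                                    llvm::APInt(8, 200));
  bool accepted = isPosSketch(small); // true
  bool rejected = isPosSketch(wide);  // false
  (void)accepted;
  (void)rejected;
}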
@@ -774,42 +715,6 @@ TritonIntegerRangeAnalysis::rectifyInfferableRange(
774 | 715 |   return ConstantIntRanges::fromUnsigned(resultRange.umin(), umax);
775 | 716 | }
776 | 717 |
777 |     | -void TritonIntegerRangeAnalysis::visitYieldHelper(Operation *op, Value value) {
778 |     | -  auto yieldOp = dyn_cast<scf::YieldOp>(op);
779 |     | -  LDBG("visit yieldOp: " << yieldOp);
780 |     | -
781 |     | -  dataflow::IntegerValueRangeLattice *srcLattice = getLatticeElement(value);
782 |     | -
783 |     | -  for (auto iter : llvm::enumerate(yieldOp->getOperands())) {
784 |     | -    if (iter.value() != value)
785 |     | -      continue;
786 |     | -
787 |     | -    size_t idx = iter.index();
788 |     | -    Operation *parentOp = yieldOp->getParentOp();
789 |     | -
790 |     | -    if (auto ifOp = dyn_cast<scf::IfOp>(parentOp)) {
791 |     | -      // Get the corresponding scf.if result and its lattice
792 |     | -      mlir::OpResult res = parentOp->getResult(idx);
793 |     | -      dataflow::IntegerValueRangeLattice *resLattice = getLatticeElement(res);
794 |     | -      auto changed = resLattice->join(*srcLattice);
795 |     | -      propagateIfChanged(resLattice, changed);
796 |     | -
797 |     | -      LLVM_DEBUG({
798 |     | -        OpPrintingFlags flags;
799 |     | -        flags.skipRegions(true);
800 |     | -        DBGS() << ((changed == ChangeResult::Change)
801 |     | -                       ? ">yieldOp bring change: "
802 |     | -                       : ">yieldOp bring no change:");
803 |     | -        res.printAsOperand(llvm::dbgs(), flags);
804 |     | -        llvm::dbgs() << ", resulting value-range: "
805 |     | -                     << resLattice->getValue().getValue()
806 |     | -                     << ", in value-range: "
807 |     | -                     << srcLattice->getValue().getValue() << "\n";
808 |     | -      });
809 |     | -    }
810 |     | -  }
811 |     | -}
812 |     | -
813 | 718 | LogicalResult TritonIntegerRangeAnalysis::visitOperation(
814 | 719 |     Operation *op,
815 | 720 |     ArrayRef<const dataflow::IntegerValueRangeLattice *> operands,
@@ -876,20 +781,6 @@ LogicalResult TritonIntegerRangeAnalysis::visitOperation(
876 | 781 |     propagateIfChanged(lattice, changed);
877 | 782 |   }
878 | 783 |
879 |     | -  // step 4: The dataflow framework does not understand SCF. It skip yieldOp
880 |     | -  // as it has no result. To workaround this problem, we visit all yieldOp
881 |     | -  // which depends on this operation.
882 |     | -  for (int resIdx = 0, resEnd = op->getNumResults(); resIdx < resEnd;
883 |     | -       ++resIdx) {
884 |     | -    mlir::OpResult res = op->getResult(resIdx);
885 |     | -
886 |     | -    for (mlir::OpOperand &use : res.getUses()) {
887 |     | -      mlir::Operation *depOp = use.getOwner();
888 |     | -      if (auto yield = dyn_cast<scf::YieldOp>(depOp))
889 |     | -        visitYieldHelper(yield, res);
890 |     | -    }
891 |     | -  }
892 |     | -
893 | 784 |   return visitResult;
894 | 785 | }
895 | 786 |
@@ -1045,6 +936,27 @@ void TritonIntegerRangeAnalysis::visitRegionSuccessors(
1045 | 936 |   assert(predecessors->allPredecessorsKnown() &&
1046 | 937 |          "unexpected unresolved region successors");
1047 | 938 |
     | 939 | +  // Note: it may not be obvious, but this loop can update SCF
     | 940 | +  // operations' LHS. E.g. if the given "branch" argument is an scf.if, and the
     | 941 | +  // scf.if construct looks like the following:
     | 942 | +  //   x = scf.if cond
     | 943 | +  //     m = ...   // op_m
     | 944 | +  //     yield m
     | 945 | +  //   else
     | 946 | +  //     n = ...   // op_n
     | 947 | +  //     yield n
     | 948 | +  //
     | 949 | +  // This loop tries to update lattice(x) = join(lattice(m), lattice(n)),
     | 950 | +  // provided lattice(m) and lattice(n) are initialized.
     | 951 | +  //
     | 952 | +  // Note that the state of lattice(m) and lattice(n) was updated in the
     | 953 | +  // "previous" round. In this "round", the scf.if is visited right now, and
     | 954 | +  // it takes this moment to update its LHS.
     | 955 | +  //
     | 956 | +  // Alternatively, when we visit, say, op_m, we notice its result is used by
     | 957 | +  // a yieldOp, get the yieldOp's corresponding receiver, in this case x, and
     | 958 | +  // update its state accordingly.
     | 959 | +  //
1048 | 960 |   for (Operation *op : predecessors->getKnownPredecessors()) {
1049 | 961 |     std::optional<OperandRange> operands;
1050 | 962 |     if (op == branch) {
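For reference, a minimal hypothetical sketch (helper name and signature assumed, not the actual loop body above) of the lattice(x) = join(lattice(m), lattice(n)) update that the new comment describes, i.e. folding each yielded operand's lattice into the lattice of the matching scf.if result:

// Hypothetical helper, for illustration only; the real update happens in the
// loop shown in the hunk above.
void TritonIntegerRangeAnalysis::joinYieldedIntoResults(
    RegionBranchOpInterface branch,
    OperandRange yielded /* e.g. {m} or {n}, one region's yield operands */) {
  for (auto iter : llvm::enumerate(yielded)) {
    // Lattice of the scf.if result (x) that corresponds to this yield operand.
    dataflow::IntegerValueRangeLattice *resLattice =
        getLatticeElement(branch->getResult(iter.index()));
    // lattice(x) = join(lattice(x), lattice(m or n)); join only widens.
    ChangeResult changed = resLattice->join(*getLatticeElement(iter.value()));
    propagateIfChanged(resLattice, changed);
  }
}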