Skip to content

[LLVM][LangRef] Remove "n > 0" restriction from get.active.lanes.mask. #152140

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

paulwalker-arm
Copy link
Collaborator

@paulwalker-arm paulwalker-arm commented Aug 5, 2025

The specification for get.active.lanes.mask says a limit value of zero results in poison. This seems like an artificial restriction and means you cannot use the intrinsic to create minimal loops of the form:

foo(int count, ....) {
  int i = 0;
  while (mask = get.active.lane.mask(i, count)) {
    ; do work
    i += count_bits(mask);
  }
}

I cannot see any code that generates poison in this case, in fact ConstantFoldFixedVectorCall returns the logical result (i.e. an all false vector).

There are also cases like can_overflow_i64_induction_var in sve-tail-folding-overflow-checks.ll that look broken by the current definition? for the case when "%N <= vscale * 4".

For these reasons I'm asking if we can simply remove the restriction?

The specification for get.active.lanes.mask says a limit value of
zero results in poison. This seems like an artificial restriction and
means you cannot use the intrinsic to create minimal loops of the form:

  foo(int count, ....) {
    int i = 0;
    while (mask = get.active.lane.mask(i, count)) {
      ; do work
      i += count_bits(mask);
    }
  }

I cannot see any code that generates poison in this case, in fact
ConstantFoldFixedVectorCall returns the logical result (i.e. an all
false vector).

There are also cases like `can_overflow_i64_induction_var` in
sve-tail-folding-overflow-checks.ll that look broken by the current
definition? for the case when "%N <= vscale * 4".

For these reasons I'm asking if we can simply remove the restriction?
@llvmbot llvmbot added llvm:ir llvm:analysis Includes value tracking, cost tables and constant folding labels Aug 5, 2025
@llvmbot
Copy link
Member

llvmbot commented Aug 5, 2025

@llvm/pr-subscribers-llvm-analysis

Author: Paul Walker (paulwalker-arm)

Changes

The specification for get.active.lanes.mask says a limit value of zero results in poison. This seems like an artificial restriction and means you cannot use the intrinsic to create minimal loops of the form:

foo(int count, ....) {
int i = 0;
while (mask = get.active.lane.mask(i, count)) {
; do work
i += count_bits(mask);
}
}

I cannot see any code that generates poison in this case, in fact ConstantFoldFixedVectorCall returns the logical result (i.e. an all false vector).

There are also cases like can_overflow_i64_induction_var in sve-tail-folding-overflow-checks.ll that look broken by the current definition? for the case when "%N <= vscale * 4".

For these reasons I'm asking if we can simply remove the restriction?


Full diff: https://github.com/llvm/llvm-project/pull/152140.diff

3 Files Affected:

  • (modified) llvm/docs/LangRef.rst (+1-2)
  • (modified) llvm/lib/Analysis/Lint.cpp (-7)
  • (removed) llvm/test/Analysis/Lint/get-active-lane-mask.ll (-39)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index d8cd3b894cda4..77b10b3c22ef4 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -24014,8 +24014,7 @@ indexed by ``i``,  and ``%base``, ``%n`` are the two arguments to
 ``llvm.get.active.lane.mask.*``, ``%icmp`` is an integer compare and ``ult``
 the unsigned less-than comparison operator.  Overflow cannot occur in
 ``(%base + i)`` and its comparison against ``%n`` as it is performed in integer
-numbers and not in machine numbers.  If ``%n`` is ``0``, then the result is a
-poison value. The above is equivalent to:
+numbers and not in machine numbers.  The above is equivalent to:
 
 ::
 
diff --git a/llvm/lib/Analysis/Lint.cpp b/llvm/lib/Analysis/Lint.cpp
index 1168005f48c0e..32a4264c0343c 100644
--- a/llvm/lib/Analysis/Lint.cpp
+++ b/llvm/lib/Analysis/Lint.cpp
@@ -374,13 +374,6 @@ void Lint::visitCallBase(CallBase &I) {
       visitMemoryReference(I, MemoryLocation::getForArgument(&I, 0, TLI),
                            std::nullopt, nullptr, MemRef::Read | MemRef::Write);
       break;
-    case Intrinsic::get_active_lane_mask:
-      if (auto *TripCount = dyn_cast<ConstantInt>(I.getArgOperand(1)))
-        Check(!TripCount->isZero(),
-              "get_active_lane_mask: operand #2 "
-              "must be greater than 0",
-              &I);
-      break;
     }
 }
 
diff --git a/llvm/test/Analysis/Lint/get-active-lane-mask.ll b/llvm/test/Analysis/Lint/get-active-lane-mask.ll
deleted file mode 100644
index e9b161846e8bd..0000000000000
--- a/llvm/test/Analysis/Lint/get-active-lane-mask.ll
+++ /dev/null
@@ -1,39 +0,0 @@
-; RUN: opt -passes=lint -disable-output < %s 2>&1 | FileCheck %s
-
-define <4 x i1> @t1(i32 %IV) {
-;
-; CHECK:      get_active_lane_mask: operand #2 must be greater than 0
-; CHECK-NEXT: %res = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %IV, i32 0)
-;
-  %res = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %IV, i32 0)
-  ret <4 x i1> %res
-}
-
-define <4 x i1> @t2(i32 %IV) {
-;
-; CHECK-NOT: get_active_lane_mask
-; CHECK-NOT: call <4 x i1> @llvm.get.active.lane.mask
-;
-  %res = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %IV, i32 1)
-  ret <4 x i1> %res
-}
-
-define <4 x i1> @t3(i32 %IV) {
-;
-; CHECK-NOT: get_active_lane_mask
-; CHECK-NOT: call <4 x i1> @llvm.get.active.lane.mask
-;
-  %res = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %IV, i32 -1)
-  ret <4 x i1> %res
-}
-
-define <4 x i1> @t4(i32 %IV, i32 %TC) {
-;
-; CHECK-NOT: get_active_lane_mask
-; CHECK-NOT: call <4 x i1> @llvm.get.active.lane.mask
-;
-  %res = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %IV, i32 %TC)
-  ret <4 x i1> %res
-}
-
-declare <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32, i32)

@llvmbot
Copy link
Member

llvmbot commented Aug 5, 2025

@llvm/pr-subscribers-llvm-ir

Author: Paul Walker (paulwalker-arm)

Changes

The specification for get.active.lanes.mask says a limit value of zero results in poison. This seems like an artificial restriction and means you cannot use the intrinsic to create minimal loops of the form:

foo(int count, ....) {
int i = 0;
while (mask = get.active.lane.mask(i, count)) {
; do work
i += count_bits(mask);
}
}

I cannot see any code that generates poison in this case, in fact ConstantFoldFixedVectorCall returns the logical result (i.e. an all false vector).

There are also cases like can_overflow_i64_induction_var in sve-tail-folding-overflow-checks.ll that look broken by the current definition? for the case when "%N <= vscale * 4".

For these reasons I'm asking if we can simply remove the restriction?


Full diff: https://github.com/llvm/llvm-project/pull/152140.diff

3 Files Affected:

  • (modified) llvm/docs/LangRef.rst (+1-2)
  • (modified) llvm/lib/Analysis/Lint.cpp (-7)
  • (removed) llvm/test/Analysis/Lint/get-active-lane-mask.ll (-39)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index d8cd3b894cda4..77b10b3c22ef4 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -24014,8 +24014,7 @@ indexed by ``i``,  and ``%base``, ``%n`` are the two arguments to
 ``llvm.get.active.lane.mask.*``, ``%icmp`` is an integer compare and ``ult``
 the unsigned less-than comparison operator.  Overflow cannot occur in
 ``(%base + i)`` and its comparison against ``%n`` as it is performed in integer
-numbers and not in machine numbers.  If ``%n`` is ``0``, then the result is a
-poison value. The above is equivalent to:
+numbers and not in machine numbers.  The above is equivalent to:
 
 ::
 
diff --git a/llvm/lib/Analysis/Lint.cpp b/llvm/lib/Analysis/Lint.cpp
index 1168005f48c0e..32a4264c0343c 100644
--- a/llvm/lib/Analysis/Lint.cpp
+++ b/llvm/lib/Analysis/Lint.cpp
@@ -374,13 +374,6 @@ void Lint::visitCallBase(CallBase &I) {
       visitMemoryReference(I, MemoryLocation::getForArgument(&I, 0, TLI),
                            std::nullopt, nullptr, MemRef::Read | MemRef::Write);
       break;
-    case Intrinsic::get_active_lane_mask:
-      if (auto *TripCount = dyn_cast<ConstantInt>(I.getArgOperand(1)))
-        Check(!TripCount->isZero(),
-              "get_active_lane_mask: operand #2 "
-              "must be greater than 0",
-              &I);
-      break;
     }
 }
 
diff --git a/llvm/test/Analysis/Lint/get-active-lane-mask.ll b/llvm/test/Analysis/Lint/get-active-lane-mask.ll
deleted file mode 100644
index e9b161846e8bd..0000000000000
--- a/llvm/test/Analysis/Lint/get-active-lane-mask.ll
+++ /dev/null
@@ -1,39 +0,0 @@
-; RUN: opt -passes=lint -disable-output < %s 2>&1 | FileCheck %s
-
-define <4 x i1> @t1(i32 %IV) {
-;
-; CHECK:      get_active_lane_mask: operand #2 must be greater than 0
-; CHECK-NEXT: %res = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %IV, i32 0)
-;
-  %res = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %IV, i32 0)
-  ret <4 x i1> %res
-}
-
-define <4 x i1> @t2(i32 %IV) {
-;
-; CHECK-NOT: get_active_lane_mask
-; CHECK-NOT: call <4 x i1> @llvm.get.active.lane.mask
-;
-  %res = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %IV, i32 1)
-  ret <4 x i1> %res
-}
-
-define <4 x i1> @t3(i32 %IV) {
-;
-; CHECK-NOT: get_active_lane_mask
-; CHECK-NOT: call <4 x i1> @llvm.get.active.lane.mask
-;
-  %res = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %IV, i32 -1)
-  ret <4 x i1> %res
-}
-
-define <4 x i1> @t4(i32 %IV, i32 %TC) {
-;
-; CHECK-NOT: get_active_lane_mask
-; CHECK-NOT: call <4 x i1> @llvm.get.active.lane.mask
-;
-  %res = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %IV, i32 %TC)
-  ret <4 x i1> %res
-}
-
-declare <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32, i32)

@efriedma-quic
Copy link
Collaborator

See https://reviews.llvm.org/D86147 and https://reviews.llvm.org/D79175 for the original discussion of the restriction. CC @sjoerdmeijer @sparker-arm .

From what I recall, there's some interaction with MVE tail predication that breaks with n=0. But I haven't thought about this for a long time, so I'm not sure if that's still true.

@sjoerdmeijer sjoerdmeijer requested a review from davemgreen August 6, 2025 06:41
@sjoerdmeijer
Copy link
Collaborator

See https://reviews.llvm.org/D86147 and https://reviews.llvm.org/D79175 for the original discussion of the restriction. CC @sjoerdmeijer @sparker-arm .

From what I recall, there's some interaction with MVE tail predication that breaks with n=0. But I haven't thought about this for a long time, so I'm not sure if that's still true.

Yep, that's right, we rely on this behaviour for MVE and tail-predicated hardware-loops.
With n=0, the loop wouldn't be executed, but incorrectly INT_MAX times, or something like that. But it's been a long time since I looked into MVE. I am hoping @davemgreen or @sparker-arm know more details from memory, otherwise we need to look further into this.

@sparker-arm
Copy link
Contributor

I thought this trauma was behind me!

get.active.lane.mask maps to the VCTP instruction and, although I'm awful at reading the manual, I'm unable to reason why it couldn't take a zero value. And I'm struggling to parse the lang ref too.

With n=0, the loop wouldn't be executed, but incorrectly INT_MAX times, or something like that.

Yeah, this sounds very familiar. Don't we need to be able to convert this to back into an equivalent icmp ule and we need to perform a minus one during the reconstruction? So, with a zero value input we wrap to UINT_MAX?

@paulwalker-arm
Copy link
Collaborator Author

I looked at VCTP and also could not see a reason to restrict get.active.lane.mask given it does the correct thing with a zero input. This is making me think the intrinsic was restricted so that an MVE transformation didn't need to perform the relevant (SCEV? ValueTracking?) analysis. Does this sounds believable?

If true and not fixable then I guess an alternative route is to add a bool to get.active.lane.mask to define the behaviour for the n==0 case, similar to the cttz intrinsics.

@sparker-arm
Copy link
Contributor

Does this sounds believable?

We certainly rely on separate intrinsic to set the trip count but, IIRC, SCEV is used to calculate whether the two intrinsics agree on the expected behaviour. If this guarantee isn't meant, then we rely on SelectionDAG lowering, like everyone else, which happens here.

@sjoerdmeijer
Copy link
Collaborator

Don't we need to be able to convert this to back into an equivalent icmp ule and we need to perform a minus one during the reconstruction? So, with a zero value input we wrap to UINT_MAX?

This is probably it, I think this is most plausible explanation.

@paulwalker-arm
Copy link
Collaborator Author

Does this sounds believable?

We certainly rely on separate intrinsic to set the trip count but, IIRC, SCEV is used to calculate whether the two intrinsics agree on the expected behaviour. If this guarantee isn't meant, then we rely on SelectionDAG lowering, like everyone else, which happens here.

I meant when analysing the original loop to determine whether the MVE transformation is possible. As in, I'm asking whether getSCEV(ActiveLaneMask.getOperand(1)).isKnownNonZero() didn't work for the MVE use case so instead the definition of the intrinsic was changed?

@sparker-arm
Copy link
Contributor

I'm sorry, I'm not really following... I don't seem to remember non-zeros being a problem, SCEV can normally tell you this with IsLoopEntryGuardedByCond. The problem was at the other end of the value scale.

Originally, the intrinsic was defined using the backend-taken-count (BTC), the loop trip count - 1, whereas we actually wanted the loop trip count. But it was very difficult to prove that BTC wasn't MAX, which could cause overflow while performing the +1 to calculate the trip count. Conversely, if my memory is correct, we needed to be able to reconstruct in terms of BTC, so you could wrap around zero performing the -1.

Though, it doesn't mean this is still the case.

Looking at SelectionDAGBuilder, it's using a UADDSAT so no overflow is going to happen. So, maybe this change is okay.

I haven't looked at the final backend pass in years, and I can't remember if there's any assumptions there about non-zero values. @davemgreen is really the only person who will have any form of recent memory about it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:analysis Includes value tracking, cost tables and constant folding llvm:ir
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants