[LLVM][LangRef] Remove "n > 0" restriction from get.active.lanes.mask. #152140

paulwalker-arm · 2025-08-05T13:50:33Z

The specification for get.active.lanes.mask says a limit value of zero results in poison. This seems like an artificial restriction and means you cannot use the intrinsic to create minimal loops of the form:

foo(int count, ....) {
  int i = 0;
  while (mask = get.active.lane.mask(i, count)) {
    ; do work
    i += count_bits(mask);
  }
}

I cannot see any code that generates poison in this case, in fact ConstantFoldFixedVectorCall returns the logical result (i.e. an all false vector).

There are also cases like can_overflow_i64_induction_var in sve-tail-folding-overflow-checks.ll that look broken by the current definition? for the case when "%N <= vscale * 4".

For these reasons I'm asking if we can simply remove the restriction?

The specification for get.active.lanes.mask says a limit value of zero results in poison. This seems like an artificial restriction and means you cannot use the intrinsic to create minimal loops of the form: foo(int count, ....) { int i = 0; while (mask = get.active.lane.mask(i, count)) { ; do work i += count_bits(mask); } } I cannot see any code that generates poison in this case, in fact ConstantFoldFixedVectorCall returns the logical result (i.e. an all false vector). There are also cases like `can_overflow_i64_induction_var` in sve-tail-folding-overflow-checks.ll that look broken by the current definition? for the case when "%N <= vscale * 4". For these reasons I'm asking if we can simply remove the restriction?

llvmbot · 2025-08-05T13:51:04Z

@llvm/pr-subscribers-llvm-analysis

Author: Paul Walker (paulwalker-arm)

Changes

The specification for get.active.lanes.mask says a limit value of zero results in poison. This seems like an artificial restriction and means you cannot use the intrinsic to create minimal loops of the form:

foo(int count, ....) {
int i = 0;
while (mask = get.active.lane.mask(i, count)) {
; do work
i += count_bits(mask);
}
}

I cannot see any code that generates poison in this case, in fact ConstantFoldFixedVectorCall returns the logical result (i.e. an all false vector).

There are also cases like can_overflow_i64_induction_var in sve-tail-folding-overflow-checks.ll that look broken by the current definition? for the case when "%N <= vscale * 4".

For these reasons I'm asking if we can simply remove the restriction?

Full diff: https://github.com/llvm/llvm-project/pull/152140.diff

3 Files Affected:

(modified) llvm/docs/LangRef.rst (+1-2)
(modified) llvm/lib/Analysis/Lint.cpp (-7)
(removed) llvm/test/Analysis/Lint/get-active-lane-mask.ll (-39)

diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index d8cd3b894cda4..77b10b3c22ef4 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -24014,8 +24014,7 @@ indexed by ``i``,  and ``%base``, ``%n`` are the two arguments to
 ``llvm.get.active.lane.mask.*``, ``%icmp`` is an integer compare and ``ult``
 the unsigned less-than comparison operator.  Overflow cannot occur in
 ``(%base + i)`` and its comparison against ``%n`` as it is performed in integer
-numbers and not in machine numbers.  If ``%n`` is ``0``, then the result is a
-poison value. The above is equivalent to:
+numbers and not in machine numbers.  The above is equivalent to:
 
 ::
 
diff --git a/llvm/lib/Analysis/Lint.cpp b/llvm/lib/Analysis/Lint.cpp
index 1168005f48c0e..32a4264c0343c 100644
--- a/llvm/lib/Analysis/Lint.cpp
+++ b/llvm/lib/Analysis/Lint.cpp
@@ -374,13 +374,6 @@ void Lint::visitCallBase(CallBase &I) {
       visitMemoryReference(I, MemoryLocation::getForArgument(&I, 0, TLI),
                            std::nullopt, nullptr, MemRef::Read | MemRef::Write);
       break;
-    case Intrinsic::get_active_lane_mask:
-      if (auto *TripCount = dyn_cast<ConstantInt>(I.getArgOperand(1)))
-        Check(!TripCount->isZero(),
-              "get_active_lane_mask: operand #2 "
-              "must be greater than 0",
-              &I);
-      break;
     }
 }
 
diff --git a/llvm/test/Analysis/Lint/get-active-lane-mask.ll b/llvm/test/Analysis/Lint/get-active-lane-mask.ll
deleted file mode 100644
index e9b161846e8bd..0000000000000
--- a/llvm/test/Analysis/Lint/get-active-lane-mask.ll
+++ /dev/null
@@ -1,39 +0,0 @@
-; RUN: opt -passes=lint -disable-output < %s 2>&1 | FileCheck %s
-
-define <4 x i1> @t1(i32 %IV) {
-;
-; CHECK:      get_active_lane_mask: operand #2 must be greater than 0
-; CHECK-NEXT: %res = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %IV, i32 0)
-;
-  %res = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %IV, i32 0)
-  ret <4 x i1> %res
-}
-
-define <4 x i1> @t2(i32 %IV) {
-;
-; CHECK-NOT: get_active_lane_mask
-; CHECK-NOT: call <4 x i1> @llvm.get.active.lane.mask
-;
-  %res = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %IV, i32 1)
-  ret <4 x i1> %res
-}
-
-define <4 x i1> @t3(i32 %IV) {
-;
-; CHECK-NOT: get_active_lane_mask
-; CHECK-NOT: call <4 x i1> @llvm.get.active.lane.mask
-;
-  %res = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %IV, i32 -1)
-  ret <4 x i1> %res
-}
-
-define <4 x i1> @t4(i32 %IV, i32 %TC) {
-;
-; CHECK-NOT: get_active_lane_mask
-; CHECK-NOT: call <4 x i1> @llvm.get.active.lane.mask
-;
-  %res = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %IV, i32 %TC)
-  ret <4 x i1> %res
-}
-
-declare <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32, i32)

llvmbot · 2025-08-05T13:51:04Z

@llvm/pr-subscribers-llvm-ir

Author: Paul Walker (paulwalker-arm)

Changes

The specification for get.active.lanes.mask says a limit value of zero results in poison. This seems like an artificial restriction and means you cannot use the intrinsic to create minimal loops of the form:

foo(int count, ....) {
int i = 0;
while (mask = get.active.lane.mask(i, count)) {
; do work
i += count_bits(mask);
}
}

I cannot see any code that generates poison in this case, in fact ConstantFoldFixedVectorCall returns the logical result (i.e. an all false vector).

There are also cases like can_overflow_i64_induction_var in sve-tail-folding-overflow-checks.ll that look broken by the current definition? for the case when "%N <= vscale * 4".

For these reasons I'm asking if we can simply remove the restriction?

Full diff: https://github.com/llvm/llvm-project/pull/152140.diff

3 Files Affected:

(modified) llvm/docs/LangRef.rst (+1-2)
(modified) llvm/lib/Analysis/Lint.cpp (-7)
(removed) llvm/test/Analysis/Lint/get-active-lane-mask.ll (-39)

diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index d8cd3b894cda4..77b10b3c22ef4 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -24014,8 +24014,7 @@ indexed by ``i``,  and ``%base``, ``%n`` are the two arguments to
 ``llvm.get.active.lane.mask.*``, ``%icmp`` is an integer compare and ``ult``
 the unsigned less-than comparison operator.  Overflow cannot occur in
 ``(%base + i)`` and its comparison against ``%n`` as it is performed in integer
-numbers and not in machine numbers.  If ``%n`` is ``0``, then the result is a
-poison value. The above is equivalent to:
+numbers and not in machine numbers.  The above is equivalent to:
 
 ::
 
diff --git a/llvm/lib/Analysis/Lint.cpp b/llvm/lib/Analysis/Lint.cpp
index 1168005f48c0e..32a4264c0343c 100644
--- a/llvm/lib/Analysis/Lint.cpp
+++ b/llvm/lib/Analysis/Lint.cpp
@@ -374,13 +374,6 @@ void Lint::visitCallBase(CallBase &I) {
       visitMemoryReference(I, MemoryLocation::getForArgument(&I, 0, TLI),
                            std::nullopt, nullptr, MemRef::Read | MemRef::Write);
       break;
-    case Intrinsic::get_active_lane_mask:
-      if (auto *TripCount = dyn_cast<ConstantInt>(I.getArgOperand(1)))
-        Check(!TripCount->isZero(),
-              "get_active_lane_mask: operand #2 "
-              "must be greater than 0",
-              &I);
-      break;
     }
 }
 
diff --git a/llvm/test/Analysis/Lint/get-active-lane-mask.ll b/llvm/test/Analysis/Lint/get-active-lane-mask.ll
deleted file mode 100644
index e9b161846e8bd..0000000000000
--- a/llvm/test/Analysis/Lint/get-active-lane-mask.ll
+++ /dev/null
@@ -1,39 +0,0 @@
-; RUN: opt -passes=lint -disable-output < %s 2>&1 | FileCheck %s
-
-define <4 x i1> @t1(i32 %IV) {
-;
-; CHECK:      get_active_lane_mask: operand #2 must be greater than 0
-; CHECK-NEXT: %res = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %IV, i32 0)
-;
-  %res = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %IV, i32 0)
-  ret <4 x i1> %res
-}
-
-define <4 x i1> @t2(i32 %IV) {
-;
-; CHECK-NOT: get_active_lane_mask
-; CHECK-NOT: call <4 x i1> @llvm.get.active.lane.mask
-;
-  %res = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %IV, i32 1)
-  ret <4 x i1> %res
-}
-
-define <4 x i1> @t3(i32 %IV) {
-;
-; CHECK-NOT: get_active_lane_mask
-; CHECK-NOT: call <4 x i1> @llvm.get.active.lane.mask
-;
-  %res = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %IV, i32 -1)
-  ret <4 x i1> %res
-}
-
-define <4 x i1> @t4(i32 %IV, i32 %TC) {
-;
-; CHECK-NOT: get_active_lane_mask
-; CHECK-NOT: call <4 x i1> @llvm.get.active.lane.mask
-;
-  %res = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %IV, i32 %TC)
-  ret <4 x i1> %res
-}
-
-declare <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32, i32)

efriedma-quic · 2025-08-05T22:53:27Z

See https://reviews.llvm.org/D86147 and https://reviews.llvm.org/D79175 for the original discussion of the restriction. CC @sjoerdmeijer @sparker-arm .

From what I recall, there's some interaction with MVE tail predication that breaks with n=0. But I haven't thought about this for a long time, so I'm not sure if that's still true.

sjoerdmeijer · 2025-08-06T06:56:50Z

See https://reviews.llvm.org/D86147 and https://reviews.llvm.org/D79175 for the original discussion of the restriction. CC @sjoerdmeijer @sparker-arm .

From what I recall, there's some interaction with MVE tail predication that breaks with n=0. But I haven't thought about this for a long time, so I'm not sure if that's still true.

Yep, that's right, we rely on this behaviour for MVE and tail-predicated hardware-loops.
With n=0, the loop wouldn't be executed, but incorrectly INT_MAX times, or something like that. But it's been a long time since I looked into MVE. I am hoping @davemgreen or @sparker-arm know more details from memory, otherwise we need to look further into this.

sparker-arm · 2025-08-06T09:40:47Z

I thought this trauma was behind me!

get.active.lane.mask maps to the VCTP instruction and, although I'm awful at reading the manual, I'm unable to reason why it couldn't take a zero value. And I'm struggling to parse the lang ref too.

With n=0, the loop wouldn't be executed, but incorrectly INT_MAX times, or something like that.

Yeah, this sounds very familiar. Don't we need to be able to convert this to back into an equivalent icmp ule and we need to perform a minus one during the reconstruction? So, with a zero value input we wrap to UINT_MAX?

paulwalker-arm · 2025-08-06T09:58:49Z

I looked at VCTP and also could not see a reason to restrict get.active.lane.mask given it does the correct thing with a zero input. This is making me think the intrinsic was restricted so that an MVE transformation didn't need to perform the relevant (SCEV? ValueTracking?) analysis. Does this sounds believable?

If true and not fixable then I guess an alternative route is to add a bool to get.active.lane.mask to define the behaviour for the n==0 case, similar to the cttz intrinsics.

sparker-arm · 2025-08-06T10:14:39Z

Does this sounds believable?

We certainly rely on separate intrinsic to set the trip count but, IIRC, SCEV is used to calculate whether the two intrinsics agree on the expected behaviour. If this guarantee isn't meant, then we rely on SelectionDAG lowering, like everyone else, which happens here.

sjoerdmeijer · 2025-08-06T10:31:00Z

Don't we need to be able to convert this to back into an equivalent icmp ule and we need to perform a minus one during the reconstruction? So, with a zero value input we wrap to UINT_MAX?

This is probably it, I think this is most plausible explanation.

paulwalker-arm · 2025-08-06T10:31:36Z

Does this sounds believable?

We certainly rely on separate intrinsic to set the trip count but, IIRC, SCEV is used to calculate whether the two intrinsics agree on the expected behaviour. If this guarantee isn't meant, then we rely on SelectionDAG lowering, like everyone else, which happens here.

I meant when analysing the original loop to determine whether the MVE transformation is possible. As in, I'm asking whether getSCEV(ActiveLaneMask.getOperand(1)).isKnownNonZero() didn't work for the MVE use case so instead the definition of the intrinsic was changed?

sparker-arm · 2025-08-06T12:07:19Z

I'm sorry, I'm not really following... I don't seem to remember non-zeros being a problem, SCEV can normally tell you this with IsLoopEntryGuardedByCond. The problem was at the other end of the value scale.

Originally, the intrinsic was defined using the backend-taken-count (BTC), the loop trip count - 1, whereas we actually wanted the loop trip count. But it was very difficult to prove that BTC wasn't MAX, which could cause overflow while performing the +1 to calculate the trip count. Conversely, if my memory is correct, we needed to be able to reconstruct in terms of BTC, so you could wrap around zero performing the -1.

Though, it doesn't mean this is still the case.

Looking at SelectionDAGBuilder, it's using a UADDSAT so no overflow is going to happen. So, maybe this change is okay.

I haven't looked at the final backend pass in years, and I can't remember if there's any assumptions there about non-zero values. @davemgreen is really the only person who will have any form of recent memory about it.

paulwalker-arm requested review from nikic, fhahn and preames August 5, 2025 13:50

llvmbot added llvm:ir llvm:analysis Includes value tracking, cost tables and constant folding labels Aug 5, 2025

sjoerdmeijer requested a review from davemgreen August 6, 2025 06:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[LLVM][LangRef] Remove "n > 0" restriction from get.active.lanes.mask. #152140

[LLVM][LangRef] Remove "n > 0" restriction from get.active.lanes.mask. #152140

paulwalker-arm commented Aug 5, 2025 •

edited

Loading

Uh oh!

llvmbot commented Aug 5, 2025

Uh oh!

llvmbot commented Aug 5, 2025

Uh oh!

efriedma-quic commented Aug 5, 2025

Uh oh!

sjoerdmeijer commented Aug 6, 2025

Uh oh!

sparker-arm commented Aug 6, 2025

Uh oh!

paulwalker-arm commented Aug 6, 2025

Uh oh!

sparker-arm commented Aug 6, 2025

Uh oh!

sjoerdmeijer commented Aug 6, 2025

Uh oh!

paulwalker-arm commented Aug 6, 2025

Uh oh!

sparker-arm commented Aug 6, 2025

Uh oh!

Uh oh!

[LLVM][LangRef] Remove "n > 0" restriction from get.active.lanes.mask. #152140

Are you sure you want to change the base?

[LLVM][LangRef] Remove "n > 0" restriction from get.active.lanes.mask. #152140

Conversation

paulwalker-arm commented Aug 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Aug 5, 2025

Uh oh!

llvmbot commented Aug 5, 2025

Uh oh!

efriedma-quic commented Aug 5, 2025

Uh oh!

sjoerdmeijer commented Aug 6, 2025

Uh oh!

sparker-arm commented Aug 6, 2025

Uh oh!

paulwalker-arm commented Aug 6, 2025

Uh oh!

sparker-arm commented Aug 6, 2025

Uh oh!

sjoerdmeijer commented Aug 6, 2025

Uh oh!

paulwalker-arm commented Aug 6, 2025

Uh oh!

sparker-arm commented Aug 6, 2025

Uh oh!

Uh oh!

paulwalker-arm commented Aug 5, 2025 •

edited

Loading