-
Notifications
You must be signed in to change notification settings - Fork 14.7k
[LLVM][LangRef] Remove "n > 0" restriction from get.active.lanes.mask. #152140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
The specification for get.active.lanes.mask says a limit value of zero results in poison. This seems like an artificial restriction and means you cannot use the intrinsic to create minimal loops of the form: foo(int count, ....) { int i = 0; while (mask = get.active.lane.mask(i, count)) { ; do work i += count_bits(mask); } } I cannot see any code that generates poison in this case, in fact ConstantFoldFixedVectorCall returns the logical result (i.e. an all false vector). There are also cases like `can_overflow_i64_induction_var` in sve-tail-folding-overflow-checks.ll that look broken by the current definition? for the case when "%N <= vscale * 4". For these reasons I'm asking if we can simply remove the restriction?
@llvm/pr-subscribers-llvm-analysis Author: Paul Walker (paulwalker-arm) ChangesThe specification for get.active.lanes.mask says a limit value of zero results in poison. This seems like an artificial restriction and means you cannot use the intrinsic to create minimal loops of the form: foo(int count, ....) { I cannot see any code that generates poison in this case, in fact ConstantFoldFixedVectorCall returns the logical result (i.e. an all false vector). There are also cases like For these reasons I'm asking if we can simply remove the restriction? Full diff: https://github.com/llvm/llvm-project/pull/152140.diff 3 Files Affected:
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index d8cd3b894cda4..77b10b3c22ef4 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -24014,8 +24014,7 @@ indexed by ``i``, and ``%base``, ``%n`` are the two arguments to
``llvm.get.active.lane.mask.*``, ``%icmp`` is an integer compare and ``ult``
the unsigned less-than comparison operator. Overflow cannot occur in
``(%base + i)`` and its comparison against ``%n`` as it is performed in integer
-numbers and not in machine numbers. If ``%n`` is ``0``, then the result is a
-poison value. The above is equivalent to:
+numbers and not in machine numbers. The above is equivalent to:
::
diff --git a/llvm/lib/Analysis/Lint.cpp b/llvm/lib/Analysis/Lint.cpp
index 1168005f48c0e..32a4264c0343c 100644
--- a/llvm/lib/Analysis/Lint.cpp
+++ b/llvm/lib/Analysis/Lint.cpp
@@ -374,13 +374,6 @@ void Lint::visitCallBase(CallBase &I) {
visitMemoryReference(I, MemoryLocation::getForArgument(&I, 0, TLI),
std::nullopt, nullptr, MemRef::Read | MemRef::Write);
break;
- case Intrinsic::get_active_lane_mask:
- if (auto *TripCount = dyn_cast<ConstantInt>(I.getArgOperand(1)))
- Check(!TripCount->isZero(),
- "get_active_lane_mask: operand #2 "
- "must be greater than 0",
- &I);
- break;
}
}
diff --git a/llvm/test/Analysis/Lint/get-active-lane-mask.ll b/llvm/test/Analysis/Lint/get-active-lane-mask.ll
deleted file mode 100644
index e9b161846e8bd..0000000000000
--- a/llvm/test/Analysis/Lint/get-active-lane-mask.ll
+++ /dev/null
@@ -1,39 +0,0 @@
-; RUN: opt -passes=lint -disable-output < %s 2>&1 | FileCheck %s
-
-define <4 x i1> @t1(i32 %IV) {
-;
-; CHECK: get_active_lane_mask: operand #2 must be greater than 0
-; CHECK-NEXT: %res = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %IV, i32 0)
-;
- %res = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %IV, i32 0)
- ret <4 x i1> %res
-}
-
-define <4 x i1> @t2(i32 %IV) {
-;
-; CHECK-NOT: get_active_lane_mask
-; CHECK-NOT: call <4 x i1> @llvm.get.active.lane.mask
-;
- %res = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %IV, i32 1)
- ret <4 x i1> %res
-}
-
-define <4 x i1> @t3(i32 %IV) {
-;
-; CHECK-NOT: get_active_lane_mask
-; CHECK-NOT: call <4 x i1> @llvm.get.active.lane.mask
-;
- %res = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %IV, i32 -1)
- ret <4 x i1> %res
-}
-
-define <4 x i1> @t4(i32 %IV, i32 %TC) {
-;
-; CHECK-NOT: get_active_lane_mask
-; CHECK-NOT: call <4 x i1> @llvm.get.active.lane.mask
-;
- %res = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %IV, i32 %TC)
- ret <4 x i1> %res
-}
-
-declare <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32, i32)
|
@llvm/pr-subscribers-llvm-ir Author: Paul Walker (paulwalker-arm) ChangesThe specification for get.active.lanes.mask says a limit value of zero results in poison. This seems like an artificial restriction and means you cannot use the intrinsic to create minimal loops of the form: foo(int count, ....) { I cannot see any code that generates poison in this case, in fact ConstantFoldFixedVectorCall returns the logical result (i.e. an all false vector). There are also cases like For these reasons I'm asking if we can simply remove the restriction? Full diff: https://github.com/llvm/llvm-project/pull/152140.diff 3 Files Affected:
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index d8cd3b894cda4..77b10b3c22ef4 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -24014,8 +24014,7 @@ indexed by ``i``, and ``%base``, ``%n`` are the two arguments to
``llvm.get.active.lane.mask.*``, ``%icmp`` is an integer compare and ``ult``
the unsigned less-than comparison operator. Overflow cannot occur in
``(%base + i)`` and its comparison against ``%n`` as it is performed in integer
-numbers and not in machine numbers. If ``%n`` is ``0``, then the result is a
-poison value. The above is equivalent to:
+numbers and not in machine numbers. The above is equivalent to:
::
diff --git a/llvm/lib/Analysis/Lint.cpp b/llvm/lib/Analysis/Lint.cpp
index 1168005f48c0e..32a4264c0343c 100644
--- a/llvm/lib/Analysis/Lint.cpp
+++ b/llvm/lib/Analysis/Lint.cpp
@@ -374,13 +374,6 @@ void Lint::visitCallBase(CallBase &I) {
visitMemoryReference(I, MemoryLocation::getForArgument(&I, 0, TLI),
std::nullopt, nullptr, MemRef::Read | MemRef::Write);
break;
- case Intrinsic::get_active_lane_mask:
- if (auto *TripCount = dyn_cast<ConstantInt>(I.getArgOperand(1)))
- Check(!TripCount->isZero(),
- "get_active_lane_mask: operand #2 "
- "must be greater than 0",
- &I);
- break;
}
}
diff --git a/llvm/test/Analysis/Lint/get-active-lane-mask.ll b/llvm/test/Analysis/Lint/get-active-lane-mask.ll
deleted file mode 100644
index e9b161846e8bd..0000000000000
--- a/llvm/test/Analysis/Lint/get-active-lane-mask.ll
+++ /dev/null
@@ -1,39 +0,0 @@
-; RUN: opt -passes=lint -disable-output < %s 2>&1 | FileCheck %s
-
-define <4 x i1> @t1(i32 %IV) {
-;
-; CHECK: get_active_lane_mask: operand #2 must be greater than 0
-; CHECK-NEXT: %res = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %IV, i32 0)
-;
- %res = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %IV, i32 0)
- ret <4 x i1> %res
-}
-
-define <4 x i1> @t2(i32 %IV) {
-;
-; CHECK-NOT: get_active_lane_mask
-; CHECK-NOT: call <4 x i1> @llvm.get.active.lane.mask
-;
- %res = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %IV, i32 1)
- ret <4 x i1> %res
-}
-
-define <4 x i1> @t3(i32 %IV) {
-;
-; CHECK-NOT: get_active_lane_mask
-; CHECK-NOT: call <4 x i1> @llvm.get.active.lane.mask
-;
- %res = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %IV, i32 -1)
- ret <4 x i1> %res
-}
-
-define <4 x i1> @t4(i32 %IV, i32 %TC) {
-;
-; CHECK-NOT: get_active_lane_mask
-; CHECK-NOT: call <4 x i1> @llvm.get.active.lane.mask
-;
- %res = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %IV, i32 %TC)
- ret <4 x i1> %res
-}
-
-declare <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32, i32)
|
See https://reviews.llvm.org/D86147 and https://reviews.llvm.org/D79175 for the original discussion of the restriction. CC @sjoerdmeijer @sparker-arm . From what I recall, there's some interaction with MVE tail predication that breaks with n=0. But I haven't thought about this for a long time, so I'm not sure if that's still true. |
Yep, that's right, we rely on this behaviour for MVE and tail-predicated hardware-loops. |
I thought this trauma was behind me!
Yeah, this sounds very familiar. Don't we need to be able to convert this to back into an equivalent |
I looked at If true and not fixable then I guess an alternative route is to add a bool to |
We certainly rely on separate intrinsic to set the trip count but, IIRC, SCEV is used to calculate whether the two intrinsics agree on the expected behaviour. If this guarantee isn't meant, then we rely on SelectionDAG lowering, like everyone else, which happens here. |
This is probably it, I think this is most plausible explanation. |
I meant when analysing the original loop to determine whether the MVE transformation is possible. As in, I'm asking whether |
I'm sorry, I'm not really following... I don't seem to remember non-zeros being a problem, SCEV can normally tell you this with Originally, the intrinsic was defined using the backend-taken-count (BTC), the loop trip count - 1, whereas we actually wanted the loop trip count. But it was very difficult to prove that BTC wasn't MAX, which could cause overflow while performing the +1 to calculate the trip count. Conversely, if my memory is correct, we needed to be able to reconstruct in terms of BTC, so you could wrap around zero performing the -1. Though, it doesn't mean this is still the case. Looking at SelectionDAGBuilder, it's using a UADDSAT so no overflow is going to happen. So, maybe this change is okay. I haven't looked at the final backend pass in years, and I can't remember if there's any assumptions there about non-zero values. @davemgreen is really the only person who will have any form of recent memory about it. |
The specification for get.active.lanes.mask says a limit value of zero results in poison. This seems like an artificial restriction and means you cannot use the intrinsic to create minimal loops of the form:
I cannot see any code that generates poison in this case, in fact ConstantFoldFixedVectorCall returns the logical result (i.e. an all false vector).
There are also cases like
can_overflow_i64_induction_var
in sve-tail-folding-overflow-checks.ll that look broken by the current definition? for the case when "%N <= vscale * 4".For these reasons I'm asking if we can simply remove the restriction?