-
Notifications
You must be signed in to change notification settings - Fork 15.5k
[LV] Avoid SCEVChecks when IV update doesn't overflow #171605
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@llvm/pr-subscribers-backend-powerpc @llvm/pr-subscribers-vectorizers Author: Ramkumar Ramachandra (artagnon) ChangesWhen using an active lane mask, all SCEVChecks except those for stride-versioning are redundant. Patch is 35.71 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/171605.diff 5 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 15d0fa41bd902..96a567307063b 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -1819,7 +1819,8 @@ class GeneratedRTChecks {
/// there is no vector code generation, the check blocks are removed
/// completely.
void create(Loop *L, const LoopAccessInfo &LAI,
- const SCEVPredicate &UnionPred, ElementCount VF, unsigned IC) {
+ const SCEVPredicate &UnionPred, ElementCount VF, unsigned IC,
+ bool UseActiveLaneMask) {
// Hard cutoff to limit compile-time increase in case a very large number of
// runtime checks needs to be generated.
@@ -1837,7 +1838,8 @@ class GeneratedRTChecks {
// ensure the blocks are properly added to LoopInfo & DominatorTree. Those
// may be used by SCEVExpander. The blocks will be un-linked from their
// predecessors and removed from LI & DT at the end of the function.
- if (!UnionPred.isAlwaysTrue()) {
+ if (!UnionPred.isAlwaysTrue() &&
+ (!UseActiveLaneMask || !LAI.getSymbolicStrides().empty())) {
SCEVCheckBlock = SplitBlock(Preheader, Preheader->getTerminator(), DT, LI,
nullptr, "vector.scevcheck");
@@ -10094,7 +10096,8 @@ bool LoopVectorizePass::processLoop(Loop *L) {
// Optimistically generate runtime checks if they are needed. Drop them if
// they turn out to not be profitable.
if (VF.Width.isVector() || SelectedIC > 1) {
- Checks.create(L, *LVL.getLAI(), PSE.getPredicate(), VF.Width, SelectedIC);
+ Checks.create(L, *LVL.getLAI(), PSE.getPredicate(), VF.Width, SelectedIC,
+ useActiveLaneMask(CM.getTailFoldingStyle()));
// Bail out early if either the SCEV or memory runtime checks are known to
// fail. In that case, the vector loop would never execute.
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll b/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
index b549a06f08f8c..d695dc51907f5 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
@@ -840,31 +840,7 @@ define void @test_conditional_interleave_group (ptr noalias %src.1, ptr noalias
; PRED-SAME: ptr noalias [[SRC_1:%.*]], ptr noalias [[SRC_2:%.*]], ptr noalias [[SRC_3:%.*]], ptr noalias [[SRC_4:%.*]], ptr noalias [[DST:%.*]], i64 [[N:%.*]]) #[[ATTR3:[0-9]+]] {
; PRED-NEXT: [[ENTRY:.*:]]
; PRED-NEXT: [[TMP0:%.*]] = add i64 [[N]], 1
-; PRED-NEXT: br label %[[VECTOR_SCEVCHECK:.*]]
-; PRED: [[VECTOR_SCEVCHECK]]:
-; PRED-NEXT: [[MUL:%.*]] = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 16, i64 [[N]])
-; PRED-NEXT: [[MUL_RESULT:%.*]] = extractvalue { i64, i1 } [[MUL]], 0
-; PRED-NEXT: [[MUL_OVERFLOW:%.*]] = extractvalue { i64, i1 } [[MUL]], 1
-; PRED-NEXT: [[TMP2:%.*]] = getelementptr i8, ptr [[DST]], i64 [[MUL_RESULT]]
-; PRED-NEXT: [[TMP3:%.*]] = icmp ult ptr [[TMP2]], [[DST]]
-; PRED-NEXT: [[TMP4:%.*]] = or i1 [[TMP3]], [[MUL_OVERFLOW]]
-; PRED-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[DST]], i64 4
-; PRED-NEXT: [[MUL1:%.*]] = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 16, i64 [[N]])
-; PRED-NEXT: [[MUL_RESULT2:%.*]] = extractvalue { i64, i1 } [[MUL1]], 0
-; PRED-NEXT: [[MUL_OVERFLOW3:%.*]] = extractvalue { i64, i1 } [[MUL1]], 1
-; PRED-NEXT: [[TMP6:%.*]] = getelementptr i8, ptr [[SCEVGEP]], i64 [[MUL_RESULT2]]
-; PRED-NEXT: [[TMP7:%.*]] = icmp ult ptr [[TMP6]], [[SCEVGEP]]
-; PRED-NEXT: [[TMP8:%.*]] = or i1 [[TMP7]], [[MUL_OVERFLOW3]]
-; PRED-NEXT: [[SCEVGEP4:%.*]] = getelementptr i8, ptr [[DST]], i64 8
-; PRED-NEXT: [[MUL5:%.*]] = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 16, i64 [[N]])
-; PRED-NEXT: [[MUL_RESULT6:%.*]] = extractvalue { i64, i1 } [[MUL5]], 0
-; PRED-NEXT: [[MUL_OVERFLOW7:%.*]] = extractvalue { i64, i1 } [[MUL5]], 1
-; PRED-NEXT: [[TMP10:%.*]] = getelementptr i8, ptr [[SCEVGEP4]], i64 [[MUL_RESULT6]]
-; PRED-NEXT: [[TMP11:%.*]] = icmp ult ptr [[TMP10]], [[SCEVGEP4]]
-; PRED-NEXT: [[TMP12:%.*]] = or i1 [[TMP11]], [[MUL_OVERFLOW7]]
-; PRED-NEXT: [[TMP13:%.*]] = or i1 [[TMP4]], [[TMP8]]
-; PRED-NEXT: [[TMP14:%.*]] = or i1 [[TMP13]], [[TMP12]]
-; PRED-NEXT: br i1 [[TMP14]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; PRED-NEXT: br label %[[VECTOR_PH:.*]]
; PRED: [[VECTOR_PH]]:
; PRED-NEXT: [[TMP15:%.*]] = sub i64 [[TMP0]], 8
; PRED-NEXT: [[TMP16:%.*]] = icmp ugt i64 [[TMP0]], 8
@@ -872,9 +848,9 @@ define void @test_conditional_interleave_group (ptr noalias %src.1, ptr noalias
; PRED-NEXT: [[ACTIVE_LANE_MASK_ENTRY:%.*]] = call <8 x i1> @llvm.get.active.lane.mask.v8i1.i64(i64 0, i64 [[TMP0]])
; PRED-NEXT: br label %[[VECTOR_BODY:.*]]
; PRED: [[VECTOR_BODY]]:
-; PRED-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[PRED_STORE_CONTINUE27:.*]] ]
-; PRED-NEXT: [[ACTIVE_LANE_MASK:%.*]] = phi <8 x i1> [ [[ACTIVE_LANE_MASK_ENTRY]], %[[VECTOR_PH]] ], [ [[ACTIVE_LANE_MASK_NEXT:%.*]], %[[PRED_STORE_CONTINUE27]] ]
-; PRED-NEXT: [[VEC_IND:%.*]] = phi <8 x i64> [ <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7>, %[[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], %[[PRED_STORE_CONTINUE27]] ]
+; PRED-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[PRED_STORE_CONTINUE20:.*]] ]
+; PRED-NEXT: [[ACTIVE_LANE_MASK:%.*]] = phi <8 x i1> [ [[ACTIVE_LANE_MASK_ENTRY]], %[[VECTOR_PH]] ], [ [[ACTIVE_LANE_MASK_NEXT:%.*]], %[[PRED_STORE_CONTINUE20]] ]
+; PRED-NEXT: [[VEC_IND:%.*]] = phi <8 x i64> [ <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7>, %[[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], %[[PRED_STORE_CONTINUE20]] ]
; PRED-NEXT: [[TMP18:%.*]] = load float, ptr [[SRC_1]], align 4
; PRED-NEXT: [[BROADCAST_SPLATINSERT8:%.*]] = insertelement <8 x float> poison, float [[TMP18]], i64 0
; PRED-NEXT: [[BROADCAST_SPLAT9:%.*]] = shufflevector <8 x float> [[BROADCAST_SPLATINSERT8]], <8 x float> poison, <8 x i32> zeroinitializer
@@ -909,8 +885,8 @@ define void @test_conditional_interleave_group (ptr noalias %src.1, ptr noalias
; PRED-NEXT: br label %[[PRED_STORE_CONTINUE]]
; PRED: [[PRED_STORE_CONTINUE]]:
; PRED-NEXT: [[TMP35:%.*]] = extractelement <8 x i1> [[TMP26]], i32 1
-; PRED-NEXT: br i1 [[TMP35]], label %[[PRED_STORE_IF14:.*]], label %[[PRED_STORE_CONTINUE15:.*]]
-; PRED: [[PRED_STORE_IF14]]:
+; PRED-NEXT: br i1 [[TMP35]], label %[[PRED_STORE_IF7:.*]], label %[[PRED_STORE_CONTINUE8:.*]]
+; PRED: [[PRED_STORE_IF7]]:
; PRED-NEXT: [[TMP36:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 1
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP36]], align 4
; PRED-NEXT: [[TMP37:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 1
@@ -921,11 +897,11 @@ define void @test_conditional_interleave_group (ptr noalias %src.1, ptr noalias
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP40]], align 4
; PRED-NEXT: [[TMP41:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 1
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP41]], align 4
-; PRED-NEXT: br label %[[PRED_STORE_CONTINUE15]]
-; PRED: [[PRED_STORE_CONTINUE15]]:
+; PRED-NEXT: br label %[[PRED_STORE_CONTINUE8]]
+; PRED: [[PRED_STORE_CONTINUE8]]:
; PRED-NEXT: [[TMP42:%.*]] = extractelement <8 x i1> [[TMP26]], i32 2
-; PRED-NEXT: br i1 [[TMP42]], label %[[PRED_STORE_IF16:.*]], label %[[PRED_STORE_CONTINUE17:.*]]
-; PRED: [[PRED_STORE_IF16]]:
+; PRED-NEXT: br i1 [[TMP42]], label %[[PRED_STORE_IF9:.*]], label %[[PRED_STORE_CONTINUE10:.*]]
+; PRED: [[PRED_STORE_IF9]]:
; PRED-NEXT: [[TMP43:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 2
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP43]], align 4
; PRED-NEXT: [[TMP44:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 2
@@ -936,11 +912,11 @@ define void @test_conditional_interleave_group (ptr noalias %src.1, ptr noalias
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP47]], align 4
; PRED-NEXT: [[TMP48:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 2
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP48]], align 4
-; PRED-NEXT: br label %[[PRED_STORE_CONTINUE17]]
-; PRED: [[PRED_STORE_CONTINUE17]]:
+; PRED-NEXT: br label %[[PRED_STORE_CONTINUE10]]
+; PRED: [[PRED_STORE_CONTINUE10]]:
; PRED-NEXT: [[TMP49:%.*]] = extractelement <8 x i1> [[TMP26]], i32 3
-; PRED-NEXT: br i1 [[TMP49]], label %[[PRED_STORE_IF18:.*]], label %[[PRED_STORE_CONTINUE19:.*]]
-; PRED: [[PRED_STORE_IF18]]:
+; PRED-NEXT: br i1 [[TMP49]], label %[[PRED_STORE_IF11:.*]], label %[[PRED_STORE_CONTINUE12:.*]]
+; PRED: [[PRED_STORE_IF11]]:
; PRED-NEXT: [[TMP50:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 3
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP50]], align 4
; PRED-NEXT: [[TMP51:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 3
@@ -951,11 +927,11 @@ define void @test_conditional_interleave_group (ptr noalias %src.1, ptr noalias
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP54]], align 4
; PRED-NEXT: [[TMP55:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 3
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP55]], align 4
-; PRED-NEXT: br label %[[PRED_STORE_CONTINUE19]]
-; PRED: [[PRED_STORE_CONTINUE19]]:
+; PRED-NEXT: br label %[[PRED_STORE_CONTINUE12]]
+; PRED: [[PRED_STORE_CONTINUE12]]:
; PRED-NEXT: [[TMP56:%.*]] = extractelement <8 x i1> [[TMP26]], i32 4
-; PRED-NEXT: br i1 [[TMP56]], label %[[PRED_STORE_IF20:.*]], label %[[PRED_STORE_CONTINUE21:.*]]
-; PRED: [[PRED_STORE_IF20]]:
+; PRED-NEXT: br i1 [[TMP56]], label %[[PRED_STORE_IF13:.*]], label %[[PRED_STORE_CONTINUE14:.*]]
+; PRED: [[PRED_STORE_IF13]]:
; PRED-NEXT: [[TMP57:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 4
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP57]], align 4
; PRED-NEXT: [[TMP58:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 4
@@ -966,11 +942,11 @@ define void @test_conditional_interleave_group (ptr noalias %src.1, ptr noalias
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP61]], align 4
; PRED-NEXT: [[TMP62:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 4
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP62]], align 4
-; PRED-NEXT: br label %[[PRED_STORE_CONTINUE21]]
-; PRED: [[PRED_STORE_CONTINUE21]]:
+; PRED-NEXT: br label %[[PRED_STORE_CONTINUE14]]
+; PRED: [[PRED_STORE_CONTINUE14]]:
; PRED-NEXT: [[TMP63:%.*]] = extractelement <8 x i1> [[TMP26]], i32 5
-; PRED-NEXT: br i1 [[TMP63]], label %[[PRED_STORE_IF22:.*]], label %[[PRED_STORE_CONTINUE23:.*]]
-; PRED: [[PRED_STORE_IF22]]:
+; PRED-NEXT: br i1 [[TMP63]], label %[[PRED_STORE_IF15:.*]], label %[[PRED_STORE_CONTINUE16:.*]]
+; PRED: [[PRED_STORE_IF15]]:
; PRED-NEXT: [[TMP64:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 5
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP64]], align 4
; PRED-NEXT: [[TMP65:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 5
@@ -981,11 +957,11 @@ define void @test_conditional_interleave_group (ptr noalias %src.1, ptr noalias
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP68]], align 4
; PRED-NEXT: [[TMP69:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 5
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP69]], align 4
-; PRED-NEXT: br label %[[PRED_STORE_CONTINUE23]]
-; PRED: [[PRED_STORE_CONTINUE23]]:
+; PRED-NEXT: br label %[[PRED_STORE_CONTINUE16]]
+; PRED: [[PRED_STORE_CONTINUE16]]:
; PRED-NEXT: [[TMP70:%.*]] = extractelement <8 x i1> [[TMP26]], i32 6
-; PRED-NEXT: br i1 [[TMP70]], label %[[PRED_STORE_IF24:.*]], label %[[PRED_STORE_CONTINUE25:.*]]
-; PRED: [[PRED_STORE_IF24]]:
+; PRED-NEXT: br i1 [[TMP70]], label %[[PRED_STORE_IF17:.*]], label %[[PRED_STORE_CONTINUE18:.*]]
+; PRED: [[PRED_STORE_IF17]]:
; PRED-NEXT: [[TMP71:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 6
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP71]], align 4
; PRED-NEXT: [[TMP72:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 6
@@ -996,11 +972,11 @@ define void @test_conditional_interleave_group (ptr noalias %src.1, ptr noalias
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP75]], align 4
; PRED-NEXT: [[TMP76:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 6
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP76]], align 4
-; PRED-NEXT: br label %[[PRED_STORE_CONTINUE25]]
-; PRED: [[PRED_STORE_CONTINUE25]]:
+; PRED-NEXT: br label %[[PRED_STORE_CONTINUE18]]
+; PRED: [[PRED_STORE_CONTINUE18]]:
; PRED-NEXT: [[TMP77:%.*]] = extractelement <8 x i1> [[TMP26]], i32 7
-; PRED-NEXT: br i1 [[TMP77]], label %[[PRED_STORE_IF26:.*]], label %[[PRED_STORE_CONTINUE27]]
-; PRED: [[PRED_STORE_IF26]]:
+; PRED-NEXT: br i1 [[TMP77]], label %[[PRED_STORE_IF19:.*]], label %[[PRED_STORE_CONTINUE20]]
+; PRED: [[PRED_STORE_IF19]]:
; PRED-NEXT: [[TMP78:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 7
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP78]], align 4
; PRED-NEXT: [[TMP79:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 7
@@ -1011,8 +987,8 @@ define void @test_conditional_interleave_group (ptr noalias %src.1, ptr noalias
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP82]], align 4
; PRED-NEXT: [[TMP83:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 7
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP83]], align 4
-; PRED-NEXT: br label %[[PRED_STORE_CONTINUE27]]
-; PRED: [[PRED_STORE_CONTINUE27]]:
+; PRED-NEXT: br label %[[PRED_STORE_CONTINUE20]]
+; PRED: [[PRED_STORE_CONTINUE20]]:
; PRED-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 8
; PRED-NEXT: [[ACTIVE_LANE_MASK_NEXT]] = call <8 x i1> @llvm.get.active.lane.mask.v8i1.i64(i64 [[INDEX]], i64 [[TMP17]])
; PRED-NEXT: [[TMP84:%.*]] = extractelement <8 x i1> [[ACTIVE_LANE_MASK_NEXT]], i32 0
@@ -1020,8 +996,9 @@ define void @test_conditional_interleave_group (ptr noalias %src.1, ptr noalias
; PRED-NEXT: [[VEC_IND_NEXT]] = add <8 x i64> [[VEC_IND]], splat (i64 8)
; PRED-NEXT: br i1 [[TMP85]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
; PRED: [[MIDDLE_BLOCK]]:
-; PRED-NEXT: br [[EXIT:label %.*]]
-; PRED: [[SCALAR_PH]]:
+; PRED-NEXT: br label %[[EXIT:.*]]
+; PRED: [[EXIT]]:
+; PRED-NEXT: ret void
;
entry:
br label %loop.header
@@ -1124,7 +1101,7 @@ define void @redundant_branch_and_tail_folding(ptr %dst, i1 %c) {
; PRED-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
; PRED-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], splat (i64 4)
; PRED-NEXT: [[TMP11:%.*]] = icmp eq i64 [[INDEX_NEXT]], 24
-; PRED-NEXT: br i1 [[TMP11]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
+; PRED-NEXT: br i1 [[TMP11]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
; PRED: [[MIDDLE_BLOCK]]:
; PRED-NEXT: br label %[[EXIT:.*]]
; PRED: [[EXIT]]:
@@ -1313,7 +1290,7 @@ define void @pred_udiv_select_cost(ptr %A, ptr %B, ptr %C, i64 %n, i8 %y) #1 {
; PRED-NEXT: [[ACTIVE_LANE_MASK_NEXT]] = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 [[INDEX]], i64 [[TMP11]])
; PRED-NEXT: [[TMP28:%.*]] = extractelement <vscale x 16 x i1> [[ACTIVE_LANE_MASK_NEXT]], i32 0
; PRED-NEXT: [[TMP29:%.*]] = xor i1 [[TMP28]], true
-; PRED-NEXT: br i1 [[TMP29]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
+; PRED-NEXT: br i1 [[TMP29]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
; PRED: [[MIDDLE_BLOCK]]:
; PRED-NEXT: br [[EXIT:label %.*]]
; PRED: [[SCALAR_PH]]:
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/divs-with-scalable-vfs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/divs-with-scalable-vfs.ll
index 72e813b62025f..23612191c7b9a 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/divs-with-scalable-vfs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/divs-with-scalable-vfs.ll
@@ -97,14 +97,7 @@ define void @sdiv_feeding_gep_predicated(ptr %dst, i32 %x, i64 %M, i64 %conv6, i
; CHECK-SAME: ptr [[DST:%.*]], i32 [[X:%.*]], i64 [[M:%.*]], i64 [[CONV6:%.*]], i64 [[N:%.*]]) #[[ATTR0]] {
; CHECK-NEXT: [[ENTRY:.*:]]
; CHECK-NEXT: [[CONV61:%.*]] = zext i32 [[X]] to i64
-; CHECK-NEXT: br label %[[VECTOR_SCEVCHECK:.*]]
-; CHECK: [[VECTOR_SCEVCHECK]]:
-; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[N]], -1
-; CHECK-NEXT: [[TMP1:%.*]] = trunc i64 [[TMP0]] to i32
-; CHECK-NEXT: [[TMP2:%.*]] = icmp slt i32 [[TMP1]], 0
-; CHECK-NEXT: [[TMP3:%.*]] = icmp ugt i64 [[TMP0]], 4294967295
-; CHECK-NEXT: [[TMP4:%.*]] = or i1 [[TMP2]], [[TMP3]]
-; CHECK-NEXT: br i1 [[TMP4]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; CHECK-NEXT: br label %[[VECTOR_PH:.*]]
; CHECK: [[VECTOR_PH]]:
; CHECK-NEXT: [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP6:%.*]] = mul nuw i64 [[TMP5]], 2
@@ -148,28 +141,6 @@ define void @sdiv_feeding_gep_predicated(ptr %dst, i32 %x, i64 %M, i64 %conv6, i
; CHECK-NEXT: br i1 [[TMP36]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
; CHECK: [[MIDDLE_BLOCK]]:
; CHECK-NEXT: br label %[[EXIT:.*]]
-; CHECK: [[SCALAR_PH]]:
-; CHECK-NEXT: br label %[[LOOP:.*]]
-; CHECK: [[LOOP]]:
-; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 0, %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP_LATCH:.*]] ]
-; CHECK-NEXT: [[C:%.*]] = icmp ule i64 [[IV]], [[M]]
-; CHECK-NEXT: br i1 [[C]], label %[[THEN:.*]], label %[[LOOP_LATCH]]
-; CHECK: [[THEN]]:
-; CHECK-NEXT: [[DIV18:%.*]] = sdiv i64 [[M]], [[CONV6]]
-; CHECK-NEXT: [[CONV20:%.*]] = trunc i64 [[DIV18]] to i32
-; CHECK-NEXT: [[MUL30:%.*]] = mul i64 [[DIV18]], [[CONV61]]
-; CHECK-NEXT: [[SUB31:%.*]] = sub i64 [[IV]], [[MUL30]]
-; CHECK-NEXT: [[CONV34:%.*]] = trunc i64 [[SUB31]] to i32
-; CHECK-NEXT: [[MUL35:%.*]] = mul i32 [[X]], [[CONV20]]
-; CHECK-NEXT: [[ADD36:%.*]] = add i32 [[MUL35]], [[CONV34]]
-; CHECK-NEXT: [[IDXPROM:%.*]] = sext i32 [[ADD36]] to i64
-; CHECK-NEXT: [[GEP:%.*]] = getelementptr double, ptr [[DST]], i64 [[IDXPROM]]
-; CHECK-NEXT: store double 0.000000e+00, ptr [[GEP]], align 8
-; CHECK-NEXT: br label %[[LOOP_LATCH]]
-; CHECK: [[LOOP_LATCH]]:
-; CHECK-NEXT: [[IV_NEXT]] = add i64 [[IV]], 1
-; CHECK-NEXT: [[EC:%.*]] = icmp eq i64 [[IV_NEXT]], [[N]]
-; CHECK-NEXT: br i1 [[EC]], label %[[EXIT]], label %[[LOOP]], !llvm.loop [[LOOP5:![0-9]+]]
; CHECK: [[EXIT]]:
; CHECK-NEXT: ret void
;
@@ -212,13 +183,7 @@ define void @udiv_urem_feeding_gep(i64 %x, ptr %dst, i64 %N) {
; CHECK-NEXT: [[MUL_1_I:%.*]] = mul i64 [[X]], [[X]]
; CHECK-NEXT: [[MUL_2_I:%.*]] = mul i64 [[MUL_1_I]], [[X]]
; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[N]], 1
-; CHECK-NEXT: br label %[[VECTOR_SCEVCHECK:.*]]
-; CHECK: [[VECTOR_SCEVCHECK]]:
-; CHECK-NEXT: [[TMP1:%.*]] = trunc i64 [[N]] to i32
-; CHECK-NEXT: [[TMP2:%.*]] = icmp slt i32 [[TMP1]], 0
-; CHECK-NEXT: [[TMP3:%.*]] = icmp ugt i64 [[N]], 4294967295
-; CHECK-NEXT: [[TMP4:%.*]] = or i1 [[TMP2]], [[TMP3]]
-; CHECK-NEXT: br i1 [[TMP4]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; CHECK-NEXT: br label %[[VECTOR_PH:.*]]
; CHECK: [[VECTOR_PH]]:
; CHECK-NEXT: [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP6:%.*]] = mul nuw i64 [[TMP5]], 2
@@ -262,32 +227,9 @@ define void @udiv_urem_feeding_gep(i64 %x, ptr %dst, i64 %N) {
; CHECK-NEXT: [[TMP39:%.*]] = extractelement <vscale x 2 x i1> [[ACTIVE_LANE_MASK_NEXT]], i32 0
; CHECK-NEXT: [[TMP40:%.*]] = xor i1 [[TMP39]], true
; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 2 x i64> [[VEC_IND]], [[BROADCAST_SPLAT4]]
-; CHECK-NEXT: br i1 [[TMP40]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
+; CHECK-NEXT: br i1 [[TMP40]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
; CHECK: [[MIDDLE_BLOCK]]:
; CHECK-NEXT: br label %[[EXIT:.*]]
-; CHECK: [[SCALAR_PH]]:
-...
[truncated]
|
|
@llvm/pr-subscribers-llvm-transforms Author: Ramkumar Ramachandra (artagnon) ChangesWhen using an active lane mask, all SCEVChecks except those for stride-versioning are redundant. Patch is 35.71 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/171605.diff 5 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 15d0fa41bd902..96a567307063b 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -1819,7 +1819,8 @@ class GeneratedRTChecks {
/// there is no vector code generation, the check blocks are removed
/// completely.
void create(Loop *L, const LoopAccessInfo &LAI,
- const SCEVPredicate &UnionPred, ElementCount VF, unsigned IC) {
+ const SCEVPredicate &UnionPred, ElementCount VF, unsigned IC,
+ bool UseActiveLaneMask) {
// Hard cutoff to limit compile-time increase in case a very large number of
// runtime checks needs to be generated.
@@ -1837,7 +1838,8 @@ class GeneratedRTChecks {
// ensure the blocks are properly added to LoopInfo & DominatorTree. Those
// may be used by SCEVExpander. The blocks will be un-linked from their
// predecessors and removed from LI & DT at the end of the function.
- if (!UnionPred.isAlwaysTrue()) {
+ if (!UnionPred.isAlwaysTrue() &&
+ (!UseActiveLaneMask || !LAI.getSymbolicStrides().empty())) {
SCEVCheckBlock = SplitBlock(Preheader, Preheader->getTerminator(), DT, LI,
nullptr, "vector.scevcheck");
@@ -10094,7 +10096,8 @@ bool LoopVectorizePass::processLoop(Loop *L) {
// Optimistically generate runtime checks if they are needed. Drop them if
// they turn out to not be profitable.
if (VF.Width.isVector() || SelectedIC > 1) {
- Checks.create(L, *LVL.getLAI(), PSE.getPredicate(), VF.Width, SelectedIC);
+ Checks.create(L, *LVL.getLAI(), PSE.getPredicate(), VF.Width, SelectedIC,
+ useActiveLaneMask(CM.getTailFoldingStyle()));
// Bail out early if either the SCEV or memory runtime checks are known to
// fail. In that case, the vector loop would never execute.
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll b/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
index b549a06f08f8c..d695dc51907f5 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
@@ -840,31 +840,7 @@ define void @test_conditional_interleave_group (ptr noalias %src.1, ptr noalias
; PRED-SAME: ptr noalias [[SRC_1:%.*]], ptr noalias [[SRC_2:%.*]], ptr noalias [[SRC_3:%.*]], ptr noalias [[SRC_4:%.*]], ptr noalias [[DST:%.*]], i64 [[N:%.*]]) #[[ATTR3:[0-9]+]] {
; PRED-NEXT: [[ENTRY:.*:]]
; PRED-NEXT: [[TMP0:%.*]] = add i64 [[N]], 1
-; PRED-NEXT: br label %[[VECTOR_SCEVCHECK:.*]]
-; PRED: [[VECTOR_SCEVCHECK]]:
-; PRED-NEXT: [[MUL:%.*]] = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 16, i64 [[N]])
-; PRED-NEXT: [[MUL_RESULT:%.*]] = extractvalue { i64, i1 } [[MUL]], 0
-; PRED-NEXT: [[MUL_OVERFLOW:%.*]] = extractvalue { i64, i1 } [[MUL]], 1
-; PRED-NEXT: [[TMP2:%.*]] = getelementptr i8, ptr [[DST]], i64 [[MUL_RESULT]]
-; PRED-NEXT: [[TMP3:%.*]] = icmp ult ptr [[TMP2]], [[DST]]
-; PRED-NEXT: [[TMP4:%.*]] = or i1 [[TMP3]], [[MUL_OVERFLOW]]
-; PRED-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[DST]], i64 4
-; PRED-NEXT: [[MUL1:%.*]] = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 16, i64 [[N]])
-; PRED-NEXT: [[MUL_RESULT2:%.*]] = extractvalue { i64, i1 } [[MUL1]], 0
-; PRED-NEXT: [[MUL_OVERFLOW3:%.*]] = extractvalue { i64, i1 } [[MUL1]], 1
-; PRED-NEXT: [[TMP6:%.*]] = getelementptr i8, ptr [[SCEVGEP]], i64 [[MUL_RESULT2]]
-; PRED-NEXT: [[TMP7:%.*]] = icmp ult ptr [[TMP6]], [[SCEVGEP]]
-; PRED-NEXT: [[TMP8:%.*]] = or i1 [[TMP7]], [[MUL_OVERFLOW3]]
-; PRED-NEXT: [[SCEVGEP4:%.*]] = getelementptr i8, ptr [[DST]], i64 8
-; PRED-NEXT: [[MUL5:%.*]] = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 16, i64 [[N]])
-; PRED-NEXT: [[MUL_RESULT6:%.*]] = extractvalue { i64, i1 } [[MUL5]], 0
-; PRED-NEXT: [[MUL_OVERFLOW7:%.*]] = extractvalue { i64, i1 } [[MUL5]], 1
-; PRED-NEXT: [[TMP10:%.*]] = getelementptr i8, ptr [[SCEVGEP4]], i64 [[MUL_RESULT6]]
-; PRED-NEXT: [[TMP11:%.*]] = icmp ult ptr [[TMP10]], [[SCEVGEP4]]
-; PRED-NEXT: [[TMP12:%.*]] = or i1 [[TMP11]], [[MUL_OVERFLOW7]]
-; PRED-NEXT: [[TMP13:%.*]] = or i1 [[TMP4]], [[TMP8]]
-; PRED-NEXT: [[TMP14:%.*]] = or i1 [[TMP13]], [[TMP12]]
-; PRED-NEXT: br i1 [[TMP14]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; PRED-NEXT: br label %[[VECTOR_PH:.*]]
; PRED: [[VECTOR_PH]]:
; PRED-NEXT: [[TMP15:%.*]] = sub i64 [[TMP0]], 8
; PRED-NEXT: [[TMP16:%.*]] = icmp ugt i64 [[TMP0]], 8
@@ -872,9 +848,9 @@ define void @test_conditional_interleave_group (ptr noalias %src.1, ptr noalias
; PRED-NEXT: [[ACTIVE_LANE_MASK_ENTRY:%.*]] = call <8 x i1> @llvm.get.active.lane.mask.v8i1.i64(i64 0, i64 [[TMP0]])
; PRED-NEXT: br label %[[VECTOR_BODY:.*]]
; PRED: [[VECTOR_BODY]]:
-; PRED-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[PRED_STORE_CONTINUE27:.*]] ]
-; PRED-NEXT: [[ACTIVE_LANE_MASK:%.*]] = phi <8 x i1> [ [[ACTIVE_LANE_MASK_ENTRY]], %[[VECTOR_PH]] ], [ [[ACTIVE_LANE_MASK_NEXT:%.*]], %[[PRED_STORE_CONTINUE27]] ]
-; PRED-NEXT: [[VEC_IND:%.*]] = phi <8 x i64> [ <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7>, %[[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], %[[PRED_STORE_CONTINUE27]] ]
+; PRED-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[PRED_STORE_CONTINUE20:.*]] ]
+; PRED-NEXT: [[ACTIVE_LANE_MASK:%.*]] = phi <8 x i1> [ [[ACTIVE_LANE_MASK_ENTRY]], %[[VECTOR_PH]] ], [ [[ACTIVE_LANE_MASK_NEXT:%.*]], %[[PRED_STORE_CONTINUE20]] ]
+; PRED-NEXT: [[VEC_IND:%.*]] = phi <8 x i64> [ <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7>, %[[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], %[[PRED_STORE_CONTINUE20]] ]
; PRED-NEXT: [[TMP18:%.*]] = load float, ptr [[SRC_1]], align 4
; PRED-NEXT: [[BROADCAST_SPLATINSERT8:%.*]] = insertelement <8 x float> poison, float [[TMP18]], i64 0
; PRED-NEXT: [[BROADCAST_SPLAT9:%.*]] = shufflevector <8 x float> [[BROADCAST_SPLATINSERT8]], <8 x float> poison, <8 x i32> zeroinitializer
@@ -909,8 +885,8 @@ define void @test_conditional_interleave_group (ptr noalias %src.1, ptr noalias
; PRED-NEXT: br label %[[PRED_STORE_CONTINUE]]
; PRED: [[PRED_STORE_CONTINUE]]:
; PRED-NEXT: [[TMP35:%.*]] = extractelement <8 x i1> [[TMP26]], i32 1
-; PRED-NEXT: br i1 [[TMP35]], label %[[PRED_STORE_IF14:.*]], label %[[PRED_STORE_CONTINUE15:.*]]
-; PRED: [[PRED_STORE_IF14]]:
+; PRED-NEXT: br i1 [[TMP35]], label %[[PRED_STORE_IF7:.*]], label %[[PRED_STORE_CONTINUE8:.*]]
+; PRED: [[PRED_STORE_IF7]]:
; PRED-NEXT: [[TMP36:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 1
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP36]], align 4
; PRED-NEXT: [[TMP37:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 1
@@ -921,11 +897,11 @@ define void @test_conditional_interleave_group (ptr noalias %src.1, ptr noalias
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP40]], align 4
; PRED-NEXT: [[TMP41:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 1
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP41]], align 4
-; PRED-NEXT: br label %[[PRED_STORE_CONTINUE15]]
-; PRED: [[PRED_STORE_CONTINUE15]]:
+; PRED-NEXT: br label %[[PRED_STORE_CONTINUE8]]
+; PRED: [[PRED_STORE_CONTINUE8]]:
; PRED-NEXT: [[TMP42:%.*]] = extractelement <8 x i1> [[TMP26]], i32 2
-; PRED-NEXT: br i1 [[TMP42]], label %[[PRED_STORE_IF16:.*]], label %[[PRED_STORE_CONTINUE17:.*]]
-; PRED: [[PRED_STORE_IF16]]:
+; PRED-NEXT: br i1 [[TMP42]], label %[[PRED_STORE_IF9:.*]], label %[[PRED_STORE_CONTINUE10:.*]]
+; PRED: [[PRED_STORE_IF9]]:
; PRED-NEXT: [[TMP43:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 2
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP43]], align 4
; PRED-NEXT: [[TMP44:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 2
@@ -936,11 +912,11 @@ define void @test_conditional_interleave_group (ptr noalias %src.1, ptr noalias
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP47]], align 4
; PRED-NEXT: [[TMP48:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 2
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP48]], align 4
-; PRED-NEXT: br label %[[PRED_STORE_CONTINUE17]]
-; PRED: [[PRED_STORE_CONTINUE17]]:
+; PRED-NEXT: br label %[[PRED_STORE_CONTINUE10]]
+; PRED: [[PRED_STORE_CONTINUE10]]:
; PRED-NEXT: [[TMP49:%.*]] = extractelement <8 x i1> [[TMP26]], i32 3
-; PRED-NEXT: br i1 [[TMP49]], label %[[PRED_STORE_IF18:.*]], label %[[PRED_STORE_CONTINUE19:.*]]
-; PRED: [[PRED_STORE_IF18]]:
+; PRED-NEXT: br i1 [[TMP49]], label %[[PRED_STORE_IF11:.*]], label %[[PRED_STORE_CONTINUE12:.*]]
+; PRED: [[PRED_STORE_IF11]]:
; PRED-NEXT: [[TMP50:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 3
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP50]], align 4
; PRED-NEXT: [[TMP51:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 3
@@ -951,11 +927,11 @@ define void @test_conditional_interleave_group (ptr noalias %src.1, ptr noalias
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP54]], align 4
; PRED-NEXT: [[TMP55:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 3
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP55]], align 4
-; PRED-NEXT: br label %[[PRED_STORE_CONTINUE19]]
-; PRED: [[PRED_STORE_CONTINUE19]]:
+; PRED-NEXT: br label %[[PRED_STORE_CONTINUE12]]
+; PRED: [[PRED_STORE_CONTINUE12]]:
; PRED-NEXT: [[TMP56:%.*]] = extractelement <8 x i1> [[TMP26]], i32 4
-; PRED-NEXT: br i1 [[TMP56]], label %[[PRED_STORE_IF20:.*]], label %[[PRED_STORE_CONTINUE21:.*]]
-; PRED: [[PRED_STORE_IF20]]:
+; PRED-NEXT: br i1 [[TMP56]], label %[[PRED_STORE_IF13:.*]], label %[[PRED_STORE_CONTINUE14:.*]]
+; PRED: [[PRED_STORE_IF13]]:
; PRED-NEXT: [[TMP57:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 4
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP57]], align 4
; PRED-NEXT: [[TMP58:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 4
@@ -966,11 +942,11 @@ define void @test_conditional_interleave_group (ptr noalias %src.1, ptr noalias
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP61]], align 4
; PRED-NEXT: [[TMP62:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 4
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP62]], align 4
-; PRED-NEXT: br label %[[PRED_STORE_CONTINUE21]]
-; PRED: [[PRED_STORE_CONTINUE21]]:
+; PRED-NEXT: br label %[[PRED_STORE_CONTINUE14]]
+; PRED: [[PRED_STORE_CONTINUE14]]:
; PRED-NEXT: [[TMP63:%.*]] = extractelement <8 x i1> [[TMP26]], i32 5
-; PRED-NEXT: br i1 [[TMP63]], label %[[PRED_STORE_IF22:.*]], label %[[PRED_STORE_CONTINUE23:.*]]
-; PRED: [[PRED_STORE_IF22]]:
+; PRED-NEXT: br i1 [[TMP63]], label %[[PRED_STORE_IF15:.*]], label %[[PRED_STORE_CONTINUE16:.*]]
+; PRED: [[PRED_STORE_IF15]]:
; PRED-NEXT: [[TMP64:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 5
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP64]], align 4
; PRED-NEXT: [[TMP65:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 5
@@ -981,11 +957,11 @@ define void @test_conditional_interleave_group (ptr noalias %src.1, ptr noalias
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP68]], align 4
; PRED-NEXT: [[TMP69:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 5
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP69]], align 4
-; PRED-NEXT: br label %[[PRED_STORE_CONTINUE23]]
-; PRED: [[PRED_STORE_CONTINUE23]]:
+; PRED-NEXT: br label %[[PRED_STORE_CONTINUE16]]
+; PRED: [[PRED_STORE_CONTINUE16]]:
; PRED-NEXT: [[TMP70:%.*]] = extractelement <8 x i1> [[TMP26]], i32 6
-; PRED-NEXT: br i1 [[TMP70]], label %[[PRED_STORE_IF24:.*]], label %[[PRED_STORE_CONTINUE25:.*]]
-; PRED: [[PRED_STORE_IF24]]:
+; PRED-NEXT: br i1 [[TMP70]], label %[[PRED_STORE_IF17:.*]], label %[[PRED_STORE_CONTINUE18:.*]]
+; PRED: [[PRED_STORE_IF17]]:
; PRED-NEXT: [[TMP71:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 6
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP71]], align 4
; PRED-NEXT: [[TMP72:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 6
@@ -996,11 +972,11 @@ define void @test_conditional_interleave_group (ptr noalias %src.1, ptr noalias
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP75]], align 4
; PRED-NEXT: [[TMP76:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 6
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP76]], align 4
-; PRED-NEXT: br label %[[PRED_STORE_CONTINUE25]]
-; PRED: [[PRED_STORE_CONTINUE25]]:
+; PRED-NEXT: br label %[[PRED_STORE_CONTINUE18]]
+; PRED: [[PRED_STORE_CONTINUE18]]:
; PRED-NEXT: [[TMP77:%.*]] = extractelement <8 x i1> [[TMP26]], i32 7
-; PRED-NEXT: br i1 [[TMP77]], label %[[PRED_STORE_IF26:.*]], label %[[PRED_STORE_CONTINUE27]]
-; PRED: [[PRED_STORE_IF26]]:
+; PRED-NEXT: br i1 [[TMP77]], label %[[PRED_STORE_IF19:.*]], label %[[PRED_STORE_CONTINUE20]]
+; PRED: [[PRED_STORE_IF19]]:
; PRED-NEXT: [[TMP78:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 7
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP78]], align 4
; PRED-NEXT: [[TMP79:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 7
@@ -1011,8 +987,8 @@ define void @test_conditional_interleave_group (ptr noalias %src.1, ptr noalias
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP82]], align 4
; PRED-NEXT: [[TMP83:%.*]] = extractelement <8 x ptr> [[TMP27]], i32 7
; PRED-NEXT: store float 0.000000e+00, ptr [[TMP83]], align 4
-; PRED-NEXT: br label %[[PRED_STORE_CONTINUE27]]
-; PRED: [[PRED_STORE_CONTINUE27]]:
+; PRED-NEXT: br label %[[PRED_STORE_CONTINUE20]]
+; PRED: [[PRED_STORE_CONTINUE20]]:
; PRED-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 8
; PRED-NEXT: [[ACTIVE_LANE_MASK_NEXT]] = call <8 x i1> @llvm.get.active.lane.mask.v8i1.i64(i64 [[INDEX]], i64 [[TMP17]])
; PRED-NEXT: [[TMP84:%.*]] = extractelement <8 x i1> [[ACTIVE_LANE_MASK_NEXT]], i32 0
@@ -1020,8 +996,9 @@ define void @test_conditional_interleave_group (ptr noalias %src.1, ptr noalias
; PRED-NEXT: [[VEC_IND_NEXT]] = add <8 x i64> [[VEC_IND]], splat (i64 8)
; PRED-NEXT: br i1 [[TMP85]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
; PRED: [[MIDDLE_BLOCK]]:
-; PRED-NEXT: br [[EXIT:label %.*]]
-; PRED: [[SCALAR_PH]]:
+; PRED-NEXT: br label %[[EXIT:.*]]
+; PRED: [[EXIT]]:
+; PRED-NEXT: ret void
;
entry:
br label %loop.header
@@ -1124,7 +1101,7 @@ define void @redundant_branch_and_tail_folding(ptr %dst, i1 %c) {
; PRED-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
; PRED-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], splat (i64 4)
; PRED-NEXT: [[TMP11:%.*]] = icmp eq i64 [[INDEX_NEXT]], 24
-; PRED-NEXT: br i1 [[TMP11]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
+; PRED-NEXT: br i1 [[TMP11]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
; PRED: [[MIDDLE_BLOCK]]:
; PRED-NEXT: br label %[[EXIT:.*]]
; PRED: [[EXIT]]:
@@ -1313,7 +1290,7 @@ define void @pred_udiv_select_cost(ptr %A, ptr %B, ptr %C, i64 %n, i8 %y) #1 {
; PRED-NEXT: [[ACTIVE_LANE_MASK_NEXT]] = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 [[INDEX]], i64 [[TMP11]])
; PRED-NEXT: [[TMP28:%.*]] = extractelement <vscale x 16 x i1> [[ACTIVE_LANE_MASK_NEXT]], i32 0
; PRED-NEXT: [[TMP29:%.*]] = xor i1 [[TMP28]], true
-; PRED-NEXT: br i1 [[TMP29]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
+; PRED-NEXT: br i1 [[TMP29]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
; PRED: [[MIDDLE_BLOCK]]:
; PRED-NEXT: br [[EXIT:label %.*]]
; PRED: [[SCALAR_PH]]:
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/divs-with-scalable-vfs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/divs-with-scalable-vfs.ll
index 72e813b62025f..23612191c7b9a 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/divs-with-scalable-vfs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/divs-with-scalable-vfs.ll
@@ -97,14 +97,7 @@ define void @sdiv_feeding_gep_predicated(ptr %dst, i32 %x, i64 %M, i64 %conv6, i
; CHECK-SAME: ptr [[DST:%.*]], i32 [[X:%.*]], i64 [[M:%.*]], i64 [[CONV6:%.*]], i64 [[N:%.*]]) #[[ATTR0]] {
; CHECK-NEXT: [[ENTRY:.*:]]
; CHECK-NEXT: [[CONV61:%.*]] = zext i32 [[X]] to i64
-; CHECK-NEXT: br label %[[VECTOR_SCEVCHECK:.*]]
-; CHECK: [[VECTOR_SCEVCHECK]]:
-; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[N]], -1
-; CHECK-NEXT: [[TMP1:%.*]] = trunc i64 [[TMP0]] to i32
-; CHECK-NEXT: [[TMP2:%.*]] = icmp slt i32 [[TMP1]], 0
-; CHECK-NEXT: [[TMP3:%.*]] = icmp ugt i64 [[TMP0]], 4294967295
-; CHECK-NEXT: [[TMP4:%.*]] = or i1 [[TMP2]], [[TMP3]]
-; CHECK-NEXT: br i1 [[TMP4]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; CHECK-NEXT: br label %[[VECTOR_PH:.*]]
; CHECK: [[VECTOR_PH]]:
; CHECK-NEXT: [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP6:%.*]] = mul nuw i64 [[TMP5]], 2
@@ -148,28 +141,6 @@ define void @sdiv_feeding_gep_predicated(ptr %dst, i32 %x, i64 %M, i64 %conv6, i
; CHECK-NEXT: br i1 [[TMP36]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
; CHECK: [[MIDDLE_BLOCK]]:
; CHECK-NEXT: br label %[[EXIT:.*]]
-; CHECK: [[SCALAR_PH]]:
-; CHECK-NEXT: br label %[[LOOP:.*]]
-; CHECK: [[LOOP]]:
-; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 0, %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP_LATCH:.*]] ]
-; CHECK-NEXT: [[C:%.*]] = icmp ule i64 [[IV]], [[M]]
-; CHECK-NEXT: br i1 [[C]], label %[[THEN:.*]], label %[[LOOP_LATCH]]
-; CHECK: [[THEN]]:
-; CHECK-NEXT: [[DIV18:%.*]] = sdiv i64 [[M]], [[CONV6]]
-; CHECK-NEXT: [[CONV20:%.*]] = trunc i64 [[DIV18]] to i32
-; CHECK-NEXT: [[MUL30:%.*]] = mul i64 [[DIV18]], [[CONV61]]
-; CHECK-NEXT: [[SUB31:%.*]] = sub i64 [[IV]], [[MUL30]]
-; CHECK-NEXT: [[CONV34:%.*]] = trunc i64 [[SUB31]] to i32
-; CHECK-NEXT: [[MUL35:%.*]] = mul i32 [[X]], [[CONV20]]
-; CHECK-NEXT: [[ADD36:%.*]] = add i32 [[MUL35]], [[CONV34]]
-; CHECK-NEXT: [[IDXPROM:%.*]] = sext i32 [[ADD36]] to i64
-; CHECK-NEXT: [[GEP:%.*]] = getelementptr double, ptr [[DST]], i64 [[IDXPROM]]
-; CHECK-NEXT: store double 0.000000e+00, ptr [[GEP]], align 8
-; CHECK-NEXT: br label %[[LOOP_LATCH]]
-; CHECK: [[LOOP_LATCH]]:
-; CHECK-NEXT: [[IV_NEXT]] = add i64 [[IV]], 1
-; CHECK-NEXT: [[EC:%.*]] = icmp eq i64 [[IV_NEXT]], [[N]]
-; CHECK-NEXT: br i1 [[EC]], label %[[EXIT]], label %[[LOOP]], !llvm.loop [[LOOP5:![0-9]+]]
; CHECK: [[EXIT]]:
; CHECK-NEXT: ret void
;
@@ -212,13 +183,7 @@ define void @udiv_urem_feeding_gep(i64 %x, ptr %dst, i64 %N) {
; CHECK-NEXT: [[MUL_1_I:%.*]] = mul i64 [[X]], [[X]]
; CHECK-NEXT: [[MUL_2_I:%.*]] = mul i64 [[MUL_1_I]], [[X]]
; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[N]], 1
-; CHECK-NEXT: br label %[[VECTOR_SCEVCHECK:.*]]
-; CHECK: [[VECTOR_SCEVCHECK]]:
-; CHECK-NEXT: [[TMP1:%.*]] = trunc i64 [[N]] to i32
-; CHECK-NEXT: [[TMP2:%.*]] = icmp slt i32 [[TMP1]], 0
-; CHECK-NEXT: [[TMP3:%.*]] = icmp ugt i64 [[N]], 4294967295
-; CHECK-NEXT: [[TMP4:%.*]] = or i1 [[TMP2]], [[TMP3]]
-; CHECK-NEXT: br i1 [[TMP4]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; CHECK-NEXT: br label %[[VECTOR_PH:.*]]
; CHECK: [[VECTOR_PH]]:
; CHECK-NEXT: [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP6:%.*]] = mul nuw i64 [[TMP5]], 2
@@ -262,32 +227,9 @@ define void @udiv_urem_feeding_gep(i64 %x, ptr %dst, i64 %N) {
; CHECK-NEXT: [[TMP39:%.*]] = extractelement <vscale x 2 x i1> [[ACTIVE_LANE_MASK_NEXT]], i32 0
; CHECK-NEXT: [[TMP40:%.*]] = xor i1 [[TMP39]], true
; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 2 x i64> [[VEC_IND]], [[BROADCAST_SPLAT4]]
-; CHECK-NEXT: br i1 [[TMP40]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
+; CHECK-NEXT: br i1 [[TMP40]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
; CHECK: [[MIDDLE_BLOCK]]:
; CHECK-NEXT: br label %[[EXIT:.*]]
-; CHECK: [[SCALAR_PH]]:
-...
[truncated]
|
lukel97
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we need the pointer overflow checks when we have an active lane mask?
MemChecks are still needed, and this patch doesn't change that? |
Oh, you meant in SCEVChecks? Hm, not sure about the patch: I found a couple of cases where it was redundant while developing another patch, and tried to generalize. |
Yes but I'm not familiar with the pointer overflow checks/SCEV checks which this patch removes. How does the active lane mask defend against that? |
I think while the tail-folding style is ActiveLaneMask(WithControlFlow), it inserts these checks (and they're missing in some diffs due to folding)? Sorry, I might be mis-reading VPlanTransforms::addActiveLaneMask? I was reading around: if (useActiveLaneMask(Style)) {
// TODO: Move checks to VPlanTransforms::addActiveLaneMask once
// TailFoldingStyle is visible there.
bool ForControlFlow = useActiveLaneMaskForControlFlow(Style);
bool WithoutRuntimeCheck =
Style == TailFoldingStyle::DataAndControlFlowWithoutRuntimeCheck;
VPlanTransforms::addActiveLaneMask(*Plan, ForControlFlow,
WithoutRuntimeCheck);
}in LV? I think the SCEVChecks amount to checking that the IV update won't overflow, which should be addressed by ALM? |
Only TailFoldingStyle::DataAndControlFlowWithoutRuntimeCheck can skip the overflow checks, I think we'll still need them for the other TailFoldingStyles. |
fhahn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If possible would be good to proceed step-by-step and try to remove overflow checks in a more targeted fashion so it's easier to reason about.
a8725c0 to
fc6997e
Compare
|
Thanks for the guidance: I'm finding it hard to reason about these checks, which was the reason for the incorrect generalization. I'm still confused about why we need IV update overflow checks when VF is not scalable: I factored out some existing logic and re-used it in the redo of the patch, and I'm not able to reason about it fully. |
We already check when IV update overflow checks are needed in different places in LV: consolidate them into a single routine, and re-use it to conditionally drop SCEVChecks in GeneratedRTChecks.
d2c1598 to
fd62062
Compare
We already check when IV update overflow checks are needed in different places in LV: consolidate them into a single routine, and re-use it to conditionally drop SCEVChecks in GeneratedRTChecks.