Skip to content

[DA] Fix the check between Subscript and Size after delinearization #151326

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Aug 8, 2025
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
33 changes: 24 additions & 9 deletions llvm/lib/Analysis/DependenceAnalysis.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1126,17 +1126,32 @@ bool DependenceInfo::isKnownLessThan(const SCEV *S, const SCEV *Size) const {
Size = SE->getTruncateOrZeroExtend(Size, MaxType);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although this may not be directly related to the change, this might be incorrect, as Size can be negative at this point.


// Special check for addrecs using BE taken count
const SCEV *Bound = SE->getMinusSCEV(S, Size);
if (const SCEVAddRecExpr *AddRec = dyn_cast<SCEVAddRecExpr>(Bound)) {
if (AddRec->isAffine()) {
if (const SCEVAddRecExpr *AddRec = dyn_cast<SCEVAddRecExpr>(S))
if (AddRec->isAffine() && AddRec->hasNoSignedWrap() &&
AddRec->hasNoUnsignedWrap()) {
const SCEV *BECount = SE->getBackedgeTakenCount(AddRec->getLoop());
if (!isa<SCEVCouldNotCompute>(BECount)) {
const SCEV *Limit = AddRec->evaluateAtIteration(BECount, *SE);
if (SE->isKnownNegative(Limit))
return true;
}
const SCEV *Start = AddRec->getStart();
const SCEV *Step = AddRec->getStepRecurrence(*SE);
const SCEV *End = AddRec->evaluateAtIteration(BECount, *SE);
const SCEV *Diff0 = SE->getMinusSCEV(Start, Size);
const SCEV *Diff1 = SE->getMinusSCEV(End, Size);

// If the value of Step is non-negative and the AddRec is non-wrap, it
// reaches its maximum at the last iteration. So it's enouth to check
// whether End - Size is negative.
if (SE->isKnownNonNegative(Step) && SE->isKnownNegative(Diff1))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, it might be sufficient to ensure that the AddRec has nsw (nuw is unnecessary)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isKnownNonNegative assumes a signed representation, so nsw is sufficient.

If it is not nsw, shouldn't these SCEVs be SCEVCouldNotCompute anyway?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is not nsw, shouldn't these SCEVs be SCEVCouldNotCompute anyway?

As I tried with @nonnegative in DADelin.ll, it didn't become SCEVCouldNotCompute.

Anyway, nsw seems sufficient in this case.

return true;

// If the value of Step is non-positive and the AddRec is non-wrap, the
// initial value is its maximum.
if (SE->isKnownNonPositive(Step) && SE->isKnownNegative(Diff0))
return true;

// Even if we don't know the sign of Step, either Start or End must be
// the maximum value of the AddRec since it is non-wrap.
if (SE->isKnownNegative(Diff0) && SE->isKnownNegative(Diff1))
return true;
}
}

// Check using normal isKnownNegative
const SCEV *LimitedBound =
Expand Down
82 changes: 79 additions & 3 deletions llvm/test/Analysis/DependenceAnalysis/DADelin.ll
Original file line number Diff line number Diff line change
Expand Up @@ -594,14 +594,15 @@ for.end12: ; preds = %for.inc10, %entry
}


; FIXME? It seems that we cannot prove that %N is non-negative...
define void @nonnegative(ptr nocapture %A, i32 %N) {
; CHECK-LABEL: 'nonnegative'
; CHECK-NEXT: Src: store i32 1, ptr %arrayidx, align 4 --> Dst: store i32 1, ptr %arrayidx, align 4
; CHECK-NEXT: da analyze - none!
; CHECK-NEXT: da analyze - output [* *]!
; CHECK-NEXT: Src: store i32 1, ptr %arrayidx, align 4 --> Dst: store i32 2, ptr %arrayidx, align 4
; CHECK-NEXT: da analyze - consistent output [0 0|<]!
; CHECK-NEXT: da analyze - output [* *|<]!
; CHECK-NEXT: Src: store i32 2, ptr %arrayidx, align 4 --> Dst: store i32 2, ptr %arrayidx, align 4
; CHECK-NEXT: da analyze - none!
; CHECK-NEXT: da analyze - output [* *]!
Comment on lines +597 to +605
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not confident whether this change is valid...

Copy link
Contributor Author

@kasuga-fj kasuga-fj Jul 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The failure test seems to be the same case as this. If I'm correct,

  • We can reason from nusw that the AddRec of GEP doesn't wrap, as LoopAccessAnalysis does
  • But it doesn't imply each subscript doesn't wrap as well?

So, in this specific case,

  • From nusw, we can assume {{0,+,(4 * %N)}<%for.outer>,+,4}<%for.inner> (the SCEV representation for %add) doesn't wrap
  • It is delinearized as follows
    • Array size: ArrayDecl[UnknownSize][%N]
    • Access: ArrayRef[{0,+,1}<nuw><%for.outer>][{0,+,1}<nuw><%for.inner>]
  • Currently fails to infer {0,+,1}<nuw><%for.inner> is less than %N
    • Can we infer this based on the fact that the original AddRec cannot wrap?
    • Since the backedge taken count of the %for.inner is %N - 1, IIUIC, we can prove the predication if we know that the subscript is nsw?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have to debug this to find out. Might have to do with isKnownNonNegative etc only working on constants?

At least the change is only more pessimistic, so still correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect the original result is incorrect, we cannot prove %N to be non-negative. But I'm pretty sure I missed something...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original result is correct in the sense that %N should be be interpreted as an unsigned integer, i.e. always non-negative. It is compared to %add19 which has an nuw flag which means %i.043 will never have the same value in different iterations (and since %mul and %A are invariant, also %add and %arrayidx).

%N can be larger than INT32_MAX which woud indeed be negative in a signed interpretation. But do we want to only consider signed loop indiction variables?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, what happens when %N is INT32_MIN? When %h.045 is 0, %mul is also 0, and %add has the same value as %i.043.

Since %N is unsigned, its value should be interpreted as UINT32_MAX. But dang, I only considered the inner loop. With getementptr/add you can add any invariant value and the result would is injective (different %h.045 results in different %arrayidx), not that I would assume DA would consider that. inbounds and therefore poison only applies to what getelementptr does (for the sake of the argument, assume that %A is zero in kernel-space in a memory block that spans the entire virtual memory space -- no wrapping behaviour can happen here), but the mul is variant in the outer loop and might wrap / has no no-wrap flag.

So the store 1 -> store 1 self-dependency might be something like output [* <]. Again , not something I expect DA to figure out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern to the assumption that values are always signed. The only indicator for wrapping behavior are %add16, %add19 which are no-UNSIGNED-wrap. Moving from 0b01111111 to 0b10000000 is valid and must be considered. We cannot implicitly assume the result of it is poison/cannot happen because it is valid.

Every single SE->isKnownNegative assumes a signed interpretation, and it seems not even
https://github.com/kasuga-fj/llvm-project/blob/b06f10d96c6d0fb89253c75c7f1d75c4cf519339/llvm/lib/Analysis/DependenceAnalysis.cpp#L1144
considers that it could be unsigned. But also getBackedgeTakenCount() logically cannot be negative. That is, we always have a mix of unsigned and signed representations. getMinusSCEV() uses two-complement wrapping behavior.

Assuming and i8 SCEV:

getMinusSCEV(129, 0) == 129 == 0b10000001 == -127
isKnownNegative(-127)

so isKnownNegative is happy to interpret an (obviously positive) unsigned number as negtive. Just start the iteration space with the 'negative' number and isKnownNegative returns exactly the opposite of what you would expect

for (uint8_t i = 128; i < 255; ++i)
  A[i];

If we wanted to exploit the fact that memory blocks larger than half the address space might be undefined behavior (due to ptrdiff_t being signed), we would need to sign-extend the SCEV to pointer size. Also, the logic would only be valid for getelementptr argument, not in general. Also, by adding/multiplying a sufficient larger number, the actuall array index may again be within the sweet sport of below half the address space.

I seems doubious to me that < is implemented in terms of getMinusSCEV < 0. The more I think about it the more worried I become. The issue can easily be shown when using constants.

This PR at least adds a check for the AddRec having signed semantics. But AddRec->evaluateAtIteration(BECount) could still be larger than INTn_MAX as above. Then isKnownNegative returns true on Diff1 it even if Size is zero. AddRec->hasNoSignedWrap() just means the AddRec itself does not wrap, but maybe the index that is computed from that AddRec.

I am considering just LGTM since it clearly fixes a bug. Cannot fix all the pre-existing corner cases. Some idea for rigerous/systematic testing would be to use loops with a single iteration only (in two different loops; so AddRec resolves to a constant), or with two iterations, one before the (unsigned and/or signed) wrap and one below. With so just a single iteration it is easier to reason about and SCEV less often has to fall back to SCEVCouldNotCompute. Instead of comparing just the first/last value of the array index SCEV, one might possibly (also) need to recursively visit the SCEV tyo analyze whether each of its constituents is monotonic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I seems doubious to me that < is implemented in terms of getMinusSCEV < 0. The more I think about it the more worried I become. The issue can easily be shown when using constants.

I couldn't agree more. I believe this patch fixes some bugs, but not sure if this is a foundamental solution. And probably the similar bugs exist in other places of DA as well...

With my earlier comment, "all values should be interpreted as signed", what I meant was that we should proceed with the analysis only if we can prove that none of the subscripts overflow in the signed sense. If we encounter a value that can overflow (e.g., transitioning from 0b01111111 to 0b10000000), or if we cannot prove that this doesn't happen, we should bail out early and return something like "unknown", in DA, probably a Dependence object instead of a FullDependence one. It may also be fine to assume that all values are unsigned, but anyway, it would be safer to unify the interpretation used in subsequent processing. Mixing signed and unsigned representations complicates things and, in my opinion, will lead to inconsistencies with the mathematical foundations of the DA algorithm.

Also, I think we need to pay more attention to the SCEV computation (like getAddExpr) within each function (ZIV, SIV, MIV, etc.). At the very least, the calculation on SCEVConstant seems to wrap silently (in two-complement sense) if they are sufficently large. Although it might sound a bit extreme, I'm starting to think we should replace all ScalaEvolution::getAddExpr invocations in DA with something like getAddExprIfNoConstantOverflow (Minus and Mul as well). Once an overflow is detected, the analysis should be terminated immediately. Or, using wider types in such cases would be another solution.

There is one bit of good news; regarding an AddRec used as an argument of GEP, maybe we can reason about wrapping behavior it if the GEP has nusw flag (not the wraps about the during the GEP operation, the AddRec itself!). Looking at this commit message, the same logic seems to apply in DA as well. However, at the same time, I also have the following concerns:

  • It says that the AddRec in the GEP argument doesn't wrap. So, what about subscripts after delinearization? At the moment, I think it would also not wrap if sizes are also positive (signed sense? unsigned sense?).

  • The result of the GEP being poison itself doesn't trigger UB. It will be triggered only when it is used as the pointer argument to a load/store. So, we might need to add some control-flow checks, to rule out cases like this:

      ...
      %gep = getelementptr inbounds i32, ptr %A, i64 %addrec
      %cond = ... ; an expression that always evaluates to false whenever %gep is poison
      br i1 %cond, label %if.then, label %sink
    
    if.then:
      store i32 42, ptr %gep
      br %sink
    
    sink:
      ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • It says that the AddRec in the GEP argument doesn't wrap. So, what about subscripts after delinearization? At the moment, I think it would also not wrap if sizes are also positive (signed sense? unsigned sense?).

Thats exatly what I mentioned as well. I think "non-wrap" of the subscript has to be replaced with a notion of monotonocity of the SCEV expression.

The result of the GEP being poison itself doesn't trigger UB. It will be triggered only when it is used as the pointer argument to a load/store. So, we might need to add some control-flow checks, to rule out cases like this:

DA does not support loop versioning like LAA does, we cannot require additional checks. I aslo don't think it is even possible to find such an expression to figure out whether GEP results in poison.
Since we can assume that the program we are analyzing has no UB, we can assume that any poison returned by GEP will not be used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • It says that the AddRec in the GEP argument doesn't wrap. So, what about subscripts after delinearization? At the moment, I think it would also not wrap if sizes are also positive (signed sense? unsigned sense?).

Thats exatly what I mentioned as well. I think "non-wrap" of the subscript has to be replaced with a notion of monotonocity of the SCEV expression.

Hm, it looks reasonable to me for now.

The result of the GEP being poison itself doesn't trigger UB. It will be triggered only when it is used as the pointer argument to a load/store. So, we might need to add some control-flow checks, to rule out cases like this:

DA does not support loop versioning like LAA does, we cannot require additional checks. I aslo don't think it is even possible to find such an expression to figure out whether GEP results in poison. Since we can assume that the program we are analyzing has no UB, we can assume that any poison returned by GEP will not be used.

I didn't mean inserting additional checks, I'm considering the case where the input IR has such a condition. DA reasons about something based on the GEP flags (example), but violating these flags doesn't trigger immediate UB. It actually triggered when it is used as, e.g.. pointer argument for load/store. I'm now wondering if this inference is only allowed when the value of the GEP is guaranteed to be used.

;
entry:
%cmp44 = icmp eq i32 %N, 0
Expand Down Expand Up @@ -630,3 +631,78 @@ for.latch:
exit:
ret void
}

; i = 0;
; do {
; a[k * i] = 42;
; a[k * (i + 1)] = 42;
; i++;
; } while (i < k);
;
; The dependency direction between the two stores depends on the sign of k.
; FIXME: Each store has loop-carried dependencies on itself if k is zero.
;
define void @coeff_may_negative(ptr %a, i32 %k) {
; CHECK-LABEL: 'coeff_may_negative'
; CHECK-NEXT: Src: store i8 42, ptr %idx.0, align 1 --> Dst: store i8 42, ptr %idx.0, align 1
; CHECK-NEXT: da analyze - none!
; CHECK-NEXT: Src: store i8 42, ptr %idx.0, align 1 --> Dst: store i8 42, ptr %idx.1, align 1
; CHECK-NEXT: da analyze - consistent output [-1]!
; CHECK-NEXT: Src: store i8 42, ptr %idx.1, align 1 --> Dst: store i8 42, ptr %idx.1, align 1
; CHECK-NEXT: da analyze - none!
;
entry:
br label %loop

loop:
%i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
%i.next = add i32 %i, 1
%subscript.0 = mul i32 %i, %k
%subscript.1 = mul i32 %i.next, %k
%idx.0 = getelementptr i8, ptr %a, i32 %subscript.0
%idx.1 = getelementptr i8, ptr %a, i32 %subscript.1
store i8 42, ptr %idx.0
store i8 42, ptr %idx.1
%cond.exit = icmp eq i32 %i.next, %k
br i1 %cond.exit, label %exit, label %loop

exit:
ret void
}

; i = 0;
; do {
; a[k * i] = 42;
; a[k * (i + 1)] = 42;
; i++;
; } while (i < k);
;
; We can infer that the value of k is non-negative from the nsw flag.
;
define void @coeff_positive(ptr %a, i32 %k) {
; CHECK-LABEL: 'coeff_positive'
; CHECK-NEXT: Src: store i8 42, ptr %idx.0, align 1 --> Dst: store i8 42, ptr %idx.0, align 1
; CHECK-NEXT: da analyze - none!
; CHECK-NEXT: Src: store i8 42, ptr %idx.0, align 1 --> Dst: store i8 42, ptr %idx.1, align 1
; CHECK-NEXT: da analyze - consistent output [-1]!
; CHECK-NEXT: Src: store i8 42, ptr %idx.1, align 1 --> Dst: store i8 42, ptr %idx.1, align 1
; CHECK-NEXT: da analyze - none!
;
entry:
br label %loop

loop:
%i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
%i.next = add nsw i32 %i, 1
%subscript.0 = mul i32 %i, %k
%subscript.1 = mul i32 %i.next, %k
%idx.0 = getelementptr i8, ptr %a, i32 %subscript.0
%idx.1 = getelementptr i8, ptr %a, i32 %subscript.1
store i8 42, ptr %idx.0
store i8 42, ptr %idx.1
%cond.exit = icmp eq i32 %i.next, %k
br i1 %cond.exit, label %exit, label %loop

exit:
ret void
}
Loading