[AMD] enhance range analysis and buffer-op only relies on range-analysis #8372

yangshuxin · 2025-10-05T06:06:21Z

This change fixes bugs in range analysis and lets the buffer-ops pass use the range-analysis result to decide whether it is legal to convert a memory op to a buffer op.

It fixes https://github.com/ROCm/triton-internal/issues/1180 and ROCm#871.

The highlights are as follows:

Range Analysis
- Fix the way tl.assume is used. Previously, the analysis did not consider the control-flow relationship between an assumption, say tl.assume x > 0, and the places where x occurs.
- Correct the value range of make_range(begin, end): the previous value range was [begin, end]; it is now [begin, end-1]. This small conceptual change causes large changes in the regression tests.
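
The two range-analysis points above can be sketched in Python. This is only an illustration under simplifying assumptions, not the code in this PR: `dominates` and `make_range_bounds` are hypothetical helpers, the control-flow graph is made up, and dominance is checked per block only (within a single block, the order of operations would also matter).

```python
def dominates(idom, a, b):
    """Return True if block `a` dominates block `b`.

    `idom` maps each block to its immediate dominator (None for the entry).
    """
    while b is not None:
        if b == a:
            return True
        b = idom[b]
    return False


def make_range_bounds(begin, end):
    """Inclusive bounds of make_range(begin, end), modeled as [begin, end)."""
    if end <= begin:
        raise ValueError("expected begin < end")
    return begin, end - 1


# Hypothetical CFG: entry branches to `then` and `else`, which join at `merge`.
idom = {"entry": None, "then": "entry", "else": "entry", "merge": "entry"}

# An assumption placed in `then` may refine x inside `then`, but not at `merge`,
# because `merge` is also reachable through `else`.
assert dominates(idom, "then", "then")
assert not dominates(idom, "then", "merge")

# make_range(0, 128) produces 0..127, so the upper bound is end - 1.
assert make_range_bounds(0, 128) == (0, 127)
```

With the old [begin, end] bound, any offset derived from make_range would be overestimated by one element, which can matter when the result is compared against a tight upper limit.
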
Buffer-ops
- For large tensors (>2G), remove the ad hoc and mistaken range analysis in the pass. The pass now relies only on the result of the range-analysis pass.
- Previously, the buffer-ops pass only checked element-index > 0. The right condition is byte-offset in [0, 2G-element-size].

There is similar earlier work in [AMD] Update BufferOps Non-Negative Check to be in Bytes #7908, contributed by @njriasan. My change to this part is similar but fixes some bugs in PR7908 (e.g. the lattice could be nullptr) and updates a large number of tests. That being said, since @njriasan made the first change, credit for this part belongs to him.
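
The byte-offset condition can be sketched as follows. This is a minimal Python illustration, not the code in ConvertToBufferOps.cpp: the function name and parameters are hypothetical, and "2G" is assumed to mean 2^31 bytes.

```python
TWO_GB = 1 << 31  # assumption: "2G" means 2**31 bytes


def byte_offset_in_range(idx_min: int, idx_max: int, elem_size: int) -> bool:
    """Check that every byte offset idx * elem_size lies in [0, 2G - elem_size].

    idx_min and idx_max are the inclusive element-index bounds that a range
    analysis would provide; elem_size is the element size in bytes.
    """
    if elem_size <= 0 or idx_min > idx_max:
        raise ValueError("invalid bounds or element size")
    lo = idx_min * elem_size
    hi = idx_max * elem_size
    return lo >= 0 and hi <= TWO_GB - elem_size


# f16 elements (2 bytes): a non-negative index alone is not sufficient.
print(byte_offset_in_range(0, (1 << 30) - 1, 2))  # last element ends exactly at 2G
print(byte_offset_in_range(0, 1 << 30, 2))        # last element would start at 2G
print(byte_offset_in_range(-1, 10, 2))            # a negative offset is possible
```

Checking in bytes rather than in elements is the point of the sketch: the same index bound can be acceptable for a 1-byte element type and out of range for a wider one.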

yangshuxin · 2025-10-07T18:12:31Z

I changed the status to "ready for review" despite failures on the Nvidia platform. My change only touches the AMD platform, so I suspect it is not the culprit.

njriasan

I still need to review the range analysis, but this looks good so far!

njriasan · 2025-10-09T22:36:42Z

third_party/amd/lib/TritonAMDGPUTransforms/ConvertToBufferOps.cpp

+  // step 2: Get element type and size.
+  // e.g. addPtrOp.getType is tensor<64x64x!tt.ptr<f16>, then elemTy is
+  // !tt.ptr<f16>, and dereferencing elemTy gets f16.
+  // TODO: Not sure if we need to keep dereferencing in a loop.


My experience was that yes you do need to do this.

njriasan · 2025-10-09T22:37:12Z

third_party/amd/lib/TritonAMDGPUTransforms/ConvertToBufferOps.cpp

+                         << ((szLimit2GB > byteOfst) ? ", out or range"
+                                                     : ",in range"));
+
+  return byteOfst <= szLimit2GB;


This looks great. Thank you!

njriasan · 2025-10-09T22:37:56Z

third_party/amd/test/lib/Analysis/TestAMDRangeAnalysis.cpp

-        solver->load<AMD::TritonIntegerRangeAnalysis>(assumptions);
+        solver->load<AMD::TritonIntegerRangeAnalysis>(
+            assumptions, &getAnalysis<DominanceInfo>(),
+            /*assumeNoArithOverflow=*/true);


Is this a hack or what is the equivalent when a user writes Python?

yangshuxin · 2025-10-09

@njriasan thank you very much for the code review and for your initial implementation.

For this specific change, I added a comment in RangeAnalysis.cpp. The initial motivation can be explained by this contrived example:

a and b are used in an operation that indicates they are signed ints: c = a + b

We know both a and b are in [0, smax). If the addition does not overflow, c falls in [0, smax); otherwise c is in (smin, smax). This flag tells the analysis whether arithmetic operations can be assumed never to overflow.

This "optimization" seems useful for proving > 0 but is pretty useless for proving < a specified value. I initially decided to ditch this "optimization", but I may keep it for a while to see whether it is useful in the future.

For now, this optimization is turned off; it is only turned on for testing. Turning it on for testing makes only a small difference.
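
The contrived example can be made concrete with a small interval-arithmetic sketch. It is written in Python for brevity and does not reflect the actual implementation in RangeAnalysis.cpp: `add_range` and its `assume_no_overflow` parameter are hypothetical names, 32-bit signed integers are assumed, and closed intervals are used for simplicity.

```python
SMIN, SMAX = -(1 << 31), (1 << 31) - 1  # assumption: 32-bit signed integers


def add_range(a, b, assume_no_overflow):
    """Inclusive signed range of a + b for inclusive input ranges a and b."""
    lo, hi = a[0] + b[0], a[1] + b[1]
    if SMIN <= lo and hi <= SMAX:
        return lo, hi  # the exact sum always fits, so wraparound is impossible
    if assume_no_overflow:
        # Trust that wraparound never happens and clamp to the representable range.
        return max(lo, SMIN), min(hi, SMAX)
    # Wraparound is possible, so nothing better than the full range is known.
    return SMIN, SMAX


a = b = (0, SMAX)
print(add_range(a, b, assume_no_overflow=True))   # lower bound stays at 0
print(add_range(a, b, assume_no_overflow=False))  # falls back to the full range
```

The sketch shows the asymmetry described in the comment: the assumption keeps the lower bound at 0, but the upper bound stays at smax either way, so it does not help prove that a value is below a specific limit.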

yangshuxin · 2025-10-11

I removed this part since both you and Lei found it confusing.

The confusing feature performed value-range analysis assuming that arithmetic ops carry the nsw and nuw flags (even when they are not present), e.g. pid * block_size is assumed to still fit in [0, smax] even though pid itself is in [0, smax].

antiagainst

Thanks for fixing these correctness issues! I'm good with the current impl; we can incrementally work to improve it for perf going forward.

yangshuxin mentioned this pull request Oct 6, 2025

[triton 3.5] [buffer_ops] Failing Pytorch UT: Numerical issues in test_einsum_to_pointwise ROCm/triton#871

Open

yangshuxin force-pushed the shuxin/revamp_value_range branch from c610cac to 8cf91be Compare October 7, 2025 17:38

yangshuxin marked this pull request as ready for review October 7, 2025 18:12

yangshuxin requested review from antiagainst, ptillet and zhanglx13 as code owners October 7, 2025 18:12

yangshuxin changed the title ~~[AMD][DRAFT] revamp range analysis~~ [AMD] revamp range analysis Oct 7, 2025

yangshuxin requested review from CRobeck, Jokeren, fywkevin and peterbell10 as code owners October 9, 2025 18:16

Shuxin Yang and others added 6 commits October 9, 2025 12:52

[AMD][DRAFT] revamp range analysis

d59e350

git format

eed4cae

fix crash

42c9ceb

fix potential bugs and add tests

459f7ab

add tech note; preparing for code review

c418d52

remove yeildop hack

68eff64

yangshuxin force-pushed the shuxin/revamp_value_range branch from 0289d6c to 68eff64 Compare October 9, 2025 19:53

yangshuxin changed the title ~~[AMD] revamp range analysis~~ [AMD] enhance range analysis and buffer-op only relies on range-analysis Oct 9, 2025

njriasan reviewed Oct 9, 2025

antiagainst requested changes Oct 11, 2025

address code review comment and remove the confusing feature.

9b9d2a3

antiagainst approved these changes Oct 12, 2025
