[AMD] Support TDM store on gfx1250 #8392

borontion · 2025-10-07T21:10:47Z

This PR adds support for TDM store on gfx1250, following #8333. Exposes tdm.async_load through Gluon. Groups common TDM utilities for load/store.

third_party/amd/include/Dialect/TritonAMDGPU/IR/TritonAMDGPUOps.td

ThomasRaoux · 2025-10-07T23:33:57Z

third_party/amd/lib/TritonAMDGPUToLLVM/LoadStoreOpToLLVM.cpp

+    auto [group0, group1] = LLVM::AMD::createTDMDescriptor(
+        rewriter, loc, getTypeConverter(), elementType, blockShape, tensorShape,
+        tensorStride, offset, srcPtr, dstPtr, op.getPred(), numWraps,
+        padInterval, padAmount);


Why can't we do that when we create the descriptor instead of doing it on the fly?

We need to know the shared memory destination and offset from the load/store op to create a complete tdm descriptor.

It is possible to create a partially filled TDM descriptor when lowering the create tensor descriptor op and then update it later for load/store, but this turns out to still require some effort (specifically, we need to extract and update the global memory pointer, shared memory pointer, and tensor shape).

The tensor shape and smem layout should be known at descriptor creation time. Is the problem that the offsets need to be combined with the base global address?
I think separating out the invariant part of the descriptor will be important to make sure we don't accidentally create loop-dependent code sequences.
Also, this solution implicitly relies on LICM and CSE to clean things up. I think that is a bit risky, since in some cases we need to disable LICM to avoid register pressure.

Yeah, it does depend on LICM to clean up. I can switch to doing it in make tensor descriptor if so. @antiagainst Do you have other thoughts?

If there are no downsides, I think it would be better. It would also reduce live ranges, right? I assume the descriptor is more packed than the set of values.

Yeah, what Thomas said makes sense to me. I'd also expect it to provide better control and generate better code.
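To make the split discussed in this thread concrete, here is a minimal sketch in plain C++ (not the MLIR lowering in this PR). It assumes a hypothetical 2D descriptor: `TensorDesc`, `TransferArgs`, `fillForOp`, and every field name are invented for illustration and do not reflect the real TDM descriptor layout. The invariant part is built once; only the offset-dependent values (global address, shared memory address, remaining shape) are recomputed per load/store.

```cpp
#include <array>
#include <cstdint>

// Hypothetical invariant state, known when the tensor descriptor is created.
struct TensorDesc {
  uint64_t globalBase;            // base global address, in bytes
  std::array<int32_t, 2> shape;   // tensor shape, in elements
  std::array<int64_t, 2> stride;  // strides, in elements
  uint32_t elemBytes;             // element size, in bytes
};

// Hypothetical per-operation state; depends on the load/store operands.
struct TransferArgs {
  uint64_t globalAddr;               // base address advanced by the offset
  uint32_t sharedAddr;               // shared memory address for this op
  std::array<int32_t, 2> remaining;  // shape left after applying the offset
};

// Recompute only the values that change from one load/store to the next.
inline TransferArgs fillForOp(const TensorDesc &desc,
                              const std::array<int32_t, 2> &offset,
                              uint32_t sharedAddr) {
  // Linear element offset from the per-dimension offsets and strides.
  int64_t elemOffset = offset[0] * desc.stride[0] + offset[1] * desc.stride[1];
  TransferArgs args;
  args.globalAddr =
      desc.globalBase + static_cast<uint64_t>(elemOffset) * desc.elemBytes;
  args.sharedAddr = sharedAddr;
  args.remaining = {desc.shape[0] - offset[0], desc.shape[1] - offset[1]};
  return args;
}
```

In a loop, `TensorDesc` would be constructed once outside the loop body and `fillForOp` called per iteration, so the loop-invariant values stay invariant by construction rather than depending on LICM or CSE to hoist them.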

third_party/amd/lib/TritonAMDGPUToLLVM/LoadStoreOpToLLVM.cpp

peterbell10

Gluon changes LGTM

antiagainst

LGTM now. @ThomasRaoux can you take another look?

antiagainst · 2025-10-09T02:10:55Z

third_party/amd/lib/TritonAMDGPUToLLVM/TDMUtility.cpp

+  Type globalPtrTy = ptr_ty(ctx, 1);
+
+  Value globalAddrLow = group0[2];
+  Value globalAddrHigh = b.and_(group0[3], b.i32_val(0x7FFFFFFF));


As chatted internally, this is okay to get started for now. But we may want to define a proper struct to avoid such i64 pack/unpack if LLVM is confused about it.
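For readers unfamiliar with the pattern in the hunk above, the following is a rough standalone sketch of packing a 64-bit address into two 32-bit words and recovering it with the same `0x7FFFFFFF` mask. It is plain C++, not the LLVM IR builder code from the PR. The assumption that bit 31 of the high word carries an unrelated flag is made up for illustration, as are the names `PackedAddr`, `packAddr`, and `unpackAddr`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-word view of the address fields; not the real TDM layout.
struct PackedAddr {
  uint32_t low;   // low 32 bits of the global address
  uint32_t high;  // bits 0..30: high address bits; bit 31: assumed flag
};

constexpr uint32_t kHighAddrMask = 0x7FFFFFFFu;

// Pack a 64-bit address and a one-bit flag into two 32-bit words.
inline PackedAddr packAddr(uint64_t addr, bool flag) {
  assert((addr >> 63) == 0 && "address must fit in 63 bits");
  PackedAddr packed;
  packed.low = static_cast<uint32_t>(addr & 0xFFFFFFFFu);
  packed.high = static_cast<uint32_t>(addr >> 32) & kHighAddrMask;
  if (flag)
    packed.high |= 0x80000000u;  // set the assumed flag bit
  return packed;
}

// Recover the address, masking off the flag bit before recombining.
inline uint64_t unpackAddr(const PackedAddr &packed) {
  uint64_t high = packed.high & kHighAddrMask;
  return (high << 32) | packed.low;
}
```

A named struct along these lines keeps the field boundaries explicit in one place, which is the kind of alternative to ad hoc i64 pack/unpack that the comment above suggests.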

ThomasRaoux

LGTM

ThomasRaoux · 2025-10-09T19:19:21Z

third_party/amd/lib/TritonAMDGPUToLLVM/TDMUtility.cpp

+  return {group0, group1};
+}
+
+void fillTDMDescriptor(RewriterBase &rewriter, Location loc,


Not a blocking comment, but at some point I would like to understand why we need so much arithmetic for every load/store and how expensive it is.

support tdm store

9a4fdec

borontion marked this pull request as ready for review October 7, 2025 21:36

borontion requested review from antiagainst, peterbell10, ptillet and zhanglx13 as code owners October 7, 2025 21:36

antiagainst reviewed Oct 7, 2025

third_party/amd/include/Dialect/TritonAMDGPU/IR/TritonAMDGPUOps.td Outdated

ThomasRaoux reviewed Oct 7, 2025

remove read side effect

78f0f8e

guacamoleo reviewed Oct 8, 2025

third_party/amd/lib/TritonAMDGPUToLLVM/LoadStoreOpToLLVM.cpp Outdated

typo

5a0847f

peterbell10 reviewed Oct 8, 2025

borontion and others added 2 commits October 8, 2025 14:47

directly create tdm descriptor

46b6ca5

update doc

668cd1d

antiagainst approved these changes Oct 9, 2025

ThomasRaoux approved these changes Oct 9, 2025

antiagainst merged commit b5fea1e into triton-lang:main Oct 9, 2025
9 checks passed
