
Conversation


@chengjunlu commented Aug 11, 2025

Use the transposing 2D block IO to load a column-major matrix from global memory. (The column-major case generalizes to any case where the register layout's fast-changing dimension differs from the fast-changing dimension in global memory.)

The transpose is a recursive operation:
[image: illustration of the recursive transpose]

To use the transposing 2D block IO to load a column-major matrix on Xe+:

  1. Load the matrix from memory as a d32-typed matrix, transposed in registers.
  2. Bitcast the 1xNxd32 result to (32/m)xNxdm, where m is the element size in bits.
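The two steps above can be emulated with a short numpy sketch. Assumptions: fp16 elements (m = 16, so 32/m = 2) and a hypothetical 4x8 tile; the array names are illustrative, not taken from the implementation.

```python
import numpy as np

# Hypothetical tile: a 4x8 fp16 matrix A stored column-major in "global memory".
M, N = 4, 8
A = np.arange(M * N, dtype=np.float16).reshape(M, N)
col_major_bytes = A.T.copy().tobytes()  # columns of A are contiguous in memory

# Step 1: load as a d32-typed matrix, transposed in registers.
# Each d32 packs 32/16 = 2 consecutive fp16 values of one column.
d32 = np.frombuffer(col_major_bytes, dtype=np.uint32).reshape(N, M // 2)
d32_t = d32.T  # the 2D block load performs this transpose in hardware

# Step 2: bitcast each row of d32 values into 32/m = 2 rows of fp16.
elems = d32_t.copy().view(np.float16)  # shape (M//2, 2*N)
rows = elems.reshape(M // 2, N, 2).transpose(0, 2, 1).reshape(M, N)

assert np.array_equal(rows, A)  # the full matrix is recovered row-major
```

The same bitcast generalizes to smaller element types, e.g. 8-bit elements give 32/m = 4 rows per d32 row.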

The code currently implements the functionality only for a limited set of layouts, and it is not yet optimized for efficiency.

Copilot AI left a comment

Pull Request Overview

This draft PR implements transpose 2D block load functionality to efficiently load column-major matrices from global memory on Intel Xe+ GPUs. The implementation introduces a transpose operation when the register layout's fast-changing dimension differs from the memory layout's, using d32-typed matrices with bitcast operations for the transformation.

  • Added support for transpose 2D block IO operations with transpose parameter
  • Enhanced block IO tile size calculation to handle transpose scenarios
  • Implemented new test coverage for transpose and column major load operations

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

  • LoadStoreOpToLLVM.cpp — Major refactoring of the 2D block load implementation to support transpose operations and simplified layout handling
  • tensor-pointer-load-block-2d.mlir — Updated test expectations for new block load configurations and tile sizes
  • test_block_store.py — Added transpose parameter and column-major test cases for block operations

@chengjunlu force-pushed the chengjun/trans_2d_load branch from efff84d to 55c896e on August 11, 2025 07:42
@etiotto marked this pull request as draft on October 9, 2025 14:09
@chengjunlu force-pushed the chengjun/trans_2d_load branch from 20a1637 to 942ca37 on November 4, 2025 04:49
@chengjunlu changed the title from "[Draft] Transpose 2d load." to "[LoadStoreOpToLLVM] Transpose 2d load." on Nov 4, 2025
@chengjunlu marked this pull request as ready for review on November 4, 2025 04:50
@chengjunlu force-pushed the chengjun/trans_2d_load branch 7 times, most recently from 210886e to e979428 on November 10, 2025 05:37
packedElemSizeInBits = 32;
numPackedVals = packedElemSizeInBits / elemSizeInBits;

// Improve this. The current 2D block load only transposes the matrix at
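As a hypothetical illustration of the quoted packing math (the names mirror the C++ variables; this is not the actual implementation), each 32-bit packed element holds 32/elemSizeInBits source values:

```python
def num_packed_vals(elem_size_in_bits):
    # The transposing 2D block load always loads 32-bit packed elements,
    # so each d32 carries 32 / elemSizeInBits source values.
    packed_elem_size_in_bits = 32
    return packed_elem_size_in_bits // elem_size_in_bits

print(num_packed_vals(16))  # fp16: 2 values per d32
print(num_packed_vals(8))   # fp8:  4 values per d32
```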
Contributor Author


The improvements will be added in another PR, to minimize the changes in this PR.

@chengjunlu requested a review from Copilot on November 10, 2025 05:41
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.



@chengjunlu

@whitneywhtsang @etiotto, the transpose loading is ready for review.

@chengjunlu force-pushed the chengjun/trans_2d_load branch from e979428 to 248ae4c on November 12, 2025 03:00
Signed-off-by: Lu,Chengjun <chengjun.lu@intel.com>
@whitneywhtsang

Can you fix the typo in the image of the PR description or remove it?

return axisInfo ? axisInfo->getStride(dim) : -1;
if (axisInfo) {
const SmallVector<int64_t> &stride = axisInfo->getStride();
if (dim < stride.size()) {
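The intent of the guarded lookup above can be sketched in Python (a hypothetical mirror of the C++, with the axis info modeled as a plain dict): return the stride for `dim` only when axis info exists and `dim` is in range, and -1 otherwise.

```python
def get_stride(axis_info, dim):
    # Mirror of the guarded C++ path: missing axis info, or an
    # out-of-range dimension, both fall back to the unknown stride -1.
    if axis_info is None:
        return -1
    stride = axis_info["stride"]  # assumed per-dimension stride list
    return stride[dim] if dim < len(stride) else -1

print(get_stride({"stride": [8, 1]}, 1))  # in range: 1
print(get_stride({"stride": [8, 1]}, 5))  # out of range: -1
print(get_stride(None, 0))                # no axis info: -1
```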

Why would we call getStride with a dim greater than the size of stride?



Development

Successfully merging this pull request may close these issues.

[06-fused-attention] Determine if FP8 operand B can use 2d block load

3 participants