Commit 8609010
Fix float4 test cases in test_mxfp_matmul (#4776)
`emitTransferBetweenRegistersAndShared` creates a very long vector for the load:
`%18448 = llvm.load %18447 {alignment = 32768 : i64} : !llvm.ptr<3> -> vector<16384xbf16> loc(#loc40)`.
The `emitTransferBetweenRegistersAndShared` function has a `maxVecElems`
option (`std::nullopt` by default), so we can limit the size of a
vector to, say, 256 elements, since it is hard to imagine that larger
vectors can work efficiently.
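As a rough illustration of what such a cap changes, the sketch below computes the vector width and the number of vector loads for the 16384-element bf16 case above, with and without a 256-element limit. `plan_transfer` is a hypothetical helper written for this note, not Triton code, and splitting the transfer into equal chunks is an assumption about how the cap would be applied.

```python
from typing import Optional, Tuple

BF16_BYTES = 2  # bfloat16 is 16 bits wide


def plan_transfer(total_elems: int,
                  max_vec_elems: Optional[int] = None) -> Tuple[int, int]:
    """Return (vector width in elements, number of vector loads).

    Hypothetical helper: None means "no limit", mirroring the idea of an
    optional cap. It is not the actual lowering logic.
    """
    if max_vec_elems is None:
        width = total_elems
    else:
        width = min(total_elems, max_vec_elems)
    num_loads = -(-total_elems // width)  # ceiling division
    return width, num_loads


uncapped = plan_transfer(16384)
capped = plan_transfer(16384, max_vec_elems=256)

# Width in bytes: 16384 * 2 = 32768, matching the alignment in the load above.
print(uncapped, uncapped[0] * BF16_BYTES)
# With the cap: 64 loads of 256 elements (512 bytes) each.
print(capped, capped[0] * BF16_BYTES)
```

Under these assumptions, the single 32768-byte load becomes 64 loads of 512 bytes each.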
`TRITON_ALWAYS_COMPILE=1 MLIR_ENABLE_TIMING=1 LLVM_ENABLE_TIMING=1 python -m pytest python/test/unit/intel/test_mxfp_matmul.py::test_mxfp_matmul[True-True-float4-float4-True-True-1-128-128-128-1024-512-512] --device=xpu -s`
now takes around 35 seconds.
The largest share of that time is now spent in the Canonicalizer, as the timing report shows:
`19.5668 ( 41.2%) 19.5668 ( 76.5%) Canonicalizer`.
---------
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>

1 parent 8feef60 commit 8609010
2 files changed (+8, -5):
- lib/Conversion/TritonGPUToLLVM
- python/test/unit/intel