Commit 025d0c0

(reland) [AMDGPU][SIInsertWaitCnts] Use RegUnits-based tracking (#162077) (#171779)
Fixed a crash in Blender due to some weird control flow.
The issue was with the "merge" function which was only looking at the keys of the "Other" VMem/SGPR maps. It needs to look at the keys of both maps and merge them.

Original commit message below
----

The pass was already "reinventing" the concept just to deal with 16 bit registers. Clean up the entire tracking logic to only use register units.

There are no test changes because functionality didn't change, except:
- We can now track more LDS DMA IDs if we need it (up to `1 << 16`)
- The debug prints also changed a bit because we now talk in terms of register units.

This also changes the tracking to use a DenseMap instead of a massive fixed size table. This trades a bit of access speed for a smaller memory footprint. Allocating and memsetting a huge table to zero caused a non-negligible performance impact (I've observed up to 50% of the time in the pass spent in the `memcpy` built-in on a big test file).

I also think we don't access these often enough to really justify using a vector. We do a few accesses per instruction, but not much more. In a huge 120MB LL file, I can barely see the trace of the DenseMap accesses.
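To illustrate why a merge over sparse maps has to visit the keys of both sides, here is a minimal sketch. It is not the code from this commit: `ScoreMap`, `rebase`, `mergeScores` and the shift parameters are hypothetical names, `std::unordered_map` stands in for `llvm::DenseMap`, and the assumption is that each side's scores must be rebased before the per-key maximum is taken. Under that assumption, a key that only exists in `Self` still needs its score adjusted, so iterating over `Other` alone would leave it stale.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for a sparse "register unit -> pending score" map.
// A missing key is treated the same as a score of 0 (nothing pending).
using ScoreMap = std::unordered_map<uint32_t, unsigned>;

// Assumed rebasing rule for this sketch: 0 stays 0, anything else is shifted.
static unsigned rebase(unsigned Score, unsigned Shift) {
  return Score == 0 ? 0 : Score + Shift;
}

// Merge Other into Self, taking the per-key maximum of the rebased scores.
void mergeScores(ScoreMap &Self, unsigned SelfShift,
                 const ScoreMap &Other, unsigned OtherShift) {
  // Pass 1: rebase every key already in Self, including keys Other lacks.
  for (auto &[Unit, Score] : Self)
    Score = rebase(Score, SelfShift);

  // Pass 2: fold in Other's keys, inserting those Self does not have yet.
  for (const auto &[Unit, Score] : Other) {
    unsigned Rebased = rebase(Score, OtherShift);
    if (Rebased == 0)
      continue; // Nothing pending; do not create an empty entry.
    unsigned &Slot = Self[Unit]; // Value-initialized to 0 when absent.
    Slot = std::max(Slot, Rebased);
  }
}
```

Skipping the first pass is the sparse-map analogue of the bug described above: with a fixed-size table every slot is visited regardless, whereas with a map the set of keys to visit has to be chosen explicitly.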
1 parent b492b35 commit 025d0c0

3 files changed: +495 -286 lines changed

