-
Notifications
You must be signed in to change notification settings - Fork 14.7k
[LV] Convert gather loads with invariant stride into strided loads #147297
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This is a port of the approach used in RISCVGatherScatterLowering, implemented entirely in VPlan. If you'd prefer to switch to LLVM IR–based analysis using SCEV, please let me know. That said, the current patch faces an awkward situation: all existing strided access patterns in our lit tests are currently not converted to strided accesses, because the cost returned by TTI.getStridedMemoryOpCost is higher than that of a gather. As a result, I'm marking this patch as Draft for now, until we have test cases that can demonstrate the intended functionality. |
✅ With the latest revision this PR passed the C/C++ code formatter. |
+1 on this approach of doing at as VPlan transformation. That way we don't need to work with the legacy cost model. I presume we will need to handle VPWidenStridedLoadRecipes in planContainsAdditionalSimplifcations and skip the assertion?
Yeah I think the current RISCVTTIImpl::getStridedMemoryOpCost is too expensive. It should definitely be at least cheaper than RISCVTTIImpl::getGatherScatterOpCost, and at the moment the cost model seems to return the same for them. I know the spacemit-x60 doesn't have fast vlse but I don't think they're as slow as vluxei. Do we have benchmark results for these? cc) @mikhailramalho |
I have some data on C908 for strided load (which is weird): camel-cdr/rvv-bench#12. The result may hold for spacemit-x60. |
Another approach is using SCEV to get the base and stride. Both can be transformed within VPlanTransform without relying on the legacy cost model.
I’ve previously forced the cost model to always return profitable in order to trigger the transformation, and didn’t encounter any assertions. I plan to use the same method to test against the llvm-test-suite and see if any issues come up. |
@lukel97 @wangpc-pp |
|
||
auto *LoadR = cast<VPWidenLoadRecipe>(MemR); | ||
auto *StridedLoad = new VPWidenStridedLoadRecipe( | ||
*cast<LoadInst>(&Ingredient), NewPtr, Stride, &Plan.getVF(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not very familiar with the details of RISC-V or the experimental_vp_strided_load
intrinsic, so my apologies if my understanding is incorrect.
My understanding is that the Stride
is already in bytes. However, it seems to be multiplied to be in bytes in VPWidenStridedLoadRecipe::execute
. Is this intentional?
This patch detects stride memory accesses such as:
and converts widen non-consecutive loads (i.e. gathers) into strided loads when profitable and legal.
The transformation is implemented as part of VPlan and is inspired by existing logic in RISCVGatherScatterLowering. Some of the legality and analysis logic has been moved into VPlan to enable this conversion earlier during vectorization planning.
This enables more efficient code generation for targets like RISC-V that support strided loads natively.