[LV] Convert gather loads with invariant stride into strided loads #147297


Status: Draft · wants to merge 3 commits into main
Conversation

Mel-Chen
Contributor

@Mel-Chen Mel-Chen commented Jul 7, 2025

This patch detects stride memory accesses such as:

  void stride(int* a, int *b, int n) {
    for (int i = 0; i < n; i++)
      a[i * 5] = b[i * 5] + i;
  }

and converts widened non-consecutive loads (i.e., gathers) into strided loads when it is legal and profitable to do so.
The transformation is implemented as part of VPlan and is inspired by existing logic in RISCVGatherScatterLowering. Some of the legality and analysis logic has been moved into VPlan to enable this conversion earlier during vectorization planning.

This enables more efficient code generation for targets like RISC-V that support strided loads natively.

@Mel-Chen
Contributor Author

Mel-Chen commented Jul 7, 2025

This is a port of the approach used in RISCVGatherScatterLowering, implemented entirely in VPlan. If you'd prefer to switch to LLVM IR–based analysis using SCEV, please let me know.

That said, the current patch faces an awkward situation: all existing strided access patterns in our lit tests are currently not converted to strided accesses, because the cost returned by TTI.getStridedMemoryOpCost is higher than that of a gather.

As a result, I'm marking this patch as Draft for now, until we have test cases that can demonstrate the intended functionality.


github-actions bot commented Jul 7, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@lukel97
Contributor

lukel97 commented Jul 8, 2025

+1 on this approach of doing this as a VPlan transformation. That way we don't need to work with the legacy cost model.

I presume we will need to handle VPWidenStridedLoadRecipes in planContainsAdditionalSimplifications and skip the assertion?

That said, the current patch faces an awkward situation: all existing strided access patterns in our lit tests are currently not converted to strided accesses, because the cost returned by TTI.getStridedMemoryOpCost is higher than that of a gather.

Yeah I think the current RISCVTTIImpl::getStridedMemoryOpCost is too expensive. It should definitely be at least cheaper than RISCVTTIImpl::getGatherScatterOpCost, and at the moment the cost model seems to return the same for them.

I know the spacemit-x60 doesn't have fast vlse, but I don't think they're as slow as vluxei. Do we have benchmark results for these? cc @mikhailramalho

@wangpc-pp
Contributor

I know the spacemit-x60 doesn't have fast vlse, but I don't think they're as slow as vluxei. Do we have benchmark results for these? cc @mikhailramalho

I have some data on C908 for strided load (which is weird): camel-cdr/rvv-bench#12. The result may hold for spacemit-x60.

@Mel-Chen
Contributor Author

+1 on this approach of doing this as a VPlan transformation. That way we don't need to work with the legacy cost model.

Another approach is using SCEV to get the base and stride. Both can be transformed within VPlanTransform without relying on the legacy cost model.

I presume we will need to handle VPWidenStridedLoadRecipes in planContainsAdditionalSimplifications and skip the assertion?

I’ve previously forced the cost model to always return profitable in order to trigger the transformation, and didn’t encounter any assertions. I plan to use the same method to test against the llvm-test-suite and see if any issues come up.

@Mel-Chen
Contributor Author

Mel-Chen commented Jul 22, 2025

@lukel97 @wangpc-pp
#149955
Elvis is working on improving the gather/scatter cost model. This should help enable strided accesses.


  auto *LoadR = cast<VPWidenLoadRecipe>(MemR);
  auto *StridedLoad = new VPWidenStridedLoadRecipe(
      *cast<LoadInst>(&Ingredient), NewPtr, Stride, &Plan.getVF(),

I'm not very familiar with the details of RISC-V or the experimental_vp_strided_load intrinsic, so my apologies if my understanding is incorrect.
My understanding is that the Stride is already in bytes. However, it seems to be multiplied by the element size again in VPWidenStridedLoadRecipe::execute to convert it to bytes. Is this intentional?
