Description
I've been tinkering with `Simd::gather_or_default()` and noticed that it optimizes far worse than I'd like. The cause turned out to be bounds checks.
I happen to know that, due to the logic of the code, the indices can never go out of bounds, but the compiler can't reason that far. As you can see, the assembly is quite compact and uses `vpgatherdd`, as I expected. Now change `if true` to `if false` to make it use `Simd::gather_or_default()`. There is much more code now, and it performs measurably worse too.
So I'd like to request something like `Simd::gather_unchecked()`.
I see there are a bunch of `gather` variants, but they all support a lot of features and generate worse code as a result. For example, `Simd::gather_select_unchecked()` is unable to optimize out `Mask::splat(true)`. If it did, I think it would be pretty much what I need, but I don't know how realistic it is to expect such an optimization.
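To make the request concrete, here is a rough scalar sketch of the semantics I have in mind. It is only a model written in stable Rust with plain arrays, not the actual `std::simd` implementation, and the names `gather_or_default_model`, `gather_unchecked_model`, and `LANES` are made up for illustration. The point is that the checked variant has to compare every index against `slice.len()`, while the requested variant would make in-bounds indices the caller's responsibility.

```rust
// Scalar model of the two behaviours; NOT the std::simd implementation.
// All names here are hypothetical and exist only for this illustration.
const LANES: usize = 8;

/// Models the checked behaviour: each index is compared against
/// `slice.len()`, and out-of-bounds lanes yield the default value (0).
fn gather_or_default_model(slice: &[i32], idxs: [usize; LANES]) -> [i32; LANES] {
    let mut out = [0i32; LANES];
    for (lane, &i) in idxs.iter().enumerate() {
        out[lane] = if i < slice.len() { slice[i] } else { i32::default() };
    }
    out
}

/// Models the requested behaviour: no mask and no bounds comparison.
///
/// # Safety
/// Every index in `idxs` must be less than `slice.len()`.
unsafe fn gather_unchecked_model(slice: &[i32], idxs: [usize; LANES]) -> [i32; LANES] {
    let mut out = [0i32; LANES];
    for (lane, &i) in idxs.iter().enumerate() {
        // The caller guarantees `i` is in bounds, so no check is emitted.
        out[lane] = unsafe { *slice.get_unchecked(i) };
    }
    out
}

fn main() {
    let data: Vec<i32> = (0..16).map(|x| x * 10).collect();
    let idxs = [0, 3, 5, 7, 8, 11, 13, 15];

    let checked = gather_or_default_model(&data, idxs);
    // SAFETY: every index above is less than data.len() == 16.
    let unchecked = unsafe { gather_unchecked_model(&data, idxs) };

    // With in-bounds indices both variants agree; only the checks differ.
    assert_eq!(checked, unchecked);
    println!("{:?}", unchecked);
}
```

When all indices are known to be in bounds, the two functions return the same values, so the only difference would be the generated code.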