Aarch64 simd #65

Dr-Emann · 2025-05-21T18:38:21Z

Builds on #57

Closes #61

Still a work in progress, but opening as a draft for early feedback.

This speeds up the criterion benchmarks by almost 2x. I believe this is needed because, e.g., `Bytes::find` is inlined and calls `find` generically, which in turn calls `PackedCompareControl` methods. The code calling those methods is inlined into the calling crate, but the implementations of `PackedCompareControl` are not accessible to code in the calling crate, so they end up as actual function calls. However, these functions are _super_ simple, and inlining them helps a LOT, so adding `#[inline]` to them, which makes their implementations available to calling crates, has a huge effect. This was only seen after moving to criterion because the nightly benchmarks were previously implemented in the library crate itself, so these functions were already eligible for inlining. The criterion results were actually closer to what callers of the crate would see!
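For anyone skimming, here is a rough sketch of the shape of the problem. The `Control` trait, `SingleByte`, and this `find` are hypothetical stand-ins, not the crate's actual `PackedCompareControl` API. Note too that whether a small non-generic function is inlinable across crates without `#[inline]` depends on the compiler version and on LTO settings, so treat the comments as the general idea only.

```rust
/// Hypothetical stand-in for a trait like `PackedCompareControl`.
/// The method name is illustrative, not the crate's real API.
pub trait Control {
    fn matches(&self, byte: u8) -> bool;
}

/// Hypothetical implementor that matches a single byte.
pub struct SingleByte(pub u8);

impl Control for SingleByte {
    // This method is not generic. Without `#[inline]`, its body may not be
    // available to downstream crates (absent LTO), so a generic caller that
    // gets instantiated downstream would emit a real function call here.
    #[inline]
    fn matches(&self, byte: u8) -> bool {
        byte == self.0
    }
}

/// Generic, so it is instantiated (and can be inlined) in the calling crate.
#[inline]
pub fn find<C: Control>(haystack: &[u8], control: &C) -> Option<usize> {
    haystack.iter().position(|&b| control.matches(b))
}
```

Benchmarks that live in a separate crate (as criterion benches do) exercise this cross-crate path, which is why the difference only showed up there.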

@BurntSushi

For aarch64, we can do quite a bit better than just calling the `find` function repeatedly: we build a 64-bit bitset recording, for each position, whether it matches the set we're looking for. We can then efficiently iterate over the set bits. It may be possible to do something similar in the x86 simd implementation.
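A minimal scalar sketch of the bitset idea, for illustration only. The helper names (`match_mask`, `SetBits`, `find_all`) are made up here, and the real implementation would build the mask with NEON comparisons rather than a byte-by-byte loop. The part this is meant to show is the iteration over set bits using `trailing_zeros` and clearing the lowest set bit.

```rust
/// Build a 64-bit mask where bit `i` is set if `chunk[i]` is in `needles`.
/// Scalar stand-in for the SIMD comparison (hypothetical helper).
fn match_mask(chunk: &[u8], needles: &[u8]) -> u64 {
    assert!(chunk.len() <= 64);
    let mut mask = 0u64;
    for (i, b) in chunk.iter().enumerate() {
        if needles.contains(b) {
            mask |= 1u64 << i;
        }
    }
    mask
}

/// Iterator over the indices of the set bits, lowest index first.
struct SetBits(u64);

impl Iterator for SetBits {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        if self.0 == 0 {
            return None;
        }
        let idx = self.0.trailing_zeros() as usize;
        // Clear the lowest set bit so the next call finds the following match.
        self.0 &= self.0 - 1;
        Some(idx)
    }
}

/// Collect every matching position by processing the haystack 64 bytes at a time.
fn find_all(haystack: &[u8], needles: &[u8]) -> Vec<usize> {
    let mut out = Vec::new();
    for (n, chunk) in haystack.chunks(64).enumerate() {
        let base = n * 64;
        out.extend(SetBits(match_mask(chunk, needles)).map(|i| base + i));
    }
    out
}
```

The mask is computed once per 64-byte chunk, so yielding each subsequent match costs only a couple of integer instructions instead of a fresh search.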

Dr-Emann · 2025-05-21T19:42:19Z

rust-lang/rust#127481

Looks like the unstable `Pattern` API changed.

kivikakk · 2025-10-19T03:11:57Z

Fwiw, I gave this a go in a hot loop in Comrak, and it works really well!

Dr-Emann and others added 21 commits October 17, 2023 21:33

Update to edition 2018

95b2281

Remove unneeded `extern crate`s

Make functions const

43961a5

Allow using static with the output of the bytes/ascii_chars macros only

d3bda48

Port benchmarks to stable criterion

6bf8735

Also, add some benchmarks against memchr

Add a benchmark for a full 16 chars

25e62c7

Add benchmark for worst case for memchr

0c42609

Add a test that looks for the first item in a long haystack

Use memmap2

d85b235

The memmap crate is unmaintained, instead, use the maintained memmap2 crate

Remove bounds on struct definitions

0b1259d

Structs don't need the bounds, only the implementations
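As a small illustration of the pattern (the `Finder` type below is hypothetical, not one of the crate's structs): the struct declares its type parameter without any `where` clause, and the bound appears only on the `impl` block that actually needs it.

```rust
/// No `F: Fn(u8) -> bool` bound on the struct definition itself.
pub struct Finder<F> {
    predicate: F,
}

impl<F> Finder<F> {
    /// Construction does not need the bound either.
    #[must_use]
    pub fn new(predicate: F) -> Self {
        Finder { predicate }
    }
}

impl<F> Finder<F>
where
    F: Fn(u8) -> bool,
{
    /// Only the methods that call the predicate require the bound.
    #[must_use]
    pub fn find(&self, haystack: &[u8]) -> Option<usize> {
        haystack.iter().position(|&b| (self.predicate)(b))
    }
}
```

Keeping bounds off the definition means other code that merely names or stores the type does not have to repeat them.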

Update changelog

f7fd429

Apply clippy suggestions

8dc6fca

Mostly just adding #[must_use]

Add the "teddy" algorithm from aho-corasick

0cf5a36

Per suggestion from @BurntSushi [here](tafia/quick-xml#664 (comment)). On my M1, it appears to be slower than, but competitive with, memchr up to memchr3, then starts being the faster one from 5-16.

Fix CI by setting up target configuration for ASAN

08ff586

Don't make things const

434c045

We may not want to be stuck with const-constructable implementations

configure unexpected_cfgs lint

be8622e

remove some extra benchmarks that don't make sense

d48378d

Add top level tests

be3c68f

Move the simd-only tests to the top level. This allows testing even when sse4.2 isn't enabled: when it is available, it will still test the simd implementation, but will test the fallback otherwise.

Add aarch64 neon simd

cf7a8f0

This moves mentions of "simd" to be x86 specific. Also, do everything with #[cfg], rather than requiring custom cfgs populated in the build.rs
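A rough sketch of what selecting a backend purely with built-in `#[cfg]` predicates can look like, with no custom cfgs emitted from `build.rs`. The `backend` function and the exact predicates are assumptions for illustration; the PR's actual conditions (for example, whether 32-bit x86 is also covered) may differ.

```rust
// Selected when compiling for x86_64 with SSE4.2 enabled at compile time.
#[cfg(all(target_arch = "x86_64", target_feature = "sse4.2"))]
fn backend() -> &'static str {
    "x86 sse4.2"
}

// Selected when compiling for aarch64 with NEON enabled at compile time.
#[cfg(all(target_arch = "aarch64", target_feature = "neon"))]
fn backend() -> &'static str {
    "aarch64 neon"
}

// Portable fallback for every other target/feature combination.
#[cfg(not(any(
    all(target_arch = "x86_64", target_feature = "sse4.2"),
    all(target_arch = "aarch64", target_feature = "neon"),
)))]
fn backend() -> &'static str {
    "fallback"
}
```

Exactly one of the three definitions is compiled for any given target, so callers can use `backend()` unconditionally, and the same top-level tests exercise whichever implementation was selected.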

add benchmarks iterating over all positions of the items

3b7081f

This includes pretty frequent instances

add overall algorithm comments and links

be472e1

kivikakk mentioned this pull request Oct 19, 2025

Use jetscii for SIMD searching. kivikakk/comrak#630

Merged
