
Conversation

sgued (Contributor) commented Oct 8, 2025

This patch documents the issue reported in #583 and adds failing loom tests that exercise it, in the hope that a future algorithm change can make them pass.

The tests are run in CI but their results are ignored for now as they fail.

Ideally the loom tests will cover more usage of mpmc::Queue in the future and also cover spsc.
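
For context, a minimal sketch of the shape such a loom test can take (an illustration under assumptions, not the test code added by this patch; it assumes heapless is built with `--cfg loom` so the queue uses loom's atomics):

```rust
// Illustrative only: two producers and one consumer exercising mpmc::Queue
// under loom's model checker.
#[cfg(loom)]
#[test]
fn mpmc_two_producers_one_consumer() {
    loom::model(|| {
        let q = loom::sync::Arc::new(heapless::mpmc::Queue::<u8, 2>::new());

        let producers: Vec<_> = (0..2u8)
            .map(|i| {
                let q = q.clone();
                loom::thread::spawn(move || {
                    // Retry until the bounded queue accepts the item, yielding
                    // so loom can schedule the other threads.
                    while q.enqueue(i).is_err() {
                        loom::thread::yield_now();
                    }
                })
            })
            .collect();

        // Consume both items on the model's main thread.
        let mut seen = 0;
        while seen < 2 {
            if q.dequeue().is_some() {
                seen += 1;
            } else {
                loom::thread::yield_now();
            }
        }

        for p in producers {
            p.join().unwrap();
        }
    });
}
```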


A review thread on the following excerpt:

```rust
#[cfg(not(any(feature = "portable-atomic", loom)))]
use core::sync::atomic;
#[cfg(feature = "portable-atomic")]
```

Suggested change:

```diff
-#[cfg(feature = "portable-atomic")]
+#[cfg(all(feature = "portable-atomic", not(loom)))]
```

sgued (Contributor, Author) replied:

I don't think it ever makes sense to have loom and portable-atomic enabled at the same time.
Maybe we can add a compile_error! that prevents any use of --cfg loom outside of the expected feature combination.
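
For illustration, a minimal sketch of such a guard (the exact set of rejected combinations and its placement are assumptions):

```rust
// Reject loom builds combined with the portable-atomic feature up front,
// instead of letting them fail later with a confusing name-conflict error.
#[cfg(all(loom, feature = "portable-atomic"))]
compile_error!(
    "`--cfg loom` cannot be combined with the `portable-atomic` feature; \
     run the loom tests with `portable-atomic` disabled"
);
```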

The reviewer who made the suggestion replied:

Whichever you prefer; I just added the change here since you added the conditional compilation above for core::sync::atomic. Without the conditional compilation selector or the compile_error! it's possible to combine both (though it results in a different error, about a name conflict).

0xllx0 commented Oct 25, 2025

I think loom tests are a bad choice for testing concurrency issues. Not only do they require a large amount of refactoring of the code (note all of the doubled functions to deal with non-const initialization), they also put artificial constraints on "branches" in the code.

loom is a cool project, and I think it is an important tool. After working with it, I think it needs more time to develop.

For instance, I used a real-time scheduler (via thread-priority) to test my wCQ implementation. The same loom tests that panic because of the "too many branches" error pass fine under the real-time scheduler.
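
For illustration, a minimal sketch of that kind of stress test, assuming the thread-priority crate's `set_current_thread_priority` and the `mpmc::Queue` enqueue/dequeue API (this is not the actual harness used for the wCQ work; thread counts, iteration counts, and capacity are arbitrary):

```rust
// Priority-boosted stress test: producers and consumers hammer a small
// mpmc::Queue while running at maximum thread priority.
use std::sync::atomic::{AtomicUsize, Ordering};

use heapless::mpmc::Queue;
use thread_priority::{set_current_thread_priority, ThreadPriority};

static QUEUE: Queue<u32, 4> = Queue::new();
static POPPED: AtomicUsize = AtomicUsize::new(0);
const PER_THREAD: usize = 100_000;

fn main() {
    let producers: Vec<_> = (0..2)
        .map(|_| {
            std::thread::spawn(|| {
                let _ = set_current_thread_priority(ThreadPriority::Max);
                for i in 0..PER_THREAD as u32 {
                    // Spin until the bounded queue has room.
                    while QUEUE.enqueue(i).is_err() {
                        std::hint::spin_loop();
                    }
                }
            })
        })
        .collect();

    let consumers: Vec<_> = (0..2)
        .map(|_| {
            std::thread::spawn(|| {
                let _ = set_current_thread_priority(ThreadPriority::Max);
                while POPPED.load(Ordering::Relaxed) < 2 * PER_THREAD {
                    if QUEUE.dequeue().is_some() {
                        POPPED.fetch_add(1, Ordering::Relaxed);
                    } else {
                        std::hint::spin_loop();
                    }
                }
            })
        })
        .collect();

    for handle in producers.into_iter().chain(consumers) {
        handle.join().unwrap();
    }
    assert_eq!(POPPED.load(Ordering::Relaxed), 2 * PER_THREAD);
}
```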

sgued (Contributor, Author) commented Oct 25, 2025

Thanks for your work on this issue.

If you can't get loom to work because there are too many branches even in a correct implementation that doesn't spin, it might make sense to drop it.

Regarding wCQ, it appears to be a lot more code, probably a measurable increase in the compiled binary, and a radical change to the API (using a limited set of handles and a second const generic instead of being able to dequeue/enqueue simply with a reference).
It might be worth bringing it up as a subject in a wg meeting to see where we want to go (I have no personal use for MPMC, so I don't know whether these API changes would be a deal-breaker for some users).
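
To make the contrast concrete, here is the current reference-based usage next to a hypothetical handle-based shape (the handle-based names and signatures below are invented for illustration and not taken from the wCQ proposal):

```rust
use heapless::mpmc::Queue;

// Current API: any number of threads enqueue/dequeue through a shared reference.
static Q: Queue<u8, 4> = Queue::new();

fn current_usage() {
    Q.enqueue(1).ok();
    let _ = Q.dequeue();
}

// Hypothetical handle-based shape, with a second const generic bounding the
// number of handles (names invented here):
//
//     static Q: WcqQueue<u8, 4, 2> = WcqQueue::new();
//     let handle = Q.register().expect("handle slots exhausted");
//     handle.enqueue(1).ok();
//     let _ = handle.dequeue();
```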

0xllx0 commented Oct 25, 2025

> Thanks for your work on this issue.

No worries, thanks for your continued review/feedback :)

> If you can't get loom to work because there are too many branches even in a correct implementation that doesn't spin, it might make sense to drop it.

Yeah, I think we're running into a common corner case, as there are a few issues on their project related to the spinlock error.

I tried the suggestion from one of the issue replies, which recommends using loom::thread::yield_now in each of the loops. Nothing worked :(

Even boosting LOOM_MAX_BRANCHES to u16::MAX (the max value) still threw the same error.

> Regarding wCQ, it appears to be a lot more code, probably a measurable increase in the compiled binary, and a radical change to the API (using a limited set of handles and a second const generic instead of being able to dequeue/enqueue simply with a reference).

Completely agree, those were also my biggest concerns. The algorithm is very different, and the code size is at least 300% of the current implementation. Some of that can probably be trimmed down, but even the original C implementation is around 1k LoC.

I also implemented the nblfq algorithm, which is much closer to the current implementation in both algorithm and code size.

Once I've added the same tests and confirmed that it passes under the real-time scheduler, nblfq would probably be a better solution (and also much easier to maintain).

It would also mean less review, because it is basically just a bounded retry loop around the main enqueue/dequeue logic. The code diff will be much smaller.

> It might be worth bringing it up as a subject in a wg meeting to see where we want to go (I have no personal use for MPMC, so I don't know whether these API changes would be a deal-breaker for some users).

I would be happy to talk about it in the next WG meeting.

Personally, I think the wCQ algorithm would be a better fit for an mpmc::channel implementation, where the clone logic for each tx/rx channel more closely maps to the RingState concept.

However, if the nblfq algorithm passes all the necessary tests, I think we could adopt that to an mpmc::channel implementation, too.

The main advantage, as I understand it, is that the wCQ algorithm provides wait-free guarantees that are stronger than the lock-free guarantees of both nblfq and the current implementation. That advantage may not be worth the listed disadvantages for our use case.

sgued force-pushed the loom-tests branch 2 times, most recently from a9ecc6f to 4e79620 on October 28, 2025 at 17:59
This patch documents rust-embedded#583 and adds failing loom tests to exercise
rust-embedded#583, hoping that a future algorithm change can make them pass.

The tests are run in CI but their results are ignored for now as they fail.

Ideally the `loom` tests will cover more usage of `mpmc::Queue` in the future
and also cover `spsc`.
zeenix (Contributor) left a comment

Given the loom tests don't currently pass, I'd split their addition into a separate PR and not merge them until they're fixed (I don't see a reason to merge tests that don't yet pass).
