Skip to content

Conversation

@wks
Copy link
Collaborator

@wks wks commented Nov 3, 2025

We fix a bug when enumerating objects in LOS. During Prepare of ConcurrentImmix, we flip the from and to space of the TreadMill, and then enumerate objects in the LOS to set their unmark bits. However, the object enumeration code only enumerates objects in the to-space. Because we have flipped all objects to the from-space, object enumeration visits no objects at all. Consequently, no objects in LOS have unlog bits during concurrent marking, and the SATB barriers will not be applied to any objects in the LOS.

Now TreadMill::enumerate_objects take an additional argument all: bool. When it is set to true, we enumerate all objects in the Treadmill (which are all objects in the LOS). We do this when setting and clearing unlog bits, and fix the bug.

We also refactored the TreadMill so that all of its HashSet fields are protected by one single Mutex, because it is not safe to enumerate one HashSet while another thread modifies another.

Fixes: mmtk/mmtk-openjdk#334

wks added 2 commits November 3, 2025 17:21
We fix a bug when enumerating objects in LOS.  During Prepare of
ConcurrentImmix, we flip the from and to space of the TreadMill, and
then enumerate objects in the LOS to set their unmark bits.  However,
the object enumeration code only enumerates objects in the to-space.
Because we have flipped all objects to the from-space, object
enumeration visited no objects at all.  Consequently, no objects in LOS
have unlog bits during concurrent marking, and the SATB barriers will
not be applied to any objects in the LOS.

Now `TreadMill::enumerate_objects` take an additional argument `gc`.
When it is set to true, we enumerate objects in both from-spaces and
to-spaces.  We do this when setting and clearing unlog bits, and fix the
bug.

We also refactored the `TreadMill` so that all of its `HashSet` fields
are protected by one single `Mutex`, because it is not safe to enumerate
one `HashSet` while another thread modifies another.

We also renamed the hash sets to `nursery.from_space`,
`nursery.to_space`, `mature.from_space` and `mature.to_space`,
respectively, for consistency.

/// A pair of from and two spaces.
#[derive(Default)]
struct FromToSpace {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rename this to TreadMillSpacePair or something? The FromToSpace name is a bit confusing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rename this to TreadMillSpacePair or something? The FromToSpace name is a bit confusing.

I changed it to just SpacePair because it is private in the treadmill module.

/// The synchronized part of [`TreadMill`]
#[derive(Default)]
struct TreadMillSync {
nursery: SpacePair,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We normally think nursery is the from space and mature is the to space. This code now uses two 'logical spaces' (from space and to space) for both nursery and mature -- that is 4 logical spaces. I don't know how that makes sense.

Besides, collect_nursery is supposed to hold objects that are allocated during a GC (e.g. an over-sized object is copied to LOS from GenImmix nursery). However, we don't do that any more. Large objects are allocated directly into LOS, and there is no copy from other spaces to LOS. I think you can assert if that is true. If it is true, then the code becomes nursery + from + to which is similar to GenCopy -- that makes much more sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out that the original 'treadmill' collector does have 4 spaces: https://cofault.com/treadmill.html. We can move to something closer to that. It is still fine to use 4 sets, instead of 1 doubly linked list.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW I don't think we have to do any major refactoring to TreadMill to support object enumeration. But if you do plan for any major refactoring (such as [nursery, mature] x [from, to]), I don't think it is a good idea to move further away from the original 'treadmill' concept.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are two dimensions.

  1. Whether the object is newly allocated after the previous GC (young) or before the previous GC (old).
  2. Whether the object is determined to be live (to) or has unknown liveness (from).

But what you said also makes sense because during GC, we are always going to evacuate the nursery, so it is always a from-space during GC. If we don't allocate any large objects during GC and label it as young (which makes no sense because even over-sized objects moved from ImmixSpace are old by definition after this GC) , the "nursery to-space" will always be empty during GC, too. So it makes sense to only keep three space, i.e. nursery + from + to.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW I don't think we have to do any major refactoring to TreadMill to support object enumeration. But if you do plan for any major refactoring (such as [nursery, mature] x [from, to]), I don't think it is a good idea to move further away from the original 'treadmill' concept.

I don't plan to do a major refactoring for now. I just needed to make the mutex more coarse-grained which requires some structural change. I ended up using just nursery + from + to which should be the simplest we can have to support our current LOS usage.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait. There is one catch. If we enumerate objects during mutator time, we should enumerate both nursery and to_space because the nursery can contain newly allocated objects which should be returned, too. But the "nursery from-space" should be empty. Because it is only an assertion and it has undefined behavior if the mutator attempts to enumerate objects during GC, I'll reduce the precision of the assertion for now.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept the nursery + from + to structure, but reduced the amount of assertion.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out that the original 'treadmill' collector does have 4 spaces: https://cofault.com/treadmill.html. We can move to something closer to that. It is still fine to use 4 sets, instead of 1 doubly linked list.

After reading that paper, I found that the Treadmill in JikesRVM MMTk (and therefore the Rust MMTk) is completely unrelated to the Treadmill algorithm described in that paper.

The four colors in Baker's Treadmill algorithm contains four pointers separating four cell colors, namely ecru, grey, black and white. White cells are unallocated; ecru objects are allocated but not enqueued; grey objects are enqueued but not marked; and black objects are marked. This organization makes it easy to free garbage objects in bulk by moving a few pointers so that ecru (un-enqueued objects) objects become white (free) and black (marked) objects become ecru (un-enqueued). The Treadmill algorithm is not generational.

But the Treadmill data structure in JikesRVM MMTk keeps four lists of objects: allocNursery, collectNursery, toSpace and fromSpace. They match the [nursery, mature] X [from, to] semantics I mentioned above, and they are used as such.

I dug a bit into the history of JikesRVM.

  • The Treadmill was initially just DoublyLinkedList, where each node contains three pointers: the forward link, the backward link, and a pointer to the "treadmill" which is the DoublyLinkedList instance itself.
  • The DoublyLinkedList type is immediately renamed to Treadmill in the next commit, and it then contained two separate doubly linked lists, one for the from-space, and the other for the to-space. The third word of a node now points to the Treadmill instance instead of the DoublyLinkedList instance it is in.
  • Several years later, a third linked list "nursery" was added.
  • Half a year later, the third word (the pointer to the "treadmill") is removed.
  • And several months later, the nursery is split into the "collection nursery" and the "allocation nursery" for the purpose of "in preparation for concurrent/incremental collection".

From this history, we see that the Treadmill in JikesRVM MMTk was never related to Baker's Treadmill algorithm. It is named as "treadmill" because it maintains cyclic linked lists which resemble the tread of a treadmill in the real world.

And the separation of "allocation nursery" and "collection nursery" had a purpose, i.e. "concurrent/incremental collection". We haven't implemented concurrent or incremental plans before, so we see a single nursery is sufficient. But since we have just started to implement ConcurrentImmix, we may eventually find cases where we may need two nurseries.

Given this, I'll revert my refactoring on the naming and structure of the Treadmill type (except for making the mutex coarser) and keep the names we inherited from the JikesRVM MMTk. We may do a refactoring on Treadmill in the future, maybe renaming it so that we don't confuse it with Baker's Treadmill algorithm.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I remember this discussion on how the treadmill we've implemented is very different from the actual one: #517

wks added 6 commits November 4, 2025 14:53
We now have only nursery + from-space + to-space.
Allow controlling whether `TreadMill::enumerate_objects` visits the
nursery and the from-space separately.  Mutator visits nursery +
to_space; Compressor visits only to_space; ConcurrentImmix visits all
spaces.  It off-loads the assertion to its caller.
We revert back to "alloc_nursery" and "collect_nursery", which were the
names used back in the JikesRVM MMTk.
@wks
Copy link
Collaborator Author

wks commented Nov 4, 2025

I reverted the naming changes, keeping the names "alloc_nursery" and "collect_nursery" as before.

@wks wks changed the title Fix LOS object enumeration and refactor TreadMill Fix LOS object enumeration Nov 4, 2025
@wks wks added the PR-extended-testing Run extended tests for the pull request label Nov 4, 2025
}

/// Enumerate objects for forwarding references. Used by certain mark-compact plans.
pub(crate) fn enumerate_objects_for_forwarding(&self, enumerator: &mut dyn ObjectEnumerator) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest avoid using the term 'forwarding' here. LOS does not know what forwarding means -- it depends on the plan, like when the function will be called, and what object it should list, etc.

It should be the plan that instructs LOS to iterate objects in to-space and to assert there is no nursery object (it is a hack that currently the compressor space does that instead of the plan).

wks added 2 commits November 5, 2025 10:30
We no longer mention "forwarding" in the function name, but we add a
comment saying that it is a workaround for Compressor.
@wks
Copy link
Collaborator Author

wks commented Nov 5, 2025

The JikesRVM test is stuck for 5 hours during lusearch. I'll restart it.

Copy link
Member

@qinsoon qinsoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wks wks added this pull request to the merge queue Nov 6, 2025
Merged via the queue into mmtk:master with commit e25ad8b Nov 6, 2025
36 of 39 checks passed
@wks wks deleted the fix/los-unlog branch November 6, 2025 03:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

PR-extended-testing Run extended tests for the pull request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Segmentation fault with ConcurrentImmix and h2o

3 participants