-
Notifications
You must be signed in to change notification settings - Fork 78
Fix LOS object enumeration #1412
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
We fix a bug when enumerating objects in LOS. During Prepare of ConcurrentImmix, we flip the from and to space of the TreadMill, and then enumerate objects in the LOS to set their unmark bits. However, the object enumeration code only enumerates objects in the to-space. Because we have flipped all objects to the from-space, object enumeration visited no objects at all. Consequently, no objects in LOS have unlog bits during concurrent marking, and the SATB barriers will not be applied to any objects in the LOS. Now `TreadMill::enumerate_objects` take an additional argument `gc`. When it is set to true, we enumerate objects in both from-spaces and to-spaces. We do this when setting and clearing unlog bits, and fix the bug. We also refactored the `TreadMill` so that all of its `HashSet` fields are protected by one single `Mutex`, because it is not safe to enumerate one `HashSet` while another thread modifies another. We also renamed the hash sets to `nursery.from_space`, `nursery.to_space`, `mature.from_space` and `mature.to_space`, respectively, for consistency.
src/util/treadmill.rs
Outdated
|
|
||
| /// A pair of from and two spaces. | ||
| #[derive(Default)] | ||
| struct FromToSpace { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rename this to TreadMillSpacePair or something? The FromToSpace name is a bit confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rename this to
TreadMillSpacePairor something? TheFromToSpacename is a bit confusing.
I changed it to just SpacePair because it is private in the treadmill module.
The GC Handbook uses "fromspace" and "tospace" (no hyphen).
src/util/treadmill.rs
Outdated
| /// The synchronized part of [`TreadMill`] | ||
| #[derive(Default)] | ||
| struct TreadMillSync { | ||
| nursery: SpacePair, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We normally think nursery is the from space and mature is the to space. This code now uses two 'logical spaces' (from space and to space) for both nursery and mature -- that is 4 logical spaces. I don't know how that makes sense.
Besides, collect_nursery is supposed to hold objects that are allocated during a GC (e.g. an over-sized object is copied to LOS from GenImmix nursery). However, we don't do that any more. Large objects are allocated directly into LOS, and there is no copy from other spaces to LOS. I think you can assert if that is true. If it is true, then the code becomes nursery + from + to which is similar to GenCopy -- that makes much more sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out that the original 'treadmill' collector does have 4 spaces: https://cofault.com/treadmill.html. We can move to something closer to that. It is still fine to use 4 sets, instead of 1 doubly linked list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW I don't think we have to do any major refactoring to TreadMill to support object enumeration. But if you do plan for any major refactoring (such as [nursery, mature] x [from, to]), I don't think it is a good idea to move further away from the original 'treadmill' concept.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are two dimensions.
- Whether the object is newly allocated after the previous GC (young) or before the previous GC (old).
- Whether the object is determined to be live (to) or has unknown liveness (from).
But what you said also makes sense because during GC, we are always going to evacuate the nursery, so it is always a from-space during GC. If we don't allocate any large objects during GC and label it as young (which makes no sense because even over-sized objects moved from ImmixSpace are old by definition after this GC) , the "nursery to-space" will always be empty during GC, too. So it makes sense to only keep three space, i.e. nursery + from + to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW I don't think we have to do any major refactoring to
TreadMillto support object enumeration. But if you do plan for any major refactoring (such as[nursery, mature] x [from, to]), I don't think it is a good idea to move further away from the original 'treadmill' concept.
I don't plan to do a major refactoring for now. I just needed to make the mutex more coarse-grained which requires some structural change. I ended up using just nursery + from + to which should be the simplest we can have to support our current LOS usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wait. There is one catch. If we enumerate objects during mutator time, we should enumerate both nursery and to_space because the nursery can contain newly allocated objects which should be returned, too. But the "nursery from-space" should be empty. Because it is only an assertion and it has undefined behavior if the mutator attempts to enumerate objects during GC, I'll reduce the precision of the assertion for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kept the nursery + from + to structure, but reduced the amount of assertion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out that the original 'treadmill' collector does have 4 spaces: https://cofault.com/treadmill.html. We can move to something closer to that. It is still fine to use 4 sets, instead of 1 doubly linked list.
After reading that paper, I found that the Treadmill in JikesRVM MMTk (and therefore the Rust MMTk) is completely unrelated to the Treadmill algorithm described in that paper.
The four colors in Baker's Treadmill algorithm contains four pointers separating four cell colors, namely ecru, grey, black and white. White cells are unallocated; ecru objects are allocated but not enqueued; grey objects are enqueued but not marked; and black objects are marked. This organization makes it easy to free garbage objects in bulk by moving a few pointers so that ecru (un-enqueued objects) objects become white (free) and black (marked) objects become ecru (un-enqueued). The Treadmill algorithm is not generational.
But the Treadmill data structure in JikesRVM MMTk keeps four lists of objects: allocNursery, collectNursery, toSpace and fromSpace. They match the [nursery, mature] X [from, to] semantics I mentioned above, and they are used as such.
I dug a bit into the history of JikesRVM.
- The
Treadmillwas initially justDoublyLinkedList, where each node contains three pointers: the forward link, the backward link, and a pointer to the "treadmill" which is theDoublyLinkedListinstance itself. - The
DoublyLinkedListtype is immediately renamed toTreadmillin the next commit, and it then contained two separate doubly linked lists, one for the from-space, and the other for the to-space. The third word of a node now points to theTreadmillinstance instead of theDoublyLinkedListinstance it is in. - Several years later, a third linked list "nursery" was added.
- Half a year later, the third word (the pointer to the "treadmill") is removed.
- And several months later, the nursery is split into the "collection nursery" and the "allocation nursery" for the purpose of "in preparation for concurrent/incremental collection".
From this history, we see that the Treadmill in JikesRVM MMTk was never related to Baker's Treadmill algorithm. It is named as "treadmill" because it maintains cyclic linked lists which resemble the tread of a treadmill in the real world.
And the separation of "allocation nursery" and "collection nursery" had a purpose, i.e. "concurrent/incremental collection". We haven't implemented concurrent or incremental plans before, so we see a single nursery is sufficient. But since we have just started to implement ConcurrentImmix, we may eventually find cases where we may need two nurseries.
Given this, I'll revert my refactoring on the naming and structure of the Treadmill type (except for making the mutex coarser) and keep the names we inherited from the JikesRVM MMTk. We may do a refactoring on Treadmill in the future, maybe renaming it so that we don't confuse it with Baker's Treadmill algorithm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I remember this discussion on how the treadmill we've implemented is very different from the actual one: #517
We now have only nursery + from-space + to-space.
Allow controlling whether `TreadMill::enumerate_objects` visits the nursery and the from-space separately. Mutator visits nursery + to_space; Compressor visits only to_space; ConcurrentImmix visits all spaces. It off-loads the assertion to its caller.
We revert back to "alloc_nursery" and "collect_nursery", which were the names used back in the JikesRVM MMTk.
|
I reverted the naming changes, keeping the names "alloc_nursery" and "collect_nursery" as before. |
src/policy/largeobjectspace.rs
Outdated
| } | ||
|
|
||
| /// Enumerate objects for forwarding references. Used by certain mark-compact plans. | ||
| pub(crate) fn enumerate_objects_for_forwarding(&self, enumerator: &mut dyn ObjectEnumerator) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest avoid using the term 'forwarding' here. LOS does not know what forwarding means -- it depends on the plan, like when the function will be called, and what object it should list, etc.
It should be the plan that instructs LOS to iterate objects in to-space and to assert there is no nursery object (it is a hack that currently the compressor space does that instead of the plan).
We no longer mention "forwarding" in the function name, but we add a comment saying that it is a workaround for Compressor.
|
The JikesRVM test is stuck for 5 hours during |
qinsoon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
We fix a bug when enumerating objects in LOS. During Prepare of ConcurrentImmix, we flip the from and to space of the TreadMill, and then enumerate objects in the LOS to set their unmark bits. However, the object enumeration code only enumerates objects in the to-space. Because we have flipped all objects to the from-space, object enumeration visits no objects at all. Consequently, no objects in LOS have unlog bits during concurrent marking, and the SATB barriers will not be applied to any objects in the LOS.
Now
TreadMill::enumerate_objectstake an additional argumentall: bool. When it is set to true, we enumerate all objects in theTreadmill(which are all objects in the LOS). We do this when setting and clearing unlog bits, and fix the bug.We also refactored the
TreadMillso that all of itsHashSetfields are protected by one singleMutex, because it is not safe to enumerate oneHashSetwhile another thread modifies another.Fixes: mmtk/mmtk-openjdk#334