Feature: Fiber-Local-Storage #15889

Conversation
Very interesting!
Hi there, I'm the maintainer of AsyncLocalStorage, the Node.js equivalent of this. 👋🏻
I've put a bunch of thought into context flow in the semantics of JS, but things are a bit different with Fibers. In JS we have a lot of branching execution via promises and callbacks, with merging execution via Promise.all(...). Crystal similarly has clear branching points with spawn calls, but merges are not so much a semantically relevant thing other than maybe fiber.resume which could technically be used in a user-facing way to force a specific fiber context switch. Other than that, there's not so much in the way of clear merge points, though that could change with the possibility of TaskGroup semantics.
I see two additions which could be useful to consider.
The first is a way for the initializer to specify that the new Fiber should inherit the context from the point where its initial spawn call occurred. This would be very useful for passing spans for application tracing through the system. Observability vendors use this capability extensively with AsyncLocalStorage in Node.js to get a parent span to link to when constructing a new span.
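To make the idea concrete, here is a minimal, runnable sketch of spawn-time context inheritance. `TraceContext` and `traced_spawn` are names invented for this illustration (not part of this PR's API), and the Hash keyed by Fiber merely simulates fiber-local storage:

```crystal
# Illustrative only: a Hash keyed by Fiber stands in for real FLS.
class TraceContext
  @@spans = {} of Fiber => String

  def self.current : String?
    @@spans[Fiber.current]?
  end

  def self.current=(span : String)
    @@spans[Fiber.current] = span
  end
end

# Capture the context at the spawn site and seed the child fiber with it,
# giving tracing tools a parent span to link new spans to.
def traced_spawn(&block)
  parent_span = TraceContext.current
  spawn do
    TraceContext.current = parent_span if parent_span
    block.call
  end
end

TraceContext.current = "span-1"
traced_spawn do
  puts TraceContext.current # the child fiber sees the parent's span
end
Fiber.yield # let the child fiber run
```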
The other interesting flow I see, though more nebulous presently, is a way to capture the merging flow. If we get something like TaskGroup/WaitGroup semantics where a group of fibers run to completion before continuing then it could be valuable to have a way to join the related contexts and communicate them forward in some way whenever one of these merge points occurs.
Technically ExecutionContext already basically provides TaskGroup semantics, though it's intended more as somewhat of a singleton, living until the end of the process. It might not be super valuable there to perform some sort of context join, but if more discrete TaskGroup constructs were available it may be useful to be able to model such joining as the semantics of that join would extend out to user-visible behaviour, which is essentially what context flow is intended to model.
In any case, this is very cool to see! I'm excited to see this capability land. 🙂
Thanks for pushing this forward. I'm not sure whether this solution is the best path forward, due to my concerns in #15616 (comment). As far as I understand, this mechanism only works for class variables. The unsafe TLS implementation looks interesting and could be worth considering on its own, independently of the FLS.
@Qard Your insights are very welcome. Especially the practical experiences.
@Qard Your insights are very welcome. Especially the practical experiences.

I'm wondering what would happen at a merge point. Does that mean the values set in different strands of execution would be merged into the parent context? How would collisions be handled, though?
Oh, I missed that. I just recalled discussions of it but never saw it had landed. 🙂 As for merge points, I don't think it should replace the context value where a merge point is reached, but having some separate way to acquire a list of context values at the point where a merge occurs would be enough. Merges are less important in my opinion though. For the purpose of enabling observability products, context flow through branching points is a feature already in use today to support application tracing. Merge points, however, are not presently handled, as no context manager yet has a solution for that flow. Presently tracing tools don't really visualize merge points correctly. They generally pick a winner from the list of contexts, or don't link back to the merge points at all but rather flatten everything up to the root span of the trace, which loses a lot of causality data. This is a problem which would be good to solve, but which no language has solved yet.
I think that's probably fine. You can always make a container class purely for context in other scenarios. That's essentially what Node.js does: you make AsyncLocalStorage instances which function as map keys to retrieve the corresponding context data from the singular internal AsyncContextFrame map.
No. It only works for fiber locals. Even after a fiber or thread context switch, a pointer to a fiber local from before the switch will still be pointing to the same data after the fiber is resumed, even if the fiber is resumed by another thread. That can't work for other thread locals: when a fiber is resumed by another thread, LLVM would reuse the TLS pointers from before the switch, which could belong to the wrong thread.
```crystal
# If crystal code is called from an external thread
# created from an external library, we may need to
# create a FLS dynamically using the GC.
include Crystal::PointerLinkedList::Node
```
I understand that you allocate the FLS for main fibers into the HEAP on demand, but I don't see why you need the linked list. You already assign the pointer to Fiber.current.fls, so the GC will see the live reference and thus won't collect the FLS, but will collect it after the thread has terminated (and we lost the reference to the Fiber) 🤔
I'm not familiar with the current design of Fiber/Thread, but I assumed that they may have allocation optimizations, like placing the Fiber class on the Fiber's stackbottom.
Could maybe be simplified.
```crystal
container = GC.malloc(sizeof(self))
# There's no pointer-based unsafe_construct for struct
container.as(self*).value = self.new
```
It would indeed be nice to have #pre_initialize for structs, too, not just references. But we can call ptr.as(self*).value.initialize, and we can redefine the #initialize method for each fiber local with a default value, then call previous_def to inject the previous initializers. For example:
```crystal
struct Foo
  def self.create
    foo = GC.malloc(sizeof(self)).as(self*)
    foo.value.initialize
    foo
  end
end

struct Foo
  # must be 'uninitialized' otherwise the compiler complains (not initialized in all initializers)
  @one = uninitialized Int32

  def initialize
    @one = 123
  end
end

struct Foo
  @two = uninitialized Int32

  def initialize
    previous_def
    @two = 456
  end
end

foo = Foo.create
p foo.value # => Foo(@one=123, @two=456)
```
If you rename the first def to pre_initialize and call it explicitly from create, you get the same behaviour, but without the need for previous_def.
Or we can just use a class, ReferenceStorage and .unsafe_construct, and it would also work:

```crystal
class FLS
  # ...
end

fls = uninitialized ReferenceStorage(FLS)
@fls = FLS.unsafe_construct(pointerof(fls))
```

Bonus: a mere FLS.new will allocate an instance in the GC HEAP.
Bonus: we can merely use FLS references instead of a pointer, and can use a property %var with direct accessors in the generated class getter/setter methods.
Yes, this sounds like a nice simplification. Since we have ReferenceStorage, there should be little reason for having a struct type where you always pass pointers to it around.
> But, we can call ptr.as(self*).value.initialize

The struct currently has an empty initialize method and AFAIK the default ivar values are assigned in the .allocate method.
> Or we can just use a class

class has a constant overhead of 4 (?) bytes because of its type id field. I wanted to optimize the FLS away in case it's not used, so I used struct. I wish we had "naked" classes or something like that.

Also, ReferenceStorage needs a modern crystal version, and we want to be able to build a new crystal compiler using an old version of the compiler.
> class has a constant overhead of 4 (?) bytes because of its type id field. I wanted to optimize the FLS away in case it's not used, so I used struct.

True, that's a noticeable difference between the two. But I'm not sure if those extra 4 bytes are that relevant.

Type ids are only used in some specific situations. If we can be sure that never happens for this type, it should be technically possible - though unsafe, of course - to reduce the size. We could allocate 4 bytes less and decrement the pointer address by 4 before casting to the class type 🤷
ReferenceStorage is a new feature, but uninitialized UInt8[instance_sizeof(T)] should work as a polyfill. Missing pre_initialize is a bigger problem and would require something like the initialize chaining mentioned in the first comment of this thread.
The difference between struct or class is probably not that important. It's fine either way, as long as it's somewhat reasonable. We can reconsider the details later, once we have merged an MVP.
Not exactly: uninitialized UInt8[instance_sizeof(T)] is missing alignment, so it must reserve alignof(T) more bytes and manually align the pointer, which is the reason we have ReferenceStorage.
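As a rough illustration of the manual alignment such a polyfill would need (the `align_up` helper and the hard-coded alignment of 8 are assumptions for this sketch, not ReferenceStorage's actual implementation):

```crystal
# Round an address up to the next multiple of `align` (align must be a power of two).
def align_up(address : UInt64, align : UInt64) : UInt64
  (address + align - 1) & ~(align - 1)
end

# Reserve `align - 1` extra bytes so an aligned region of the wanted size
# (here: 64 bytes, aligned to 8) is guaranteed to fit inside the buffer.
buffer = uninitialized UInt8[71] # 64 payload bytes + (8 - 1) slack
aligned = align_up(buffer.to_unsafe.address, 8_u64)
# `aligned` now lies within the buffer and is a multiple of 8
```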
IMO it would be just fine to always allocate in the HEAP. In addition to all fibers behaving the same, it's one single malloc per fiber, and reserving/initializing the space can be deferred until a FLS value is actually needed, if the fiber ever accesses one at all.
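A minimal sketch of that deferral, assuming a single Int32 slot for simplicity (`LazyStorage` is a name invented for this example; the real FLS block would hold all fiber locals):

```crystal
# Illustrative only: the backing block is malloc'ed on first access, so a
# fiber that never touches FLS pays nothing beyond a null pointer field.
class LazyStorage
  @block = Pointer(Int32).null

  def value : Int32
    block.value
  end

  def value=(v : Int32)
    block.value = v
  end

  private def block : Pointer(Int32)
    if @block.null?
      @block = GC.malloc(sizeof(Int32)).as(Int32*) # single deferred allocation
      @block.value = 0
    end
    @block
  end
end

storage = LazyStorage.new
storage.value = 42
p storage.value # => 42
```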
After familiarizing myself with the concept a bit more, I understand. Yeah, makes sense that we cannot use unsafe for […]. But I suppose we could use it to optimize […].
In general we'd benefit from directly having thread locals instead of going through instance variables on […]. For […].
We could use it for […]. Since […].
I've done some exploration of existing usage patterns for fiber local storage. Discussion in https://forum.crystal-lang.org/t/field-study-of-fiber-local-storage/8325/1
Related to #15088
This MR adds a Fiber-Local Storage functionality, based on #15616 (comment).
I'm sure this MR still needs some work, especially regarding non-TLS platforms like OpenBSD and Android.
Also, there's probably some discussion needed about the public API.
Fibers created in crystal will use a stack-allocated FLS, but external threads use a fallback which uses the GC.
To improve the FLS performance, I added an unsafe option to the ThreadLocal annotation.
The reason thread-local variables are normally slow is that fibers can spontaneously switch the thread they are on in lots of situations, so LLVM can never be allowed to make any assumptions about any TLS variables.
Since the FLS-section is restored when any thread resumes a fiber, this should not be a problem here.
The unsafe option removes this safety while retaining backwards compatibility.
Example Code
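The original example code did not survive extraction. As a stand-in, here is a runnable sketch of the behaviour fiber-local storage provides, simulated with a Hash keyed by Fiber; `FLSDemo` and its methods are invented for this illustration and are not this PR's actual API:

```crystal
# Illustrative stand-in for the missing example: each fiber sees its own
# copy of the "fiber-local" value, while other fibers are unaffected.
class FLSDemo
  @@values = {} of Fiber => Int32

  def self.get : Int32
    @@values[Fiber.current]? || 0
  end

  def self.set(v : Int32)
    @@values[Fiber.current] = v
  end
end

done = Channel(Int32).new

2.times do |i|
  spawn do
    FLSDemo.set(i * 100) # each fiber writes its own slot
    Fiber.yield          # switch away and back
    done.send FLSDemo.get # still sees its own value
  end
end

results = [done.receive, done.receive].sort
p results # => [0, 100]
```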