Conversation

@BlobCodes (Contributor)
Related to #15088

This PR adds fiber-local storage (FLS) functionality, based on #15616 (comment).


I'm sure this PR still needs some work, especially regarding non-TLS platforms like OpenBSD and Android.

Also, there's probably some discussion needed about the public API.


Fibers created in Crystal use a stack-allocated FLS, while external threads use a fallback that allocates the FLS via the GC.


To improve FLS performance, I added an unsafe option to the ThreadLocal annotation.
Thread-local variables are normally slow because fibers can spontaneously switch threads in many situations, so LLVM can never be allowed to make any assumptions about TLS variables across those points.

Since the FLS section is restored whenever a thread resumes a fiber, this is not a problem here.

The unsafe option removes this safety measure while retaining backwards compatibility.
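
As a rough illustration of the difference (a sketch; the option spelling is an assumption, and note that @[ThreadLocal] is currently restricted to the standard library, so this only compiles there):

class Example
  # A plain thread-local: LLVM must re-derive the TLS address on every
  # access, because the fiber may have migrated to another thread in between.
  @[ThreadLocal]
  @@safe_counter = 0

  # With the proposed unsafe option, LLVM may cache the TLS address.
  # This is only sound for data that is restored whenever a thread
  # resumes a fiber, like the FLS section.
  @[ThreadLocal(unsafe: true)]
  @@fls_section = Pointer(Void).null
end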


Example Code
class A
  fiber_local my_fls_var : Int32 = 1

  def self.incr
    puts self.my_fls_var
    self.my_fls_var += 1
  end
end

spawn do
  sleep 3
  A.incr # => 1
  A.incr # => 2
  A.incr # => 3
end

spawn do
  sleep 2
  A.incr # => 1
  A.incr # => 2
end

A.incr # => 1
A.incr # => 2
sleep 5

@Qard

Very interesting!

Hi there, I'm the maintainer of AsyncLocalStorage, the Node.js equivalent of this. 👋🏻

I've put a lot of thought into context flow in the semantics of JS, but things are a bit different with Fibers. In JS we have a lot of branching execution via promises and callbacks, with merging execution via Promise.all(...). Crystal similarly has clear branching points with spawn calls, but merges are not really a semantically relevant thing, other than maybe fiber.resume, which could technically be used in a user-facing way to force a specific fiber context switch. Beyond that, there are not many clear merge points, though that could change with the possibility of TaskGroup semantics.

I see two additions which could be useful to consider.

The first is a way for the initializer to be able specify to inherit what the context was where the initial spawn call for that Fiber occurred. This would be very useful for passing spans for application tracing through the system. Observability vendors use this capability extensively with AsyncLocalStorage in Node.js to get a parent span to link to when constructing a new span.
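
As a minimal sketch of that flow with the API proposed here (spawn_with_context is a hypothetical helper, not part of this PR):

class Trace
  fiber_local span_id : Int32 = 0
end

# Capture the spawner's fiber-local and seed it into the child fiber,
# so the child inherits the context instead of starting from the default.
def spawn_with_context(&block)
  parent_span = Trace.span_id # read in the spawning fiber
  spawn do
    Trace.span_id = parent_span
    block.call
  end
end

Trace.span_id = 42
spawn_with_context { puts Trace.span_id } # => 42 (inherited)
Fiber.yield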

The other interesting flow I see, though more nebulous presently, is a way to capture the merging flow. If we get something like TaskGroup/WaitGroup semantics where a group of fibers run to completion before continuing then it could be valuable to have a way to join the related contexts and communicate them forward in some way whenever one of these merge points occurs.

Technically ExecutionContext already basically provides TaskGroup semantics, though it's intended more as a singleton, living until the end of the process. It might not be super valuable there to perform some sort of context join, but if more discrete TaskGroup constructs were available, it may be useful to model such joining, since the semantics of that join would extend out to user-visible behaviour, which is essentially what context flow is intended to model.

In any case, this is very cool to see! I'm excited to see this capability land. 🙂

@straight-shoota (Member)

Thanks for pushing this forward.

I'm not sure if this solution is the best path forward, due to my concerns in #15616 (comment). As far as I understand, this mechanism only works for class variables.
Perhaps that's fine if there are no good use cases for fiber-local storage on other kinds of variables, though.

The unsafe TLS implementation looks interesting and could be worth a consideration on its own, independently of the FLS.
Do you think we could use unsafe mode for Thread.@@current_thread as well?

@straight-shoota (Member)

@Qard Your insights are very welcome. Especially the practical experiences.

WaitGroup is already a thing. More advanced methods of structuring concurrency are discussed in #6468.

I'm wondering what would happen at a merge point. Does that mean the values set in different strands of execution would be merged into the parent context? How would collisions be handled, though?

@Qard commented Jun 10, 2025

Oh, I missed that. I just recalled discussions of it but never saw it had landed. 🙂

As for merge points, I don't think merging should replace the context value where a merge point is reached; having some separate way to acquire the list of context values at the point where a merge occurs would be enough.
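
For illustration, a sketch of surfacing the branch contexts at a merge point with today's primitives (using stdlib's WaitGroup and the fiber_local macro proposed in this PR):

require "wait_group"

class Ctx
  fiber_local label : String = "unset"
end

contexts = Channel(String).new(2)
wg = WaitGroup.new(2)

2.times do |i|
  spawn do
    Ctx.label = "fiber-#{i}"
    contexts.send(Ctx.label) # publish this branch's context value
    wg.done
  end
end

wg.wait # the merge point
merged = Array.new(2) { contexts.receive }
p merged # e.g. ["fiber-0", "fiber-1"]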

Merges are less important in my opinion, though. For the purpose of enabling observability products, enabling context flow through branching points is the feature being used right now to support application tracing. Merge points are not presently handled, as no context manager yet has a solution for that flow.

Presently, tracing tools don't really visualize merge points correctly. They generally pick a winner from the list of contexts, or don't link back to the merge points at all and instead flatten everything up to the root span of the trace, which loses a lot of causality data. This is a problem that would be good to solve, but one that no language has solved yet.

@Qard commented Jun 11, 2025

> I'm not sure if this solution is the best path forward, due to my concerns in #15616 (comment). As far as I understand, this mechanism only works for class variables.
>
> Perhaps that's fine if there are no good cases for using fiber local storage on other kinds of variables, though.

I think that's probably fine. You can always make a container class purely for context in other scenarios. That's essentially what Node.js does: you make AsyncLocalStorage instances which function as map keys to retrieve the corresponding context data from the singular internal AsyncContextFrame map.

@ysbaddaden (Contributor) commented Jun 12, 2025

> The unsafe TLS implementation looks interesting (...) Do you think we could be able to use unsafe mode for Thread.@@current_thread as well?

No. It only works for fiber locals. Even after a fiber or thread context switch, a pointer to a fiber local from before the switch still points to the same data after the fiber is resumed, even if it is resumed by another thread.

That can't work for other thread locals: when a fiber is resumed by another thread, LLVM would reuse the TLS pointers from before the switch, which could yield the wrong Thread instance, for example.
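
A sketch of the hazard (illustrative, not code from this PR):

def wrong_if_tls_were_cached
  before = Thread.current # TLS read while running on thread A
  Fiber.yield             # the fiber may be resumed by thread B here
  after = Thread.current  # must be a fresh TLS read: reusing a cached
                          # TLS pointer would return thread A's instance
  {before, after}
end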

# If crystal code is called from an external thread
# created from an external library, we may need to
# create a FLS dynamically using the GC.
include Crystal::PointerLinkedList::Node
Contributor

I understand that you allocate the FLS for main fibers on the heap on demand, but I don't see why you need the linked list. You already assign the pointer to Fiber.current.fls, so the GC will see the live reference and thus won't collect the FLS, but it will collect it after the thread has terminated (once we've lost the reference to the Fiber) 🤔

Contributor Author (BlobCodes)

I'm not familiar with the current design of Fiber/Thread, but I assumed they may have allocation optimizations, like placing the Fiber class at the fiber's stack bottom.

This could maybe be simplified.

container = GC.malloc(sizeof(self))

# There's no pointer-based unsafe_construct for struct
container.as(self*).value = self.new
Contributor

It would indeed be nice to have #pre_initialize for structs too, not just references. But we can call ptr.as(self).value.initialize, and we can redefine the #initialize method for each fiber local with a default value, then call previous_def to chain the previous initializers. For example:

struct Foo
  def self.create
    foo = GC.malloc(sizeof(self)).as(self*)
    foo.value.initialize
    foo
  end
end

struct Foo
  # must be 'uninitialized' otherwise the compiler complains (not initialized in all initializers)
  @one = uninitialized Int32

  def initialize
    @one = 123
  end
end

struct Foo
  @two = uninitialized Int32

  def initialize
    previous_def
    @two = 456
  end
end

foo = Foo.create
p foo.value # => Foo(@one=123, @two=456)

Member

If you rename the first def to pre_initialize and call it explicitly from create, you get the same behaviour, but without the need for previous_def.
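
Roughly like this (a sketch of that variant, showing a single definition):

struct Foo
  @one = uninitialized Int32

  def pre_initialize
    @one = 123
  end

  def self.create
    foo = GC.malloc(sizeof(self)).as(self*)
    foo.value.pre_initialize # explicit call; no previous_def chain
    foo
  end
end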

Contributor

Or we can just use a class with ReferenceStorage and .unsafe_construct, and it would also work:

class FLS
  # ...
end

fls = uninitialized ReferenceStorage(FLS)
@fls = FLS.unsafe_construct(pointerof(fls))

Bonus: a mere FLS.new will allocate an instance in the GC heap.

Contributor

Bonus: we can simply use FLS references instead of pointers, and use a property %var with direct accessors in the generated getter/setter methods.

Member

Yes, this sounds like a nice simplification. Since we have ReferenceStorage, there should be little reason to keep a struct type that you only ever pass around via pointers.

Contributor Author (BlobCodes)

> But, we can call ptr.as(self).value.initialize

The struct currently has an empty initialize method, and AFAIK the default ivar values are assigned in the .allocate method.

> Or we can just use a class

class has a constant overhead of 4 (?) bytes because of its type id field.
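
For reference, the difference shows up in sizeof/instance_sizeof (a quick illustrative check, not from the PR):

struct SPayload
  @value = 0
end

class CPayload
  @value = 0
end

p sizeof(SPayload)          # => 4 (just the Int32)
p instance_sizeof(CPayload) # => 8 (4-byte type id + the Int32)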

I wanted to optimize the FLS away in case it's not used, so I used struct.

I wish we had "naked" classes or something like that.

Also, ReferenceStorage needs a modern Crystal version, and we want to be able to build a new Crystal compiler using an old version of the compiler.

Member

> class has a constant overhead of 4 (?) bytes because of its type id field.
>
> I wanted to optimize the FLS away in case it's not used, so I used struct.

True, that's a noticeable difference between the two. But I'm not sure those extra 4 bytes are that relevant.

Type IDs are only used in some specific situations. If we can be sure that never happens for this type, it should be technically possible, though unsafe of course, to reduce the size. We could allocate 4 bytes less and decrement the pointer address by 4 before casting to the class type 🤷

ReferenceStorage is a new feature, but uninitialized UInt8[instance_sizeof(T)] should work as a polyfill. Missing pre_initialize is a bigger problem and would require something like the initialize chaining mentioned in the first comment of this thread.

The difference between struct or class is probably not that important. It's fine either way, as long as it's somewhat reasonable. We can reconsider the details later, once we have merged an MVP.

@ysbaddaden (Contributor) commented Jun 13, 2025

Not exactly: uninitialized UInt8[instance_sizeof(T)] is missing alignment, so it must reserve alignof(T) more bytes and manually align the pointer; hence why we have ReferenceStorage.

IMO it would be just fine to always allocate in the heap. In addition to all fibers behaving the same, it's a single malloc per fiber, and reserving/initializing the space can be delayed until it's needed, i.e. when we first access an FLS value, if the fiber ever does.
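
A sketch of that approach (FLS here is a stand-in class, not the PR's actual type):

class FLS
  property counter = 0
end

class Fiber
  @fls : FLS?

  # one malloc per fiber, deferred until the first FLS access
  def fls : FLS
    @fls ||= FLS.new
  end
end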

@straight-shoota (Member)

After familiarizing myself with the concept a bit more, I understand. Yeah, it makes sense that we cannot use unsafe for Thread.current. 🤦

But I suppose we could use it to optimize Fiber.current. It is currently implemented as Thread.current.current_fiber, so it has the safe TLS overhead. But it should be able to use unsafe TLS just like the FLS struct in this patch.
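
Something along these lines, presumably (a sketch; the variable name and option spelling are assumptions):

class Fiber
  # set right before every swapcontext, so even a cached TLS access
  # observes a value that was restored for the resuming fiber
  @[ThreadLocal(unsafe: true)]
  @@current_fiber : Fiber?

  def self.current : Fiber
    @@current_fiber.not_nil!
  end
end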

@ysbaddaden (Contributor) commented Jun 12, 2025

In general, we'd benefit from having direct thread locals instead of going through instance variables on Thread.current.

For Fiber.current, I guess you're right. It's actually a fiber local, and as long as we set it before swapcontext (like we set the fls pointer in this PR), it should be fine 🤔

@BlobCodes (Contributor Author)

> After familiarizing with the concept a bit more, I understand. Yeah, makes sense that we cannot use unsafe for Thread.current. 🤦

We could use it for Thread.current by using the same trick codegen currently uses: @[NoInline].

Since Thread.current accesses TLS multiple times (it lazily initializes the Thread instance and may need to write to TLS), we could replace a method making multiple noinline calls with a single noinline call that reuses the TLS pointer.
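
Illustratively (not the stdlib's actual code; compute_initial is a hypothetical placeholder):

class Example
  @[ThreadLocal]
  @@value : Int32?

  # The whole lazy-init sequence (read, maybe write) sits behind one
  # non-inlined call, so the TLS address is derived once inside it and
  # cannot be cached across a fiber switch by the caller.
  @[NoInline]
  def self.value : Int32
    @@value ||= compute_initial
  end

  private def self.compute_initial : Int32
    42 # hypothetical default
  end
end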

@straight-shoota (Member)

I've done some exploration of existing usage patterns for fiber local storage. Discussion in https://forum.crystal-lang.org/t/field-study-of-fiber-local-storage/8325/1
