Conversation

mohamedawnallah
Contributor

Change Description

Closes #276

@guggero
Contributor

guggero commented Mar 25, 2024

While this implementation is likely correct and probably the thing we should do, I worry about the performance hit this will take. Do we have a benchmark test to see how much of an impact this change has? Perhaps we need to also refactor things a bit to make sure we only call the appendRaw method for big chunks of data where it's worth syncing and not for every single entry for example.
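
For illustration, batching could look roughly like the sketch below, where appendRaw is handed one serialized chunk for the whole batch so a single sync covers all of it. The helper appendHeadersBatched and the exact appendRaw signature are assumptions for this sketch, not the existing neutrino code.

```go
package sketch

import (
	"bytes"

	"github.com/btcsuite/btcd/wire"
)

// appendHeadersBatched serializes a slice of block headers into one buffer
// and hands it to appendRaw as a single chunk, so any fsync covers the whole
// batch instead of each individual entry. The appendRaw parameter stands in
// for the store's internal method; its real signature may differ.
func appendHeadersBatched(appendRaw func([]byte) error,
	headers []wire.BlockHeader) error {

	var buf bytes.Buffer
	for i := range headers {
		if err := headers[i].Serialize(&buf); err != nil {
			return err
		}
	}

	// One write (and at most one sync) for the entire batch.
	return appendRaw(buf.Bytes())
}
```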

@mohamedawnallah
Contributor Author

Do we have a benchmark test to see how much of an impact this change has?

Yes, you're right! It may well introduce syncing overhead unless a benchmark proves otherwise 👍

Perhaps we need to also refactor things a bit to make sure we only call the appendRaw method for big chunks of data where it's worth syncing and not for every single entry for example.

I will look into this. Thanks for sharing!

@mohamedawnallah
Contributor Author

mohamedawnallah commented Jun 14, 2025

I worry about the performance hit this will take.

You're right, @guggero. I finally got a chance to go through this PR/issue, and the file sync operation is indeed significant: its cost grows with the batch size. I tried batch sizes from 1 to 2048 (2^0 to 2^11). Here are the benchmarking results for reference.

Benchmarking Results (formatted using Gemini)

Benchmark Analysis: HeaderStoreAppendRawBatches Performance

This table summarizes the go test -bench=BenchmarkHeaderStoreAppendRawBatches -benchmem results, focusing on the ns/op (nanoseconds per operation) and memory allocation (B/op, allocs/op) for different batch sizes, comparing NoSync and WithSync modes.

| Batch Size | ns/op (NoSync) | ns/op (WithSync) | Sync Overhead Ratio (WithSync / NoSync) | B/op (NoSync / WithSync) | allocs/op (NoSync / WithSync) | Observations |
|---|---|---|---|---|---|---|
| 1 | 3,807 | 543,903 | 142.87x | 0 / 0 | 0 / 0 | Sync introduces significant overhead. |
| 2 | 6,155 | 947,775 | 153.98x | 0 / 0 | 0 / 0 | Cost scales roughly linearly with batch size for both. |
| 4 | 11,420 | 1,780,687 | 155.93x | 0 / 0 | 0 / 0 | |
| 8 | 23,931 | 3,233,856 | 135.13x | 0 / 0 | 0 / 0 | |
| 16 | 43,358 | 6,708,534 | 154.72x | 0 / 0 | 0 / 0 | |
| 32 | 94,938 | 12,698,081 | 133.75x | 0 / 1 | 0 / 0 | Minor memory allocation appears for WithSync at larger batch sizes. |
| 64 | 171,982 | 28,715,089 | 166.97x | 0 / 3 | 0 / 0 | |
| 128 | 347,092 | 62,950,333 | 181.36x | 0 / 6 | 0 / 0 | |
| 256 | 731,023 | 146,360,099 | 200.21x | 0 / 14 | 0 / 0 | |
| 512 | 1,485,216 | 282,939,047 | 190.50x | 0 / 32 | 0 / 0 | |
| 1024 | 2,830,325 | 718,032,200 | 253.69x | 0 / 64 | 0 / 0 | Sync overhead ratio continues to climb. |
| 2048 | 5,430,942 | 1,812,322,954 | 333.70x | 0 / 128 | 0 / 1 | Highest sync overhead observed; single allocation for WithSync. |

Key Takeaways:

  1. Synchronization is the Dominant Factor: The WithSync operations are orders of magnitude slower (130x to 330x) than NoSync operations, clearly indicating that disk synchronization (flushing data from cache to persistent storage) is the overwhelming bottleneck.

  2. Scaling with Batch Size: Both NoSync and WithSync operations show a roughly linear increase in Ns/Op with larger batch sizes, which is expected as more data is processed.

  3. Increasing Sync Overhead with Larger Batches: The sync overhead ratio generally increases as the batch size grows, suggesting that the cost of synchronizing a larger amount of buffered data also increases.

  4. Memory Efficiency: The BenchmarkHeaderStoreAppendRawBatches function is highly memory-efficient, performing minimal to no memory allocations per operation for NoSync and only very small allocations for WithSync at larger batch sizes. This means memory pressure (e.g., from garbage collection) is not a significant concern for this operation.
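
For reference, a minimal sketch of how a sync-vs-no-sync append benchmark along these lines could be structured. The file layout, sub-benchmark names, and helpers below are assumptions for illustration, not the actual neutrino benchmark code.

```go
package sketch

import (
	"fmt"
	"os"
	"path/filepath"
	"testing"
)

// benchAppendRaw appends batches of raw 80-byte block headers to a temp
// file, optionally calling Sync after every batched write. It only models
// the sync-vs-no-sync comparison, not the real header store.
func benchAppendRaw(b *testing.B, batchSize int, withSync bool) {
	path := filepath.Join(b.TempDir(), "headers.bin")
	f, err := os.OpenFile(path, os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0644)
	if err != nil {
		b.Fatal(err)
	}
	defer f.Close()

	// One batch of serialized headers per benchmark iteration.
	batch := make([]byte, 80*batchSize)

	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if _, err := f.Write(batch); err != nil {
			b.Fatal(err)
		}
		if withSync {
			if err := f.Sync(); err != nil {
				b.Fatal(err)
			}
		}
	}
}

func BenchmarkHeaderStoreAppendRawBatches(b *testing.B) {
	for _, size := range []int{1, 32, 2048} {
		for _, withSync := range []bool{false, true} {
			name := fmt.Sprintf("batch=%d/sync=%t", size, withSync)
			b.Run(name, func(b *testing.B) {
				benchAppendRaw(b, size, withSync)
			})
		}
	}
}
```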

@mohamedawnallah
Contributor Author

I think skipping the file sync operation is a tradeoff we could accept, and I think the referenced issue #226 becomes less significant, especially after patch #313. What do you think?

@guggero
Contributor

guggero commented Jun 16, 2025

Nice benchmark tests!

An alternative to syncing on every write would be a time-based sync: every X seconds we'd call Sync() on the header store's file.
Or a counter-based sync: every X calls to appendRaw() we'd call Sync().

I agree that with #313 things probably already look better. But perhaps syncing from time to time might still help a bit.
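
For illustration, such a counter- and time-based sync could be wrapped around the header file roughly as in the sketch below. All names and thresholds here are assumptions, not existing neutrino code.

```go
package sketch

import (
	"os"
	"sync"
	"time"
)

// syncingFile wraps an *os.File and calls Sync after every syncEvery writes,
// or once interval has elapsed since the last sync, instead of on every
// single write.
type syncingFile struct {
	mu        sync.Mutex
	f         *os.File
	writes    int
	syncEvery int           // counter-based: sync after this many writes
	interval  time.Duration // time-based: sync at least this often
	lastSync  time.Time
}

func (s *syncingFile) Write(p []byte) (int, error) {
	s.mu.Lock()
	defer s.mu.Unlock()

	n, err := s.f.Write(p)
	if err != nil {
		return n, err
	}

	s.writes++
	if s.writes >= s.syncEvery || time.Since(s.lastSync) >= s.interval {
		if err := s.f.Sync(); err != nil {
			return n, err
		}
		s.writes = 0
		s.lastSync = time.Now()
	}

	return n, nil
}
```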

@Roasbeef
Member

Doesn't the truncate logic take care of this?

neutrino/headerfs/store.go

Lines 241 to 248 in 27977e1

// Otherwise, we'll need to truncate the file until it matches the
// current index tip.
for fileHeight > tipHeight {
	if err := bhs.singleTruncate(); err != nil {
		return nil, err
	}
	fileHeight--
}

@Roasbeef
Member

On second thought, there's a TODO above there that mentions we don't expect the index to be ahead of the file. It's possible to have a distinct loop that'll replace the current index tip with the latest header in the file.
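
For illustration, that distinct loop could look roughly like the fragment below, mirroring the truncation loop above. rollBackIndexTip is a hypothetical helper (not an existing neutrino API); the real fix would need to rewrite the index tip from the header actually stored at fileHeight.

```go
// Sketch only: if the index tip is ahead of what the flat file actually
// contains, walk the index back until it matches the file's latest header.
for tipHeight > fileHeight {
	if err := bhs.rollBackIndexTip(); err != nil {
		return nil, err
	}
	tipHeight--
}
```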

@saubyk saubyk requested review from Roasbeef and guggero June 19, 2025 02:00
@guggero
Contributor

guggero commented Jun 23, 2025

Just FYI, I'm holding off on review until some form of #296 (comment) is implemented.

@guggero guggero removed their request for review June 30, 2025 08:53
@lightninglabs-deploy

@Roasbeef: review reminder

@mohamedawnallah
Contributor Author

This issue doesn't seem urgent. Going to put it in draft mode for now so the bot stops notifying the involved reviewers.

@mohamedawnallah mohamedawnallah marked this pull request as draft September 13, 2025 19:31


Development

Successfully merging this pull request may close these issues.

fsync instead of calling Write
