Conversation

mohamedawnallah
Contributor

Change Description

Closes #276

@guggero
Contributor

guggero commented Mar 25, 2024

While this implementation is likely correct and probably the thing we should do, I worry about the performance hit this will take. Do we have a benchmark test to see how much of an impact this change has? Perhaps we need to also refactor things a bit to make sure we only call the appendRaw method for big chunks of data where it's worth syncing and not for every single entry for example.
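
For illustration, batching could look roughly like the sketch below, where appendRaw is handed one serialized chunk for the whole batch so a single sync covers all of it. The helper appendHeadersBatched and the exact appendRaw signature are assumptions for this sketch, not the existing neutrino code.

```go
package sketch

import (
	"bytes"

	"github.com/btcsuite/btcd/wire"
)

// appendHeadersBatched serializes a slice of block headers into one buffer
// and hands it to appendRaw as a single chunk, so any fsync covers the whole
// batch instead of each individual entry. The appendRaw parameter stands in
// for the store's internal method; its real signature may differ.
func appendHeadersBatched(appendRaw func([]byte) error,
	headers []wire.BlockHeader) error {

	var buf bytes.Buffer
	for i := range headers {
		if err := headers[i].Serialize(&buf); err != nil {
			return err
		}
	}

	// One write (and at most one sync) for the entire batch.
	return appendRaw(buf.Bytes())
}
```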

@mohamedawnallah
Contributor Author

Do we have a benchmark test to see how much of an impact this change has?

Yes, you're right! It may well introduce syncing overhead unless a benchmark proves otherwise 👍

Perhaps we need to also refactor things a bit to make sure we only call the appendRaw method for big chunks of data where it's worth syncing and not for every single entry for example.

I will look into this. Thanks for sharing!

@mohamedawnallah
Contributor Author

mohamedawnallah commented Jun 14, 2025

I worry about the performance hit this will take.

You're right, @guggero. I finally got a chance to go through this PR/issue, and the file sync operation is indeed significant: its cost grows with the batch size. I tried batch sizes from 1 to 2048 (2^0 to 2^11). Here are the benchmarking results for reference.

Benchmarking Results (formatted using Gemini)

Benchmark Analysis: HeaderStoreAppendRawBatches Performance

This table summarizes the go test -bench=BenchmarkHeaderStoreAppendRawBatches -benchmem results, focusing on the ns/op (nanoseconds per operation) and memory allocation (B/op, allocs/op) for different batch sizes, comparing NoSync and WithSync modes.

| Batch Size | ns/op (NoSync) | ns/op (WithSync) | Sync Overhead Ratio (WithSync / NoSync) | B/op (NoSync / WithSync) | allocs/op (NoSync / WithSync) | Observations |
|---|---|---|---|---|---|---|
| 1 | 3,807 | 543,903 | 142.87x | 0 / 0 | 0 / 0 | Sync introduces significant overhead. |
| 2 | 6,155 | 947,775 | 153.98x | 0 / 0 | 0 / 0 | Cost scales roughly linearly with batch size for both. |
| 4 | 11,420 | 1,780,687 | 155.93x | 0 / 0 | 0 / 0 | |
| 8 | 23,931 | 3,233,856 | 135.13x | 0 / 0 | 0 / 0 | |
| 16 | 43,358 | 6,708,534 | 154.72x | 0 / 0 | 0 / 0 | |
| 32 | 94,938 | 12,698,081 | 133.75x | 0 / 1 | 0 / 0 | Minor memory allocation appears for WithSync at larger batch sizes. |
| 64 | 171,982 | 28,715,089 | 166.97x | 0 / 3 | 0 / 0 | |
| 128 | 347,092 | 62,950,333 | 181.36x | 0 / 6 | 0 / 0 | |
| 256 | 731,023 | 146,360,099 | 200.21x | 0 / 14 | 0 / 0 | |
| 512 | 1,485,216 | 282,939,047 | 190.50x | 0 / 32 | 0 / 0 | |
| 1024 | 2,830,325 | 718,032,200 | 253.69x | 0 / 64 | 0 / 0 | Sync overhead ratio continues to climb. |
| 2048 | 5,430,942 | 1,812,322,954 | 333.70x | 0 / 128 | 0 / 1 | Highest sync overhead observed; single allocation for WithSync. |

Key Takeaways:

  1. Synchronization is the Dominant Factor: The WithSync operations are orders of magnitude slower (130x to 330x) than NoSync operations, clearly indicating that disk synchronization (flushing data from cache to persistent storage) is the overwhelming bottleneck.

  2. Scaling with Batch Size: Both NoSync and WithSync operations show a roughly linear increase in Ns/Op with larger batch sizes, which is expected as more data is processed.

  3. Increasing Sync Overhead with Larger Batches: The sync overhead ratio generally increases as the batch size grows, suggesting that the cost of synchronizing a larger amount of buffered data also increases.

  4. Memory Efficiency: The BenchmarkHeaderStoreAppendRawBatches function is highly memory-efficient, performing minimal to no memory allocations per operation for NoSync and only very small allocations for WithSync at larger batch sizes. This means memory pressure (e.g., from garbage collection) is not a significant concern for this operation.
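
For reference, a minimal sketch of how a sync-vs-no-sync append benchmark along these lines could be structured. The file layout, sub-benchmark names, and helpers below are assumptions for illustration, not the actual neutrino benchmark code.

```go
package sketch

import (
	"fmt"
	"os"
	"path/filepath"
	"testing"
)

// benchAppendRaw appends batches of raw 80-byte block headers to a temp
// file, optionally calling Sync after every batched write. It only models
// the sync-vs-no-sync comparison, not the real header store.
func benchAppendRaw(b *testing.B, batchSize int, withSync bool) {
	path := filepath.Join(b.TempDir(), "headers.bin")
	f, err := os.OpenFile(path, os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0644)
	if err != nil {
		b.Fatal(err)
	}
	defer f.Close()

	// One batch of serialized headers per benchmark iteration.
	batch := make([]byte, 80*batchSize)

	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if _, err := f.Write(batch); err != nil {
			b.Fatal(err)
		}
		if withSync {
			if err := f.Sync(); err != nil {
				b.Fatal(err)
			}
		}
	}
}

func BenchmarkHeaderStoreAppendRawBatches(b *testing.B) {
	for _, size := range []int{1, 32, 2048} {
		for _, withSync := range []bool{false, true} {
			name := fmt.Sprintf("batch=%d/sync=%t", size, withSync)
			b.Run(name, func(b *testing.B) {
				benchAppendRaw(b, size, withSync)
			})
		}
	}
}
```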

@mohamedawnallah
Contributor Author

I think skipping the file sync operation is a tradeoff we could accept, and I think the referenced issue #226 becomes less significant, especially after patch #313. What do you think?

@guggero
Contributor

guggero commented Jun 16, 2025

Nice benchmark tests!

An alternative to syncing on every write would be a time-based sync: every X seconds we'd call Sync() on the header store's file.
Or a counter-based sync: every X calls to appendRaw() we'd call Sync().

I agree that with #313 things probably already look better. But perhaps syncing from time to time might still help a bit.
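
For illustration, such a counter- and time-based sync could be wrapped around the header file roughly as in the sketch below. All names and thresholds here are assumptions, not existing neutrino code.

```go
package sketch

import (
	"os"
	"sync"
	"time"
)

// syncingFile wraps an *os.File and calls Sync after every syncEvery writes,
// or once interval has elapsed since the last sync, instead of on every
// single write.
type syncingFile struct {
	mu        sync.Mutex
	f         *os.File
	writes    int
	syncEvery int           // counter-based: sync after this many writes
	interval  time.Duration // time-based: sync at least this often
	lastSync  time.Time
}

func (s *syncingFile) Write(p []byte) (int, error) {
	s.mu.Lock()
	defer s.mu.Unlock()

	n, err := s.f.Write(p)
	if err != nil {
		return n, err
	}

	s.writes++
	if s.writes >= s.syncEvery || time.Since(s.lastSync) >= s.interval {
		if err := s.f.Sync(); err != nil {
			return n, err
		}
		s.writes = 0
		s.lastSync = time.Now()
	}

	return n, nil
}
```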

@Roasbeef
Member

Doesn't the truncate logic take care of this?

neutrino/headerfs/store.go

Lines 241 to 248 in 27977e1

// Otherwise, we'll need to truncate the file until it matches the
// current index tip.
for fileHeight > tipHeight {
	if err := bhs.singleTruncate(); err != nil {
		return nil, err
	}
	fileHeight--
}

@Roasbeef
Member

On second thought, there's a TODO above there that mentions we don't expect the index to be ahead of the file. It's possible to have a distinct loop that'll replace the current index tip with the latest header in the file.
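
For illustration, that distinct loop could look roughly like the fragment below, mirroring the truncation loop above. rollBackIndexTip is a hypothetical helper (not an existing neutrino API); the real fix would need to rewrite the index tip from the header actually stored at fileHeight.

```go
// Sketch only: if the index tip is ahead of what the flat file actually
// contains, walk the index back until it matches the file's latest header.
for tipHeight > fileHeight {
	if err := bhs.rollBackIndexTip(); err != nil {
		return nil, err
	}
	tipHeight--
}
```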

@saubyk saubyk requested review from Roasbeef and guggero June 19, 2025 02:00
@guggero
Contributor

guggero commented Jun 23, 2025

Just FYI, I'm holding off on review until some form of #296 (comment) is implemented.

@guggero guggero removed their request for review June 30, 2025 08:53
@lightninglabs-deploy

@Roasbeef: review reminder

@mohamedawnallah
Contributor Author

This issue doesn't seem urgent. Going to put it in draft mode for now so the bot stops notifying the involved reviewers.

@mohamedawnallah mohamedawnallah marked this pull request as draft September 13, 2025 19:31


Development

Successfully merging this pull request may close these issues.

fsync instead of calling Write
