
Conversation

Roasbeef
Member

In this PR, we add an async processing queue for the gossip timestamp filter. This ensures that applying a filter never blocks another goroutine. If the queue is full, then the message is just dropped.

We also make the filtering concurrency configurable.

Finally, we add a new set of docs to explain how the various rate limiting config options work in tandem.
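
As a rough illustration of the non-blocking enqueue described above, here is a minimal sketch; the names (gossipSyncer, timestampRangeQueue, queueTimestampRange, and GossipTimestampRange standing in for lnwire.GossipTimestampRange) are placeholders for illustration, not the exact identifiers in this PR:

// GossipTimestampRange stands in here for lnwire.GossipTimestampRange.
type GossipTimestampRange struct {
	FirstTimestamp uint32
	TimestampRange uint32
}

// gossipSyncer holds the small buffered queue used to hand filters off to a
// dedicated processing goroutine, e.g. make(chan *GossipTimestampRange, 1).
type gossipSyncer struct {
	timestampRangeQueue chan *GossipTimestampRange
}

// queueTimestampRange enqueues a filter without ever blocking the caller
// (the peer's readHandler). If the buffer is already occupied, the message
// is dropped and a warning is logged. Assumes "log" is imported.
func (g *gossipSyncer) queueTimestampRange(msg *GossipTimestampRange) {
	select {
	case g.timestampRangeQueue <- msg:
	default:
		log.Println("dropping gossip_timestamp_filter: queue full")
	}
}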


@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @Roasbeef, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the robustness and configurability of the gossip subsystem by introducing an asynchronous processing queue for timestamp range filters. This change prevents potential blocking issues and improves overall responsiveness under heavy load. Additionally, it provides users with fine-grained control over filter concurrency and includes extensive documentation to guide optimal configuration of the gossip rate limiting parameters.

Highlights

  • Asynchronous Gossip Filter Processing: Implemented an asynchronous processing queue for GossipTimestampRange messages. This change ensures that applying a gossip filter never blocks other goroutines, improving responsiveness. If the queue is full, messages are dropped to prevent backpressure.
  • Configurable Gossip Filter Concurrency: Introduced a new configuration option, gossip.filter-concurrency, allowing users to specify the maximum number of concurrent gossip filter applications. This provides greater control over resource utilization and performance.
  • Comprehensive Gossip Rate Limiting Documentation: Added a new documentation file (docs/gossip_rate_limiting.md) that thoroughly explains the various gossip rate limiting configuration options (msg-rate-bytes, msg-burst-bytes, filter-concurrency, num-restricted-slots), including guidance on calculating appropriate values, troubleshooting common issues, and best practices.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces an asynchronous processing queue for gossip timestamp filter messages. This is a great improvement to prevent blocking the main gossiper goroutine, especially when rate limits are active. The changes are well-structured, and the addition of comprehensive tests for the new queueing logic in discovery/syncer_queue_test.go is excellent. The new documentation in docs/gossip_rate_limiting.md is also very clear and helpful for users. The style guide violations found were related to select statements missing comments for each case, and line lengths exceeding 80 characters in the documentation. Additionally, there was a discrepancy in the default value for FilterConcurrency.

@saubyk saubyk added this to lnd v0.20 Jul 22, 2025
@saubyk saubyk modified the milestones: v0.19.2, v0.20.0 Jul 22, 2025
@saubyk saubyk moved this to In progress in lnd v0.20 Jul 22, 2025
Member

@yyforyongyu yyforyongyu left a comment


Should we do a 0.19.3 to include this fix? In addition, I have a design question - we introduced the rate limiter so we can slow down without dropping events, yet we are dropping queries here; do we need to stick to one pattern instead? Maybe a mixed approach works here, plus the dropping part really makes things easier. Also, should we also drop queries like QueryShortChanIDs, or do we only care about timestamp queries since they're the only troublemaker?

@ellemouton ellemouton self-requested a review July 22, 2025 10:58
@Roasbeef
Member Author

Should we do a 0.19.3 to include this fix? In addition I have a design question - we introduced rate limiter so we can slow down without dropping events, yet we are dropping queries here, do we need to stick to one pattern instead?

The goal wasn't necessarily to slow down without dropping events, just to rate limit how much we'll send out to avoid runaway bandwidth utilization. What we're dropping here is the GossipTimestampRange message. Peers should only really send this once upon connection. There's no good reason for a peer to send this 10 times over the course of a connection. We even have a flag to just ignore these queries altogether, as it's an easy way to get a peer to do a lot of work for you repeatedly.

In this PR, we use a buffer of 10, which is maybe even too large; I think we can safely halve that. The reason it's important that we have a purely async send here (never blocks, drops if it can't send) is that if we block here, it'll also block the readHandler.

Re dropping, I have this PR, which implements Random Early Dropping for the message queue. This ensures that the queues never get full (congested). We only drop non-essential messages like gossip messages. We'll never drop messages like channel updates, etc. Ultimately, gossip is secondary to link level channel operations. Hence my suggestion to move to something like QUIC (and/or distinct TCP connections) so concerns can be entirely separated.

@saubyk saubyk changed the title multi: add async processing queue for gossip timestamp fiiler multi: add async processing queue for gossip timestamp fitler Jul 22, 2025
@saubyk saubyk changed the title multi: add async processing queue for gossip timestamp fitler multi: add async processing queue for gossip timestamp filter Jul 22, 2025
@morehouse
Collaborator

Ideally, any time we get a new filter, shouldn't we stop processing the old one, and switch to the new one? It seems silly/antisocial to keep sending gossip the peer doesn't want anymore.

So no need for a queue at all?

@Roasbeef
Member Author

Ideally, any time we get a new filter, shouldn't we stop processing the old one, and switch to the new one? It seems silly/antisocial to keep sending gossip the peer doesn't want anymore.

From the same peer? In the latest push, I've just reduced the size of the buffered channel to one.

Re stop processing the old one: are you suggesting that we reduce the filter sema size to 1? As is, for a given peer, we'll only ever process one filter at a time. As an example, lnd will set a nil filter to request to gossip, but then at a later time, as we rotate through gossip peers, we'll set a non-zero, but near term filter so we get a trickle.

@morehouse
Collaborator

From the same peer? In the latest push, I've just reduced the size of the buffered channel to one.

Right, the same peer.

Re stop processing the old one: are you suggesting that we reduce the filter sema size to 1? As is, for a given peer, we'll only ever process one filter at a time. As an example, lnd will set a nil filter to request to gossip, but then at a later time, as we rotate through gossip peers, we'll set a non-zero, but near term filter so we get a trickle.

I'm suggesting that we early-stop the goroutine that's sending gossip associated with the previous filter, and then only send gossip that matches the new filter. No need to buffer gossip_timestamp_filter messages.

I think we would still need the global semaphore to limit our memory consumption, unless we're going to start limiting how many messages we load from the DB at a time.

Collaborator

@ellemouton ellemouton left a comment


looks good! only thing is that the default values seem incorrect

Member

@yyforyongyu yyforyongyu left a comment


I'm suggesting that we early-stop the goroutine that's sending gossip associated with the previous filter, and then only send gossip that matches the new filter. No need to buffer gossip_timestamp_filter messages.

Yeah, it would be nice to stop the previous one and only process the new one, though I think it may create another issue: a misbehaving node could keep sending new ones while we are processing the old one. Overall I view gossip replies as a free service that we deliver on a best-effort basis. IMO the other side shouldn't query that much or that often.

@morehouse
Collaborator

Here are some more details about why and how we could implement this.

Context

gossip_timestamp_filter, as the name implies, is primarily a mechanism to filter the kind of gossip that your peers send you. This is mutually beneficial for the nodes involved -- you only have to process gossip that you want to know about, while the peers don't waste resources sending you gossip you will ignore anyway.

This means that it only ever makes sense to have a single filter active at a time. When a peer sends a new filter, it should immediately replace the previous filter, regardless of whether all gossip matching the previous filter has been sent. Essentially any gossip sent according to the previous filter can be considered a complete waste of resources. If we don't stop servicing the previous filter immediately, and if there's any overlap in filters, we will end up sending the same gossip multiple times, thereby aggravating the outgoing bandwidth issues we're currently seeing.

gossip_timestamp_filter is an inherent resource DoS risk, since it is asymmetrical. A small (< 100 B) gossip_timestamp_filter message can cause us to send many MBs of gossip. So we should do what we can to avoid wasted resources when applying these filters.

This PR

The current design does 2 main things:

  • reduces the default global filterSema from 5 to 1.
  • drops new filters if we haven't sent all the gossip associated with the first filter yet

Reducing filterSema

I'm not sure whether it makes sense to reduce filterSema to 1, as this essentially allows one peer to delay all other peers from syncing gossip. Suppose one peer requests 10 MB of gossip (say ~100k individual messages). At the current default rate limit of 50 KB/s, it will take a bare minimum of 3 minutes to service that request. Additionally, a slow peer will significantly stall the sync. For example, if it takes the peer 1 second to process each message, it will take 100k seconds (~27 hours) to finish the gossip sync, meanwhile no other nodes can sync gossip.

Of course a filterSema of 5 doesn't prevent an attacker from doing this on purpose either. But it does keep a slow but honest peer from impacting other peers in this way.

Dropping new filters

As explained above, this doesn't make sense and wastes our own resources. We should drop the old filter instead.

Proposed changes

I think we can simplify the design slightly, while fixing the filter handling.

Smaller, separate PRs

The filterSema change seems orthogonal and belongs in a separate PR, with a clear justification for the change. Likewise, the new doc can go in a separate PR.

Separating these out will make it easier to focus review on the main goal of this PR -- improving handling of gossip filters.

Assimilating QueueTimestampRange behavior into ApplyGossipFilter

Let's make ApplyGossipFilter do what the QueueTimestampRange function currently does -- apply the gossip filter without blocking (as the name would imply). Let's also remove the (currently unused) config option to set the queue size. Queue sizes other than 1 don't make sense here anyway.

  • ApplyGossipFilter sends the filter message to a dedicated "historical sync" goroutine.

The "historical sync" goroutine does the following in a loop:

  1. Reads the new filter (blocking)
  2. Acquires the filterSema
  3. Loads the requested gossip messages
  4. Sends the requested gossip messages one-by-one in a loop
    • On each loop iteration, checks if a new gossip filter has been received (non-blocking).
      • If so, releases filterSema and goes back to step 2.
      • If not, sends the next gossip message

Use a simple "overwritable" queue

Instead of using an actual chan to send gossip filters to the historical sync goroutine, we want a data structure that buffers one filter and overwrites that filter when a new one is sent. Something like this:

// LatestFilter buffers at most one gossip_timestamp_filter and overwrites
// the pending filter whenever a newer one arrives.
type LatestFilter struct {
	mu        sync.Mutex
	cond      *sync.Cond
	filter    *GossipTimestampRange
	newFilter bool
}

// NewLatestFilter constructs the buffer with its condition variable bound to
// the mutex.
func NewLatestFilter() *LatestFilter {
	l := &LatestFilter{}
	l.cond = sync.NewCond(&l.mu)
	return l
}

// Set overwrites any pending filter with f and wakes a waiting reader.
func (l *LatestFilter) Set(f *GossipTimestampRange) {
	l.mu.Lock()
	defer l.mu.Unlock()

	l.filter = f
	l.newFilter = true
	l.cond.Signal()
}

// TryGet returns the pending filter without blocking, or nil if nothing new
// has been set.
func (l *LatestFilter) TryGet() *GossipTimestampRange {
	l.mu.Lock()
	defer l.mu.Unlock()

	if !l.newFilter {
		return nil
	}
	l.newFilter = false
	return l.filter
}

// Get blocks until a new filter has been set, then returns it.
func (l *LatestFilter) Get() *GossipTimestampRange {
	l.mu.Lock()
	defer l.mu.Unlock()

	for !l.newFilter {
		l.cond.Wait()
	}
	l.newFilter = false
	return l.filter
}
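
As a usage sketch, the historical-sync loop from the numbered steps earlier in this comment could consume this structure roughly as follows; loadGossip, sendMsg, and gossipMsg are placeholders rather than real lnd APIs, and the semaphore here is an initially empty channel whose capacity bounds concurrency:

// gossipMsg stands in for a gossip wire message.
type gossipMsg interface{}

// historicalSyncLoop implements steps 1-4 above: wait for a filter, acquire
// a semaphore slot, load the matching gossip, and send it one message at a
// time, preempting delivery whenever a newer filter arrives.
func historicalSyncLoop(
	lf *LatestFilter,
	filterSema chan struct{},
	loadGossip func(*GossipTimestampRange) []gossipMsg,
	sendMsg func(gossipMsg),
) {
	// Step 1: block until the first filter arrives.
	filter := lf.Get()

	for {
		// Step 2: acquire a slot to bound concurrent backlog
		// deliveries (send-to-acquire on an empty, capacity-bounded
		// channel).
		filterSema <- struct{}{}

		// Step 3: load the gossip matching the current filter.
		msgs := loadGossip(filter)

		preempted := false

		// Step 4: send one message at a time, checking for a newer
		// filter between sends.
		for _, msg := range msgs {
			if newFilter := lf.TryGet(); newFilter != nil {
				filter = newFilter
				preempted = true
				break
			}
			sendMsg(msg)
		}

		// Release the semaphore slot.
		<-filterSema

		// If we weren't preempted mid-delivery, wait for the next
		// filter before going around again.
		if !preempted {
			filter = lf.Get()
		}
	}
}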

@Roasbeef
Member Author

I'm suggesting that we early-stop the goroutine that's sending gossip associated with the previous filter, and then only send gossip that matches the new filter. No need to buffer gossip_timestamp_filter messages.

With what's in this PR now, there's no effective buffer. We just make sure that the sends are always non-blocking. If they send another filter before we process the prior one, it gets dropped. If they send a filter while we're processing one, it gets buffered for later.

There are two cases: when they request a backlog vs. when they don't. If they aren't specifying a timestamp in the past for a backlog, then there's no extra work to be done. Only when they specify a backlog (e.g. some implementations like LDK always specify 2 weeks) is there extra work to be done.

Filtration is also ongoing once applied (every message we send to the peer goes through the filter).

I'm not sure we should cancel backlog delivery altogether if they send a new filter while we're still sending out the backlog. As mentioned above, consider nodes like LDK that always send a filter starting 2 weeks in the past. Assuming all nodes in the network do a keep-alive update for their channels every 2 weeks, that means we end up sending a channel update for pretty much every channel in the network. With the bandwidth rate limiting, this may take several seconds to send out.

If we stop delivering that backlog when they add a new filter (eg: they request to only receive new updates after a timestamp, or that they don't want ongoing gossip), then they'll never get that backlog. Before we make a change like this, we should check the behavior of other implementations to make sure we aren't violating any assumptions re processing that they may hold.

I'm open to exploring such a change, but it should be done in a different PR imo. My aim here is to fix a bug we've seen in the wild, while maintaining our current behavior.

@Roasbeef
Member Author

I'm not sure whether it makes sense to reduce filterSema to 1, as this essentially allows one peer to delay all other peers from syncing gossip.

This was a rebase mistake 😅. The goal is to keep the current defaults.

@Roasbeef Roasbeef force-pushed the gossip-block-fix branch 2 times, most recently from dccbd45 to 015b482 Compare July 25, 2025 00:25
@Roasbeef
Member Author

I've pushed up a new commit that attempts to split the difference, with minimal code surface area shifts.

We add a new atomic variable, and use that to ensure that we only have a single backlog-delivery goroutine active for a given peer at a time. So if we're already delivering a backlog, and they send a timestamp filter that needs another backlog, we'll just drop that attempt. Only once the original backlog has finished being delivered do we let them request more.

This doesn't affect them applying a filter that doesn't require a backlog, that's still non-blocking as normal.

This is a clear improvement as it resolves the issue where a single peer could monopolize all 5 filter sema slots, but retains the existing behavior instead of preempting a backlog delivery goroutine.
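
For reference, a minimal sketch of that single-backlog guard, assuming a hypothetical peerSyncer type and maybeStartBacklog helper (the actual field and method names in the PR may differ):

// peerSyncer tracks whether a backlog-delivery goroutine is already running
// for this peer. Assumes "sync/atomic" is imported (atomic.Bool, Go 1.19+).
type peerSyncer struct {
	backlogActive atomic.Bool
}

// maybeStartBacklog launches backlog delivery only if one isn't already in
// flight for this peer; otherwise the request is simply dropped.
func (p *peerSyncer) maybeStartBacklog(deliver func()) bool {
	if !p.backlogActive.CompareAndSwap(false, true) {
		// A backlog is already being delivered; drop this request.
		return false
	}

	go func() {
		// Allow a new backlog request once this delivery finishes.
		defer p.backlogActive.Store(false)
		deliver()
	}()

	return true
}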

@Roasbeef Roasbeef removed this from the v0.20.0 milestone Jul 25, 2025
@morehouse
Collaborator

AFAICT LDK only sends gossip_timestamp_filter once when connecting to a peer. After that it doesn't send it again until a disconnect occurs (cc @TheBlueMatt to confirm). So regardless of whether we replace the existing filter or drop the new one, I don't think either will affect LDK.

Also note that other implementations already replace gossip filters immediately (CLN, eclair), as described in the spec.

The simplest approach is to just drop all filters that come in after the first one, regardless of whether a historical sync is going on or not. LDK already does this, so probably it would work just fine. But if we want to go that way, we should really consider changing the spec to match.

@TheBlueMatt

I'm kinda confused why this needs a queue to begin with? If a peer asks for a historical gossip sync, you can just check in regularly and send it a few more messages whenever the TCP socket has room to send, it should require minimal additional state (literally we just have that one enum https://github.com/lightningdevkit/rust-lightning/blob/5ceb625f005269f9655d10e4ce2a6b3f951f8e09/lightning/src/ln/peer_handler.rs#L692-L704)

@morehouse
Collaborator

morehouse commented Jul 25, 2025

I'm kinda confused why this needs a queue to begin with? If a peer asks for a historical gossip sync, you can just check in regularly and send it a few more messages whenever the TCP socket has room to send, it should require minimal additional state (literally we just have that one enum https://github.com/lightningdevkit/rust-lightning/blob/5ceb625f005269f9655d10e4ce2a6b3f951f8e09/lightning/src/ln/peer_handler.rs#L692-L704)

+1

Ideally LND wouldn't load all the requested gossip at once and queue it to be sent immediately. No other implementation does this because it's such a clear DoS risk.

I mentioned this repeatedly during the 2 years of correspondence about the gossip_timestamp_filter DoS vector (in at least 3 different emails), but no one ever acknowledged this point or explained why it wasn't feasible to implement properly.

I'm guessing it's just more re-engineering than anyone wanted to do. Much easier to keep applying these bandaids to the problem.

@Roasbeef
Member Author

Roasbeef commented Jul 25, 2025

I'm kinda confused why this needs a queue to begin with?

There's no queue. It's just a non-blocking send now. It was blocking before, which was the root cause of the issue.

it should require minimal additional state (literally we just have that one enum

Yep, we have something very similar:

lnd/discovery/syncer.go

Lines 22 to 55 in 2e36f9b

// SyncerType encapsulates the different types of syncing mechanisms for a
// gossip syncer.
type SyncerType uint8

const (
	// ActiveSync denotes that a gossip syncer:
	//
	// 1. Should not attempt to synchronize with the remote peer for
	//    missing channels.
	// 2. Should respond to queries from the remote peer.
	// 3. Should receive new updates from the remote peer.
	//
	// They are started in a chansSynced state in order to accomplish their
	// responsibilities above.
	ActiveSync SyncerType = iota

	// PassiveSync denotes that a gossip syncer:
	//
	// 1. Should not attempt to synchronize with the remote peer for
	//    missing channels.
	// 2. Should respond to queries from the remote peer.
	// 3. Should not receive new updates from the remote peer.
	//
	// They are started in a chansSynced state in order to accomplish their
	// responsibilities above.
	PassiveSync

	// PinnedSync denotes an ActiveSync that doesn't count towards the
	// default active syncer limits and is always active throughout the
	// duration of the peer's connection. Each pinned syncer will begin by
	// performing a historical sync to ensure we are well synchronized with
	// their routing table.
	PinnedSync
)

@Roasbeef
Member Author

Roasbeef commented Jul 25, 2025

Also note that other implementations already replace gossip filters immediately (CLN, eclair), as described in the spec.

Yes, we still do this, that behavior is unchanged.

@Roasbeef
Member Author

Roasbeef commented Jul 25, 2025

The simplest approach is to just drop all filters that come in after the first one, regardless of whether a historical sync is going on or not

That would break our usage. We re-apply filters in order to rotate peers that we're receiving active gossip from.

I'm guessing it's just more re-engineering than anyone wanted to do. Much easier to keep applying these bandaids to the problem.

We acknowledged that, which is why we added a config option to just never deliver a backlog. It has existed for more than 5 years now.

This isn't a very far gap, pagination is just a hop and a skip away. The new SQL db backend also makes this much easier. If you want to investigate drafting up such a change, I'd be happy to give feedback and review.

The actual solution here is to get rid of all these queries/filtering. Some Blockstream devs looked into doing reconciliation stuff repeatedly over the past few years, but seemingly no one cared enough to get the baton across the finish line. If no one has picked up the baton after we finish with gossip v2, then we can take a looksie.

@Roasbeef Roasbeef changed the title multi: add async processing queue for gossip timestamp filter multi: make gossip filter sends non-blocking, only allow a single backlog catch up goroutine per peer Jul 25, 2025
@morehouse
Collaborator

Yes, we still do this, that behavior is unchanged.

Not quite. With this change, LND still continues to send the historical gossip for the previous filter. CLN and eclair stop that immediately AFAICT.

That would break our usage. We re-apply filters in order to rotate peers that we're receiving active gossip from.

Got it. If LND is the only significant user of the multiple filters, maybe the decision here isn't that important in practice. Still seems nice to follow the spec though.

We acknowledged that, which is why we added a config option to just never deliver a backlog. It has existed for more than 5 years now.

That's another bandaid.

This isn't a very far gap, pagination is just a hop and a skip away. The new SQL db backend also makes this much easier. If you want to investigate drafting up such a change, I'd be happy to give feedback and review.

If the problem was known, and the fix is just a hop and a skip away, I don't really understand why this hasn't been fixed for 5 years, and why no one responded to my recommendation to fix it for 2 years.

The actual solution here is to get rid of all these queries/filtering. Some Blockstream devs looked into doing reconciliation stuff repeatedly over the past few years, but seemingly no one cared enough to get the baton across the finish line. If no one has picked up the baton after we finish with gossip v2, then we can take a looksie.

This would be super cool. We still wouldn't want to load the entire set difference into memory and send it all at once though.

@TheBlueMatt

The actual solution here is to get rid of all these queries/filtering. Some Blockstream devs looked into doing reconciliation stuff repeatedly over the past few years, but seemingly no one cared enough to get the baton across the finish line. If no one has picked up the baton after we finish with gossip v2, then we can take a looksie.

I believe the conclusion was "this works better if we have block-height-based gossip rather than time-based gossip, so we'll do it after v2", so it is indeed waiting on gossip v2.

Collaborator

@ellemouton ellemouton left a comment


lgtm 👍

@Roasbeef
Member Author

Still seems nice to follow the spec though.

My read of the spec is that we're compliant here, though there is indeed some room for interpretation.

eg:

Note that this filter replaces any previous one, so it can be used multiple times to change the gossip from a peer.

We do this. We immediately apply the filter, once received. After it's been applied, any requests to trickle out gossip traffic will use the latest filter. lnd does indeed utilize this clause, as we'll apply a filter many times (just to turn traffic on/off) over the lifetime of a connection.

Re the backlog section:

SHOULD send all gossip messages whose timestamp is greater or equal to first_timestamp, and less than first_timestamp plus timestamp_range.
MAY wait for the next outgoing gossip flush to send these.

There's nothing here that prescribes halting backlog delivery once a new filter has been applied, or preempting delivery to prioritize a newer filter. So there's certainly room to tighten up the wording here to guide in a desired direction.

@Roasbeef
Member Author

Not sure what's up w/ the linter, I get zero diff locally if I run gofmt on the flagged file.

In this commit, we make the gossip filter semaphore capacity configurable
through a new FilterConcurrency field. This change allows node operators
to tune the number of concurrent gossip filter applications based on
their node's resources and network position.

The previous hard-coded limit of 5 concurrent filter applications could
become a bottleneck when multiple peers attempt to synchronize
simultaneously. By making this value configurable via the new
gossip.filter-concurrency option, operators can increase this limit
for better performance on well-resourced nodes or maintain conservative
values on resource-constrained systems.

We keep the default value at 5 to maintain backward compatibility and
avoid unexpected resource usage increases for existing deployments. The
sample configuration file is updated to document this new option.
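
For reference, a simplified sketch of how such a configurable semaphore can be built from a buffered channel; the helper name newFilterSema is hypothetical, and the default of 5 mirrors the value described above:

// defaultFilterConcurrency mirrors the previous hard-coded limit of 5.
const defaultFilterConcurrency = 5

// newFilterSema builds the semaphore bounding concurrent gossip filter
// applications. Each token in the channel represents one available slot;
// callers acquire a slot by receiving and release it by sending back.
func newFilterSema(filterConcurrency int) chan struct{} {
	if filterConcurrency <= 0 {
		filterConcurrency = defaultFilterConcurrency
	}

	sema := make(chan struct{}, filterConcurrency)
	for i := 0; i < filterConcurrency; i++ {
		sema <- struct{}{}
	}

	return sema
}
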
In this commit, we introduce an asynchronous processing queue for
GossipTimestampRange messages in the GossipSyncer. This change addresses
a critical issue where the gossiper could block indefinitely when
processing timestamp range messages during periods of high load.

Previously, when a peer sent a GossipTimestampRange message, the
gossiper would synchronously call ApplyGossipFilter, which could block
on semaphore acquisition, database queries, and rate limiting. This
synchronous processing created a bottleneck where the entire peer
message processing pipeline would stall, potentially causing timeouts
and disconnections.

The new design adds a timestampRangeQueue channel with a capacity of 1
message and a dedicated goroutine for processing these messages
asynchronously. This follows the established pattern used for other
message types in the syncer. When the queue is full, we drop messages
and log a warning rather than blocking indefinitely, providing graceful
degradation under extreme load conditions.
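
A hedged sketch of the dedicated consumer goroutine this describes, reusing the placeholder gossipSyncer type from the sketch near the top of this thread; the apply callback stands in for ApplyGossipFilter:

// processTimestampRanges drains the single-slot queue and applies each
// filter, so the peer's readHandler never blocks on filter application.
// Assumes "log" is imported.
func (g *gossipSyncer) processTimestampRanges(
	quit <-chan struct{},
	apply func(*GossipTimestampRange) error,
) {
	for {
		select {
		case msg := <-g.timestampRangeQueue:
			if err := apply(msg); err != nil {
				log.Printf("unable to apply gossip filter: %v", err)
			}

		case <-quit:
			return
		}
	}
}
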
In this commit, we complete the integration of the asynchronous
timestamp range queue by modifying ProcessRemoteAnnouncement to use
the new queuing mechanism instead of calling ApplyGossipFilter
synchronously.

This change ensures that when a peer sends a GossipTimestampRange
message, it is queued for asynchronous processing rather than
blocking the gossiper's main message processing loop. The modification
prevents the peer's readHandler from blocking on potentially slow
gossip filter operations, maintaining connection stability during
periods of high synchronization activity.

If the queue is full when attempting to enqueue a message, we log
a warning but return success to prevent peer disconnection. This
design choice prioritizes connection stability over guaranteed
delivery of every gossip filter request, which is acceptable since
peers can always resend timestamp range messages if needed.

In this commit, we complete the integration of the configurable gossip
filter concurrency by wiring the new FilterConcurrency configuration
through all layers of the application.

The changes connect the gossip.filter-concurrency configuration option
from the command-line interface through the server initialization code
to the gossiper and sync manager. This ensures that operators can
actually use the new configuration option to tune their node's
concurrent gossip filter processing capacity based on their specific
requirements and available resources.

In this commit, we add detailed documentation to help node operators
understand and configure the gossip rate limiting system effectively.
The new guide addresses a critical knowledge gap that has led to
misconfigured nodes experiencing synchronization failures.

The documentation covers the token bucket algorithm used for rate
limiting, providing clear formulas and examples for calculating
appropriate values based on node size and network position. We include
specific recommendations ranging from 50 KB/s for small nodes to
1 MB/s for large routing nodes, with detailed calculations showing
how these values are derived.

The guide explains the relationship between rate limiting and other
configuration options like num-restricted-slots and the new
filter-concurrency setting. We provide troubleshooting steps for
common issues like slow initial sync and peer disconnections, along
with debug commands and log patterns to identify problems.

Configuration examples are provided for conservative, balanced, and
performance-oriented setups, giving operators concrete starting points
they can adapt to their specific needs. The documentation emphasizes
the importance of not setting rate limits too low, warning that values
below 50 KB/s can cause synchronization to fail entirely.
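
For context on the token bucket behavior the doc describes, here is a generic sketch using golang.org/x/time/rate; this is not necessarily how lnd wires its limiter, and sendWithBudget is a hypothetical helper:

// sendWithBudget blocks until the token bucket has enough tokens to cover
// the message, then sending can proceed. Assumes "context" and
// "golang.org/x/time/rate" are imported. A 50 KB/s sustained rate with a
// 100 KB burst is roughly analogous to msg-rate-bytes and msg-burst-bytes:
//
//	limiter := rate.NewLimiter(rate.Limit(50*1024), 100*1024)
func sendWithBudget(ctx context.Context, limiter *rate.Limiter,
	payload []byte) error {

	// Wait for len(payload) tokens before sending the message.
	if err := limiter.WaitN(ctx, len(payload)); err != nil {
		return err
	}

	// ... write the payload to the peer here ...
	return nil
}
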
In this commit, we add a new atomic bool to only permit a single gossip
backlog goroutine per peer. If we get a new request that needs a backlog
while we're still processing the previous one, then we'll drop that request.

Member

@yyforyongyu yyforyongyu left a comment


Will address the nits in #10102.

@@ -207,8 +212,13 @@ type SyncManager struct {
// newSyncManager constructs a new SyncManager backed by the given config.
func newSyncManager(cfg *SyncManagerCfg) *SyncManager {

filterSema := make(chan struct{}, filterSemaSize)
for i := 0; i < filterSemaSize; i++ {
filterConcurrency := cfg.FilterConcurrency
Member


nit: newline above

}

// Parse the pubkeys for the pinned syncers.

Member


nit: newline can be removed

@yyforyongyu yyforyongyu merged commit 6c39b9a into lightningnetwork:master Aug 4, 2025
36 of 39 checks passed
@github-project-automation github-project-automation bot moved this from In progress to Done in lnd v0.20 Aug 4, 2025