multi: make gossip filter sends non-blocking, only allow a single backlog catch up goroutine per peer #10097
Conversation
Summary of Changes
Hello @Roasbeef, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the robustness and configurability of the gossip subsystem by introducing an asynchronous processing queue for timestamp range filters. This change prevents potential blocking issues and improves overall responsiveness under heavy load. Additionally, it provides users with fine-grained control over filter concurrency and includes extensive documentation to guide optimal configuration of the gossip rate limiting parameters.
Highlights
- Asynchronous Gossip Filter Processing: Implemented an asynchronous processing queue for GossipTimestampRange messages. This change ensures that applying a gossip filter never blocks other goroutines, improving responsiveness. If the queue is full, messages are dropped to prevent backpressure.
- Configurable Gossip Filter Concurrency: Introduced a new configuration option, gossip.filter-concurrency, allowing users to specify the maximum number of concurrent gossip filter applications. This provides greater control over resource utilization and performance.
- Comprehensive Gossip Rate Limiting Documentation: Added a new documentation file (docs/gossip_rate_limiting.md) that thoroughly explains the various gossip rate limiting configuration options (msg-rate-bytes, msg-burst-bytes, filter-concurrency, num-restricted-slots), including guidance on calculating appropriate values, troubleshooting common issues, and best practices.
Code Review
This pull request introduces an asynchronous processing queue for gossip timestamp filter messages. This is a great improvement to prevent blocking the main gossiper goroutine, especially when rate limits are active. The changes are well-structured, and the addition of comprehensive tests for the new queueing logic in discovery/syncer_queue_test.go is excellent. The new documentation in docs/gossip_rate_limiting.md is also very clear and helpful for users. The style guide violations found were related to select statements missing comments for each case, and line lengths exceeding 80 characters in the documentation. Additionally, there was a discrepancy in the default value for FilterConcurrency.
Should we do a 0.19.3 to include this fix? In addition, I have a design question: we introduced the rate limiter so we can slow down without dropping events, yet we are dropping queries here. Do we need to stick to one pattern instead? Maybe a mixed approach works here, plus the dropping part really makes things easier. Also, should we drop queries like QueryShortChanIDs as well, or do we only care about timestamp queries since they're the only troublemaker?
The goal wasn't necessarily to slow down without dropping events, just to rate limit how much we'll send out to avoid runaway bandwidth utilization. What we're dropping here is the incoming filter message itself.

In this PR, we use a buffer of 10, which is maybe even too large; I think we can safely halve that. The reason it's important that we have a purely async send here (never blocks, drops if it can't send) is that if we block here, it'll also block the peer's message processing.

Re dropping, I have another PR which implements Random Early Dropping for the message queue. This ensures that the queues never get full (congested). We only drop non-essential messages like gossip messages. We'll never drop messages like channel updates, etc. Ultimately, gossip is secondary to link-level channel operations. Hence my suggestion to move to something like QUIC (and/or distinct TCP connections) so concerns can be entirely separated.
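For illustration, a minimal Go sketch of that kind of purely async, drop-on-full send (all names here are hypothetical, not lnd's actual identifiers):

```go
package main

import "log"

// enqueueFilter attempts to hand a gossip filter message to the processing
// goroutine without ever blocking the caller. If the buffered channel is
// already full, the message is simply dropped and a warning is logged.
func enqueueFilter(queue chan<- interface{}, msg interface{}, quit <-chan struct{}) {
	select {
	case queue <- msg:
	case <-quit:
	default:
		log.Println("timestamp range queue full, dropping gossip filter")
	}
}

func main() {
	quit := make(chan struct{})
	queue := make(chan interface{}, 1) // small buffer, as discussed above

	enqueueFilter(queue, "filter-1", quit) // buffered for later processing
	enqueueFilter(queue, "filter-2", quit) // queue full: dropped, no blocking
}
```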
c9afe00 to 98e0519
Ideally, any time we get a new filter, shouldn't we stop processing the old one, and switch to the new one? It seems silly/antisocial to keep sending gossip the peer doesn't want anymore. So no need for a queue at all?
98e0519 to 112c525
From the same peer? In the latest push, I've just reduced the size of the buffered channel to one.

Re stop processing the old one: are you suggesting that we reduce the filter sema size to 1? As is, for a given peer, we'll only ever process one filter at a time. As an example:
Right, the same peer.

I'm suggesting that we early-stop the goroutine that's sending gossip associated with the previous filter, and then only send gossip that matches the new filter. No need to buffer gossip_timestamp_filter messages.

I think we would still need the global semaphore to limit our memory consumption, unless we're going to start limiting how many messages we load from the DB at a time.
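For illustration only, here is a minimal Go sketch of the early-stop alternative being suggested (this is not what the PR implements, and every identifier is hypothetical): each new filter cancels the goroutine draining the previous backlog via a context.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// filterState tracks the cancel function of the goroutine currently
// draining the backlog for one peer. Illustrative only; not lnd's code.
type filterState struct {
	cancel context.CancelFunc
}

// applyFilter cancels any in-flight backlog delivery and starts a new one
// for the latest filter, so only the most recent filter is ever serviced.
func (s *filterState) applyFilter(name string) {
	if s.cancel != nil {
		s.cancel()
	}

	ctx, cancel := context.WithCancel(context.Background())
	s.cancel = cancel

	go func() {
		for i := 0; i < 5; i++ {
			select {
			case <-ctx.Done():
				fmt.Println(name, "preempted by a newer filter")
				return
			case <-time.After(10 * time.Millisecond):
				fmt.Println(name, "sent backlog chunk", i)
			}
		}
	}()
}

func main() {
	var s filterState
	s.applyFilter("filter-1")
	time.Sleep(25 * time.Millisecond)
	s.applyFilter("filter-2") // preempts filter-1's remaining backlog
	time.Sleep(100 * time.Millisecond)
}
```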
looks good! only thing is that the default values seem incorrect
I'm suggesting that we early-stop the goroutine that's sending gossip associated with the previous filter, and then only send gossip that matches the new filter. No need to buffer gossip_timestamp_filter messages.
Yeah, it would be nice to stop the previous one and only process the new one, though I think it may create another issue: a misbehaving node could keep sending new ones while we are processing the old one. Overall I view gossip reply as a free service, and we deliver it best-effort. IMO the other side shouldn't query that much or that often.
Here are some more details about why and how we could implement this.

Context
This means that it only ever makes sense to have a single filter active at a time. When a peer sends a new filter, it should immediately replace the previous filter, regardless of whether all gossip matching the previous filter has been sent. Essentially any gossip sent according to the previous filter can be considered a complete waste of resources. If we don't stop servicing the previous filter immediately, and if there's any overlap in filters, we will end up sending the same gossip multiple times, thereby aggravating the outgoing bandwidth issues we're currently seeing.
This PR

The current design does 2 main things:

Reducing
With what's in this PR now, there's no effective buffer. We just make sure that the sends are always non-blocking. If they send another filter before we process the prior one, it gets dropped. If they send a filter while we're processing one, it gets buffered for later.

There're two cases: when they set a backlog vs when they don't. If they aren't specifying a timestamp in the past for a backlog, then there's no extra work to be done. Only when they specify a backlog (eg: some implementations like LDK always specify 2 weeks) is there extra work to be done. Filtration is also ongoing once applied (every message we send to the peer goes through the filter).

I'm not sure we should cancel backlog delivery altogether if they send a new filter while we're still sending out the backlog. As mentioned above, consider nodes like LDK that always send a filter 2 weeks old. Assuming all nodes in the network do a keep-alive update for their channels every 2 weeks, that means we end up sending a channel update for pretty much every channel in the network. With the bandwidth rate limiting, this may take several seconds to send out. If we stop delivering that backlog when they add a new filter (eg: they request to only receive new updates after a timestamp, or that they don't want ongoing gossip), then they'll never get that backlog.

Before we make a change like this, we should check the behavior of other implementations to make sure we aren't violating any assumptions re processing that they may hold. I'm open to exploring such a change, but it should be done in a different PR imo. My aim here is to fix a bug we've seen in the wild, while maintaining our current behavior.
This was a rebase mistake 😅. The goal is to keep the current defaults.
dccbd45 to 015b482
I've pushed up a new commit that attempts to split the difference, with minimal code surface area shifts. We add a new atomic variable, and use that to ensure that we only have a single goroutine delivering a backlog active for a given peer at a time. So if we're already delivering a backlog, and they send a timestamp that needs another backlog, we'll just drop that attempt. Only once the original backlog has finished being delivered do we let them request more. This doesn't affect them applying a filter that doesn't require a backlog; that's still non-blocking as normal.

This is a clear improvement as it resolves the issue where a single peer could monopolize all 5 filter sema slots, but retains the existing behavior instead of preempting a backlog delivery goroutine.
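Roughly, that per-peer gate can be expressed with an atomic compare-and-swap. The sketch below only illustrates the approach described above; the identifiers are hypothetical, not lnd's actual fields:

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// backlogActive is the per-peer flag: true while a backlog-delivery
// goroutine is running for that peer.
var backlogActive atomic.Bool

// maybeStartBacklog starts a backlog goroutine only if none is running.
// A request that arrives while one is active is simply dropped.
func maybeStartBacklog(peer string) {
	if !backlogActive.CompareAndSwap(false, true) {
		fmt.Println("backlog already in flight for", peer, "- dropping request")
		return
	}

	go func() {
		defer backlogActive.Store(false)

		// Stand-in for sending the historical gossip backlog.
		time.Sleep(50 * time.Millisecond)
		fmt.Println("backlog delivered to", peer)
	}()
}

func main() {
	maybeStartBacklog("peer-A") // starts delivery
	maybeStartBacklog("peer-A") // dropped: delivery still in progress
	time.Sleep(100 * time.Millisecond)
	maybeStartBacklog("peer-A") // allowed again once the first finished
	time.Sleep(100 * time.Millisecond)
}
```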
AFAICT LDK only sends

Also note that other implementations already replace gossip filters immediately (CLN, eclair), as described in the spec.

The simplest approach is to just drop all filters that come in after the first one, regardless of whether a historical sync is going on or not. LDK already does this, so probably it would work just fine. But if we want to go that way, we should really consider changing the spec to match.
I'm kinda confused why this needs a queue to begin with? If a peer asks for a historical gossip sync, you can just check in regularly and send it a few more messages whenever the TCP socket has room to send; it should require minimal additional state (literally we just have that one enum https://github.com/lightningdevkit/rust-lightning/blob/5ceb625f005269f9655d10e4ce2a6b3f951f8e09/lightning/src/ln/peer_handler.rs#L692-L704).
+1

Ideally LND wouldn't load all the requested gossip at once and queue it to be sent immediately. No other implementation does this because it's such a clear DoS risk. I mentioned this repeatedly during the 2 years of correspondence about the

I'm guessing it's just more re-engineering than anyone wanted to do. Much easier to keep applying these bandaids to the problem.
There's no queue. It's just a non-blocking send now. It was blocking before, which was the root cause of the issue.
Yep, we have something very similar: lines 22 to 55 in 2e36f9b.
Yes, we still do this; that behavior is unchanged.
That would break our usage. We re-apply filters in order to rotate peers that we're receiving active gossip from.
We acknowledged that, which is why we added a config option to just never deliver a backlog. It has existed for more than 5 years now.

This isn't a very far gap, pagination is just a hop and a skip away. The new SQL db backend also makes this much easier. If you want to investigate drafting up such a change, I'd be happy to give feedback and review.

The actual solution here is to get rid of all these queries/filtering. Some Blockstream devs looked into doing reconciliation stuff repeatedly over the past few years, but seemingly no one cared enough to get the baton across the finish line. If no one has picked up the baton after we finish with gossip v2, then we can take a looksie.
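As a rough illustration of the pagination idea mentioned above, a Go sketch (the loadUpdatesPage helper, the page size, and the data model are all hypothetical, not lnd APIs):

```go
package main

import "fmt"

// channelUpdate is a stand-in for a gossip message loaded from the DB.
type channelUpdate struct{ id int }

// loadUpdatesPage is a hypothetical paginated query: it returns up to
// pageSize updates with an ID greater than afterID.
func loadUpdatesPage(all []channelUpdate, afterID, pageSize int) []channelUpdate {
	var page []channelUpdate
	for _, u := range all {
		if u.id > afterID {
			page = append(page, u)
			if len(page) == pageSize {
				break
			}
		}
	}
	return page
}

func main() {
	// Fake database of 25 updates.
	var db []channelUpdate
	for i := 1; i <= 25; i++ {
		db = append(db, channelUpdate{id: i})
	}

	// Stream the backlog one page at a time instead of loading it all
	// into memory before sending.
	afterID, pageSize := 0, 10
	for {
		page := loadUpdatesPage(db, afterID, pageSize)
		if len(page) == 0 {
			break
		}
		fmt.Printf("sending %d updates starting after id %d\n",
			len(page), afterID)
		afterID = page[len(page)-1].id
	}
}
```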
015b482 to 9ca33d9
Not quite. With this change, LND still continues to send the historical gossip for the previous filter. CLN and eclair stop that immediately AFAICT.
Got it. If LND is the only significant user of the multiple filters, maybe the decision here isn't that important in practice. Still seems nice to follow the spec though.
That's another bandaid.
If the problem was known, and the fix is just a hop and a skip away, I don't really understand why this hasn't been fixed for 5 years, and why no one responded to my recommendation to fix it for 2 years.
This would be super cool. We still wouldn't want to load the entire set difference into memory and send it all at once though.
I believe the conclusion was "this works better if we have block-height-based gossip rather than time-based gossip, so we'll do it after v2", so it is indeed waiting on gossip v2.
lgtm 👍
My read of the spec is that we're compliant here; there is indeed some room for interpretation. E.g.:
We do this. We immediately apply the filter, once received. After it's been applied, any requests to trickle out gossip traffic will use the latest filter. Re the backlog section:
There's nothing here that prescribes halting backlog delivery once a new filter has been applied, or preempting delivery to prioritize a newer filter. So there's certainly room to tighten up the wording here to guide in a desired direction.
Not sure what's up w/ the linter, I get zero diff locally if I run
In this commit, we make the gossip filter semaphore capacity configurable through a new FilterConcurrency field. This change allows node operators to tune the number of concurrent gossip filter applications based on their node's resources and network position.

The previous hard-coded limit of 5 concurrent filter applications could become a bottleneck when multiple peers attempt to synchronize simultaneously. By making this value configurable via the new gossip.filter-concurrency option, operators can increase this limit for better performance on well-resourced nodes or maintain conservative values on resource-constrained systems.

We keep the default value at 5 to maintain backward compatibility and avoid unexpected resource usage increases for existing deployments. The sample configuration file is updated to document this new option.
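A condensed sketch of the semaphore pattern this commit makes configurable (the identifiers and the fallback logic here are illustrative, not copied from lnd):

```go
package main

import "fmt"

// defaultFilterConcurrency mirrors the previous hard-coded limit of 5.
const defaultFilterConcurrency = 5

// newFilterSema builds the semaphore channel that bounds how many gossip
// filters can be applied concurrently.
func newFilterSema(configured int) chan struct{} {
	if configured <= 0 {
		configured = defaultFilterConcurrency
	}

	sema := make(chan struct{}, configured)
	for i := 0; i < configured; i++ {
		sema <- struct{}{}
	}
	return sema
}

func main() {
	sema := newFilterSema(0) // falls back to the default of 5
	fmt.Println("filter slots available:", len(sema))

	// Acquire a slot before applying a filter, release it when done.
	<-sema
	defer func() { sema <- struct{}{} }()
	fmt.Println("slots while one filter is being applied:", len(sema))
}
```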
In this commit, we introduce an asynchronous processing queue for GossipTimestampRange messages in the GossipSyncer. This change addresses a critical issue where the gossiper could block indefinitely when processing timestamp range messages during periods of high load.

Previously, when a peer sent a GossipTimestampRange message, the gossiper would synchronously call ApplyGossipFilter, which could block on semaphore acquisition, database queries, and rate limiting. This synchronous processing created a bottleneck where the entire peer message processing pipeline would stall, potentially causing timeouts and disconnections.

The new design adds a timestampRangeQueue channel with a capacity of 1 message and a dedicated goroutine for processing these messages asynchronously. This follows the established pattern used for other message types in the syncer. When the queue is full, we drop messages and log a warning rather than blocking indefinitely, providing graceful degradation under extreme load conditions.
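A self-contained sketch of the queueing pattern this commit message describes, with illustrative names rather than lnd's actual types (the real code uses the syncer's quit/wg plumbing and ApplyGossipFilter):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// syncer models the pattern: a buffered channel with a capacity of one plus
// a dedicated goroutine that applies filters off the caller's goroutine.
type syncer struct {
	timestampRangeQueue chan string
	quit                chan struct{}
	wg                  sync.WaitGroup
}

func newSyncer() *syncer {
	s := &syncer{
		timestampRangeQueue: make(chan string, 1),
		quit:                make(chan struct{}),
	}

	s.wg.Add(1)
	go s.processTimestampRanges()

	return s
}

// processTimestampRanges drains the queue and applies each filter, so the
// potentially slow filter work never blocks the message reader.
func (s *syncer) processTimestampRanges() {
	defer s.wg.Done()

	for {
		select {
		case msg := <-s.timestampRangeQueue:
			time.Sleep(20 * time.Millisecond) // stand-in for applying the filter
			fmt.Println("applied filter:", msg)
		case <-s.quit:
			return
		}
	}
}

func main() {
	s := newSyncer()
	s.timestampRangeQueue <- "range [now-2w, now]"
	time.Sleep(50 * time.Millisecond)
	close(s.quit)
	s.wg.Wait()
}
```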
In this commit, we complete the integration of the asynchronous timestamp range queue by modifying ProcessRemoteAnnouncement to use the new queuing mechanism instead of calling ApplyGossipFilter synchronously. This change ensures that when a peer sends a GossipTimestampRange message, it is queued for asynchronous processing rather than blocking the gossiper's main message processing loop.

The modification prevents the peer's readHandler from blocking on potentially slow gossip filter operations, maintaining connection stability during periods of high synchronization activity. If the queue is full when attempting to enqueue a message, we log a warning but return success to prevent peer disconnection. This design choice prioritizes connection stability over guaranteed delivery of every gossip filter request, which is acceptable since peers can always resend timestamp range messages if needed.
In this commit, we complete the integration of the configurable gossip filter concurrency by wiring the new FilterConcurrency configuration through all layers of the application. The changes connect the gossip.filter-concurrency configuration option from the command-line interface through the server initialization code to the gossiper and sync manager. This ensures that operators can actually use the new configuration option to tune their node's concurrent gossip filter processing capacity based on their specific requirements and available resources.
In this commit, we add detailed documentation to help node operators understand and configure the gossip rate limiting system effectively. The new guide addresses a critical knowledge gap that has led to misconfigured nodes experiencing synchronization failures.

The documentation covers the token bucket algorithm used for rate limiting, providing clear formulas and examples for calculating appropriate values based on node size and network position. We include specific recommendations ranging from 50 KB/s for small nodes to 1 MB/s for large routing nodes, with detailed calculations showing how these values are derived.

The guide explains the relationship between rate limiting and other configuration options like num-restricted-slots and the new filter-concurrency setting. We provide troubleshooting steps for common issues like slow initial sync and peer disconnections, along with debug commands and log patterns to identify problems.

Configuration examples are provided for conservative, balanced, and performance-oriented setups, giving operators concrete starting points they can adapt to their specific needs. The documentation emphasizes the importance of not setting rate limits too low, warning that values below 50 KB/s can cause synchronization to fail entirely.
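To make the sizing concrete, here is a rough back-of-the-envelope example (the channel count and message size are assumptions for illustration, not figures taken from the new docs): if a peer requests a two-week backlog and roughly 40,000 channels each refreshed a ~300-byte channel_update in that window, the backlog alone is about 40,000 × 300 B ≈ 12 MB. At 50 KB/s, that single peer's backlog takes roughly 12 MB / 50 KB/s ≈ 240 seconds to drain, and proportionally longer when several peers sync at once, which is why rates well below that can cause synchronization to stall entirely.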
In this commit, we add a new atomic bool to only permit a single gossip backlog goroutine per peer. If we get a new request that needs a backlog while we're still processing the other, then we'll drop that request.
9ca33d9 to 8dcb7a8
Will address the nits in #10102.
@@ -207,8 +212,13 @@ type SyncManager struct {
// newSyncManager constructs a new SyncManager backed by the given config.
func newSyncManager(cfg *SyncManagerCfg) *SyncManager {

	filterSema := make(chan struct{}, filterSemaSize)
	for i := 0; i < filterSemaSize; i++ {
	filterConcurrency := cfg.FilterConcurrency
nit: newline above
	}

	// Parse the pubkeys for the pinned syncers.
nit: newline can be removed
In this PR, we add an async processing queue for the gossip timestamp filter. This ensures that applying a filter never blocks another goroutine. If the queue is full, then the message is just dropped.
We also make the filtering concurrency configurable.
Finally, we add a new set of docs to explain how the various rate limiting config options work in tandem.