Refreshing the topology is a resource-intensive operation, especially in large clusters #3331

@Paragrf

Description

Feature Request

Is your feature request related to a problem? Please describe

We discovered that enabling DynamicRefreshSources in large clusters causes significant performance overhead during topology refreshes. Testing in a cluster with 200 nodes showed that the topology refresh process alone consumes 1–2 CPU cores. Profiling the refresh with a flame graph showed that updateCache accounts for 75% of the total cost at 200 nodes, and this proportion grows as the cluster scales, reaching up to 95% at 1,000 nodes.

Describe the solution you'd like

Upon analysis, we found that the time complexity of the updateCache operation is O(N² * 16384) during topology refresh. This step is mainly responsible for transforming the topology from a node → slot perspective into a slot → node perspective, which is used by Lettuce for routing read traffic. However, this transformed view is not required for the KnownMajority-based topology selection logic.
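To make the cost concrete, here is a minimal sketch of what a slot → node cache rebuild looks like. The Node and Partitions classes below are hypothetical stand-ins for Lettuce's actual topology model, not its real API; the point is only the loop structure that drives the complexity.

```java
import java.util.List;

// Hypothetical stand-in for a cluster node and its owned slots.
class Node {
    final String id;
    final List<Integer> slots; // slots owned by this node

    Node(String id, List<Integer> slots) {
        this.id = id;
        this.slots = slots;
    }
}

class Partitions {
    static final int SLOT_COUNT = 16384;

    // Rebuilds the slot -> node lookup table from the node -> slot view.
    // A single rebuild touches up to N nodes * 16384 slots; with
    // DynamicRefreshSources, running it once per each of the N collected
    // topology views yields the O(N^2 * 16384) cost observed above.
    static Node[] updateCache(List<Node> nodes) {
        Node[] slotCache = new Node[SLOT_COUNT];
        for (Node node : nodes) {
            for (int slot : node.slots) {
                slotCache[slot] = node;
            }
        }
        return slotCache;
    }
}
```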

Therefore, we propose an optimization: first perform the topology selection, and then run updateCache only on the selected optimal view. This would reduce the time complexity of updateCache from O(N² * 16384) to a constant O(16384), significantly reducing the performance overhead of the topology refresh process.
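The proposed ordering can be sketched as follows. This is an illustration under assumptions, not Lettuce's implementation: each queried node's view is modeled as a slot → node-id map, and the "most slots covered" criterion merely stands in for the real KnownMajority selection logic.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class RefreshSketch {
    static final int SLOT_COUNT = 16384;

    // views: one topology view per queried node (slot -> owning node id).
    static String[] refresh(List<Map<Integer, String>> views) {
        // 1. Topology selection runs first, on the raw node -> slot
        //    views. (Placeholder criterion; KnownMajority in Lettuce.)
        Map<Integer, String> best = views.stream()
                .max(Comparator.comparingInt(Map::size))
                .orElseThrow();

        // 2. updateCache then runs exactly once, on the selected view
        //    only: O(16384) instead of O(N^2 * 16384).
        String[] slotCache = new String[SLOT_COUNT];
        best.forEach((slot, nodeId) -> slotCache[slot] = nodeId);
        return slotCache;
    }
}
```

Since the slot → node view is only needed for routing read traffic, deferring its construction until after selection loses nothing for the selection step itself.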

Flame graphs at 200 nodes

  • Flame graph source file
    profile.tar.gz

  • Vanilla (updateCache is the purple part in the picture)
    Image

  • After optimization
    Image

Furthermore

GC pressure during large-cluster topology refreshes is still a problem; memory pools could be considered in the future.
