Skip to content

Conversation

@mhl-b
Copy link
Contributor

@mhl-b mhl-b commented Nov 14, 2025

A high level overview of networking for the distributed architecture guide. ES-7886

@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v9.3.0 labels Nov 14, 2025
@mhl-b mhl-b added Team:Distributed Coordination Meta label for Distributed Coordination team >non-issue :Distributed Coordination/Distributed A catch all label for anything in the Distributed Coordination area. Please avoid if you can. and removed needs:triage Requires assignment of a team area label labels Nov 14, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

@mhl-b mhl-b requested review from DaveCTurner and ywangd November 14, 2025 18:25
Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few factual nits, nothing major. It needs some editorial polishing too - can we use any of our fancy new AI tools to help with that?

Comment on lines +26 to +27
be HTTP spec compliant, Elastic is not a webserver. We support GET requests with
payload (some old proxies might drop content), requests cannot be cached by
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe mention that in these cases we also support the same API with a different verb, normally POST, for clients who cannot send a GET-with-body.


HTTP transport provides two options for content processing: aggregate fully and
incremental. Aggregated content is a preferable choice for a small messages that
cannot be parsed incrementally (like JSON). But aggregation has drawbacks, it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cannot be parsed incrementally (like JSON)

Mmm sorta depends what you mean by "incrementally" but I'd say that the SAX-style parsing we do is in fact working incrementally. It doesn't really make sense to start parsing before we've received the whole body, but that's different from saying "cannot".

Comment on lines +78 to +79
The job of the REST handler is to parse and validate HTTP request and construct
typed version of request, often Transport request (see Transport section below).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth pointing out here that (if security is enabled then) authentication happens before the REST handler but authorization happens after (when entering the transport layer).


`Transport` is an umbrella term for a node-to-node communication. It's a
TCP-based custom binary protocol. Every node in a cluster is a client and server
at the same time. Node-to-node communication never uses HTTP transport.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

never

except for reindex-from-remote :)

connections for different purposes: ping, node-state, bulks, etc. Pool structure
is defined in `ConnectionProfile` class.

ES has resilience to disconnects, frequent reconnects, but in general we assume
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resilience

I don't think we are particularly resilient to frequent reconnects. At least, not without being more specific. ES never behaves incorrectly (e.g. loses data) in the face of network outages but it may become unavailable unless the network is stable.


Another area of different networking clients is snapshotting. ES supports
snapshotting to remote repositories. These repositories usually come with their
own SDK and networking stack. For example AWS SDK comes with tomcat or netty,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although the Apache client we use may have originally come from Tomcat, it's not called that any more and this'll confuse folks that don't know the history. Let's just call it the Apache client.

be forked to another thread pool. But forking comes with overhead, doing forking
on every tiny request is a wasted CPU work. As a rule of thumb: don't fork
simple requests that can be served from memory and do not require heavy
computations (seconds), otherwise fork.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A pause of "seconds" is still pretty disastrous

Suggested change
computations (seconds), otherwise fork.
computations (millseconds), otherwise fork.

One of the performance edges of netty is controlled memory allocation. Netty
manages byte buffer pools and reuse them heavily. This performance gain comes
with a cost. And cost is reference counting on developer's shoulders. Netty
reads socket bytes into pooled byte-buffers and passes them to application. Then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth saying a bit more about how this looks in a heap dump - there's a pool of 1MiB byte[] objects which Netty slices into 16kiB pages, not all of which may be in use, so you have to account for this when investigating memory usage.

One of the performance edges of netty is controlled memory allocation. Netty
manages byte buffer pools and reuse them heavily. This performance gain comes
with a cost. And cost is reference counting on developer's shoulders. Netty
reads socket bytes into pooled byte-buffers and passes them to application. Then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also Netty technically reads the bytes into a direct buffer and then CopyBytesSocketChannel copies them into the pooled buffer.

Another area of different networking clients is snapshotting. ES supports
snapshotting to remote repositories. These repositories usually come with their
own SDK and networking stack. For example AWS SDK comes with tomcat or netty,
Azure with netty-based project-reactor, GCP uses default java HTTP
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Netty needs a capital N (here and in several other places)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed Coordination/Distributed A catch all label for anything in the Distributed Coordination area. Please avoid if you can. >non-issue Team:Distributed Coordination Meta label for Distributed Coordination team v9.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants