@blathers-crl blathers-crl bot commented Nov 24, 2025

Backport 1/1 commits from #157964 on behalf of @Abhinav1299.


Previously, when a single log message exceeded the configured max-buffer-size
for a buffered sink with exit-on-error enabled, the error would propagate up
and trigger process termination. This was overly aggressive for what amounts
to a logging configuration issue: a single oversized SQL query (e.g., one with
a multi-megabyte string literal) could crash an entire CockroachDB node.

This commit modifies bufferedSink.output() to detect the errMsgTooLarge error
and handle it gracefully. When an oversized message is encountered, instead of
propagating the error, we drop the message and log a warning via Ops.Warningf()
indicating that the message exceeded the buffer size limit. This allows the
node to continue operating normally while still providing visibility into the
issue through logged warnings.
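
As a rough illustration of the drop-and-warn behavior described above, here is a minimal sketch. It is not the actual `bufferedSink` code: `appendToBuffer` and `outputSketch` are hypothetical helpers, the error text and warning wording are assumptions, and the warning is returned as a string rather than emitted through `Ops.Warningf()`. Only the `errMsgTooLarge` name comes from the description.

```go
package main

import (
	"errors"
	"fmt"
)

// errMsgTooLarge mirrors the sentinel error named in the PR description.
// The message text here is an assumption.
var errMsgTooLarge = errors.New("message too large for buffer")

// appendToBuffer is a hypothetical stand-in for the buffer's append step.
// It rejects any single message larger than maxBufferSize.
func appendToBuffer(buf *[]byte, msg []byte, maxBufferSize int) error {
	if len(msg) > maxBufferSize {
		return fmt.Errorf("appending %d bytes: %w", len(msg), errMsgTooLarge)
	}
	*buf = append(*buf, msg...)
	return nil
}

// outputSketch drops an oversized message and returns a warning string
// instead of an error, so an exit-on-error caller never sees the failure.
// Any other error is still propagated unchanged.
func outputSketch(buf *[]byte, msg []byte, maxBufferSize int) (warning string, err error) {
	err = appendToBuffer(buf, msg, maxBufferSize)
	if errors.Is(err, errMsgTooLarge) {
		warning = fmt.Sprintf(
			"dropped log message of %d bytes: exceeds max-buffer-size of %d bytes",
			len(msg), maxBufferSize)
		return warning, nil
	}
	return "", err
}
```

In this sketch the caller decides what to do with the returned warning; the real change reports it through the logging system itself.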

The implementation uses a two-phase approach to avoid deadlock: first, while
holding the sink's mutex, we detect the oversized message and set a flag with
the relevant information; then, after releasing the lock, we emit the warning.
This is necessary because calling Ops.Warningf() while holding the mutex would
cause the warning message to attempt re-entry into the same sink, resulting in
a deadlock when it tries to acquire the already-held lock.
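
The two-phase pattern can be sketched roughly as follows. This is an illustrative stand-in under stated assumptions, not the code from the commit: `sketchSink`, `appendLocked`, and the `warn` callback are hypothetical, and `warn` plays the role of `Ops.Warningf()` by possibly writing back into the same sink. Go's `sync.Mutex` is not reentrant, which is why the callback runs only after the lock is released.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errMsgTooLarge mirrors the sentinel named in the PR description.
var errMsgTooLarge = errors.New("message too large for buffer")

// sketchSink is a hypothetical, simplified buffered sink.
type sketchSink struct {
	mu            sync.Mutex
	buf           []byte
	maxBufferSize int
	// warn stands in for the warning logger; it may re-enter output().
	warn func(msg string)
}

// appendLocked must be called with s.mu held.
func (s *sketchSink) appendLocked(msg []byte) error {
	if len(msg) > s.maxBufferSize {
		return errMsgTooLarge
	}
	s.buf = append(s.buf, msg...)
	return nil
}

func (s *sketchSink) output(msg []byte) error {
	var tooLarge bool
	var size int

	// Phase 1: under the lock, only record that the message was dropped.
	err := func() error {
		s.mu.Lock()
		defer s.mu.Unlock()
		if err := s.appendLocked(msg); err != nil {
			if errors.Is(err, errMsgTooLarge) {
				tooLarge, size = true, len(msg)
				return nil // swallow: the message is dropped
			}
			return err
		}
		return nil
	}()

	// Phase 2: the lock is released, so a re-entrant write cannot deadlock.
	if tooLarge {
		s.warn(fmt.Sprintf(
			"dropped log message of %d bytes: exceeds max-buffer-size of %d bytes",
			size, s.maxBufferSize))
	}
	return err
}
```

Calling `s.warn` inside the locked closure instead would block forever whenever the callback writes to the same sink, which is the deadlock the description refers to.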

This resolves #152635

Part of: CRDB-53951
Epic: CRDB-56325
Release note: None


Release justification: This PR prevents the CockroachDB process from terminating when a buffered sink with exit-on-error enabled encounters a log message larger than its max-buffer-size.

@blathers-crl blathers-crl bot force-pushed the blathers/backport-release-25.3-157964 branch from 950651d to 5df6d3b Compare November 24, 2025 08:01
@blathers-crl blathers-crl bot requested review from a team as code owners November 24, 2025 08:01
@blathers-crl blathers-crl bot added the blathers-backport and O-robot labels Nov 24, 2025
@blathers-crl blathers-crl bot requested review from angles-n-daemons and removed request for a team November 24, 2025 08:01
@blathers-crl blathers-crl bot requested review from Abhinav1299, aa-joshi and arjunmahishi and removed request for a team November 24, 2025 08:01

blathers-crl bot commented Nov 24, 2025

Thanks for opening a backport.

Before merging, please confirm that the change does not break backwards compatibility and otherwise complies with the backport policy. Include a brief release justification in the PR description explaining why the backport is appropriate. All backports must be reviewed by the TL for the owning area. While the stricter LTS policy does not yet apply, please exercise judgment and consider gating non-critical changes behind a disabled-by-default feature flag when appropriate.

@blathers-crl blathers-crl bot added the backport and T-supportability labels Nov 24, 2025

@dhartunian dhartunian left a comment

Release justification: this is a low-risk improvement to the error behavior of critical log sinks, which can otherwise cause unnecessary node crashes.

@Abhinav1299 Abhinav1299 merged commit f8526a3 into release-25.3 Nov 26, 2025
15 checks passed
@Abhinav1299 Abhinav1299 deleted the blathers/backport-release-25.3-157964 branch November 26, 2025 18:55

Labels

backport, blathers-backport, O-robot, T-supportability, target-release-25.3.7
