Skip to content

Conversation

@jack2012aa
Copy link
Contributor

Description

The test testCloseWithZeroTimeoutFromCallerThread is flaky. The consumer may gets all of the messages after the producer is force closed, while futures in the producer are completed exceptionally.

The bug comes from a race condition introduced by RecordAccumulator#close and RecordAccumulator#batchReady. RecordAccumulator#close sets the closed flag to true, and RecordAccumulator#batchReady thinks the batch is sendable. As a result those batches are sent in the same Sender#runOnce call because runOnce doesn't check the forceClose flag.

Test

An unit test is added to SenderTest. It asserts that after a sender is force closed no message should be sent or polled.

Change

It is hard to fully eliminate the bug: Sender#forceClose can happen at any point of Sender#runOnce since they run in different threads. The only way to ensure that "no action is permitted after force close" is to lock runOnce, which is expensive.

Adding a check on the flag before the poll in runOnce can reduce the chance of the bug. Now the race condition only happens if sender is force closed during the poll. Notice that this eliminates the flaky test. In the test scenario, if poll happens during the poll, the client has nothing to operate in this round, and there is no next run.

@github-actions github-actions bot added triage PRs from the community producer clients small Small PRs labels Oct 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants