KAFKA-18475: Flaky PlaintextProducerSendTest testCloseWithZeroTimeoutFromCallerThread #20791

jack2012aa · 2025-10-29T23:22:17Z

Description

The test testCloseWithZeroTimeoutFromCallerThread is flaky. The consumer may get all of the messages after the producer is force closed, even though the futures in the producer are completed exceptionally.

The bug comes from a race between RecordAccumulator#close and RecordAccumulator#batchReady. Once RecordAccumulator#close sets the closed flag to true, RecordAccumulator#batchReady treats the batch as sendable. As a result, those batches are sent in the same Sender#runOnce call, because runOnce doesn't check the forceClose flag.
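
The interleaving described above can be replayed deterministically with a toy model. This is only a sketch under stated assumptions: `ToyAccumulator`, `runOnceUnguarded`, and `simulate` are hypothetical names and not the real Kafka classes, and the toy ignores every other readiness condition (linger, batch size, and so on) so that only the closed flag matters.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical, simplified model of the race; NOT the real Kafka classes.
class ForceCloseRaceSketch {

    /** Toy stand-in for RecordAccumulator: pending batches plus a closed flag. */
    static final class ToyAccumulator {
        private final Deque<String> batches = new ArrayDeque<>();
        private volatile boolean closed = false;

        void append(String batch) {
            batches.add(batch);
        }

        void close() {
            closed = true;
        }

        // Assumption: once closed, any pending batch counts as sendable.
        boolean batchReady() {
            return closed && !batches.isEmpty();
        }

        List<String> drain() {
            List<String> drained = new ArrayList<>(batches);
            batches.clear();
            return drained;
        }
    }

    /** One sender iteration that never looks at a force-close flag. */
    static List<String> runOnceUnguarded(ToyAccumulator accumulator) {
        List<String> sent = new ArrayList<>();
        if (accumulator.batchReady()) {
            sent.addAll(accumulator.drain()); // stands in for "send + poll"
        }
        return sent;
    }

    /** Replays the problematic ordering on a single thread. */
    static List<String> simulate() {
        ToyAccumulator accumulator = new ToyAccumulator();
        accumulator.append("batch-0");
        // Caller thread: the force close has closed the accumulator...
        accumulator.close();
        // ...and the sender thread runs one more iteration before it stops.
        return runOnceUnguarded(accumulator);
    }
}
```

In this toy ordering, `simulate()` returns the batch that was pending at close time, which corresponds to the consumer still receiving messages after the force close.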

Test

A unit test is added to SenderTest. It asserts that after a sender is force closed, no message is sent or polled.

Change

It is hard to fully eliminate the bug: Sender#forceClose can happen at any point of Sender#runOnce since they run in different threads. The only way to ensure that "no action is permitted after force close" is to lock runOnce, which is expensive.

Adding a check on the flag before the poll in runOnce reduces the chance of hitting the bug: the race can now occur only if the sender is force closed during the poll. This is enough to eliminate the flaky test. In the test scenario, if the force close happens during the poll, the client has nothing to operate on in this round, and there is no next run.
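
A minimal sketch of the idea of checking the flag right before the poll, assuming a much simplified sender. `GuardedSenderSketch`, its `pending` queue, and the `betweenDrainAndPoll` hook are hypothetical and exist only to make the interleaving reproducible; the actual patch may be structured differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical, simplified sender; NOT the real Kafka Sender.
class GuardedSenderSketch {
    private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();
    private final List<String> sent = new ArrayList<>();
    private volatile boolean forceClose = false;

    void enqueue(String batch) {
        pending.add(batch);
    }

    // May be called from another thread at any time.
    void forceClose() {
        forceClose = true;
    }

    /**
     * One iteration of the send loop. The hook runs between draining and
     * polling so that a test can inject a force close at that exact point.
     */
    void runOnce(Runnable betweenDrainAndPoll) {
        List<String> drained = new ArrayList<>();
        String batch;
        while ((batch = pending.poll()) != null) {
            drained.add(batch);
        }

        betweenDrainAndPoll.run();

        // Guard: re-check the flag immediately before the poll step.
        // In this toy the drained batches are simply dropped; a real sender
        // would have to fail them through its force-close path.
        if (forceClose) {
            return;
        }

        sent.addAll(drained); // stands in for the network poll
    }

    List<String> sent() {
        return sent;
    }
}
```

The guard narrows the window but does not close it: a force close that lands after the check, during the poll itself, is still not observed in that iteration.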

jack2012aa and others added 4 commits October 28, 2025 19:26

Catch race condition

3452d00

Accept different ways to force close in test

ee77f8a

Fix

1c70834

Merge branch 'apache:trunk' into KAFKA-18475

0143fd1

github-actions bot added the triage (PRs from the community), producer, clients, and small (Small PRs) labels Oct 29, 2025

chia7712 added the ci-approved label Oct 30, 2025
