-
Notifications
You must be signed in to change notification settings - Fork 13.8k
[FLINK-38746][table-runtime] Fix StackOverflowError in RecordsWindowBuffer.addElement #27282
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-38746][table-runtime] Fix StackOverflowError in RecordsWindowBuffer.addElement #27282
Conversation
|
@flinkbot run azure |
012be30 to
e53fbfb
Compare
| flush(); | ||
| // remember to add the input element again | ||
| addElement(key, sliceEnd, element); | ||
| while (true) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we also need to move the above line
minSliceEnd = Math.min(sliceEnd, minSliceEnd);
inside the loop?
In flush(), it might be reset to Long.MAX_VALUE, so we'll end up with the wrong minSliceEnd field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
I have moved it between append and break, and introduced a new test to cover this case.
| if (recordsBuffer.getNumKeys() == 0) { | ||
| // Buffer is empty, retry won't help (record is too large for the buffer) | ||
| throw e; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is getNumKeys correct and sufficient?
Or should getNumElements be used instead or in addition?
In append(), I see that numKeys is not incremented if lookupInfo.found.
I guess there's an implicit invariant that numKeys > 0 in this case, but I'm not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are equivalent.
- One element must have one key
- Every key has at least one element
numKeys == 0⟺numElements == 0⟺ buffer is empty
e53fbfb to
a726361
Compare
1996fanrui
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rkhachatryan for the review, all your comments are addressed.
| if (recordsBuffer.getNumKeys() == 0) { | ||
| // Buffer is empty, retry won't help (record is too large for the buffer) | ||
| throw e; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are equivalent.
- One element must have one key
- Every key has at least one element
numKeys == 0⟺numElements == 0⟺ buffer is empty
| flush(); | ||
| // remember to add the input element again | ||
| addElement(key, sliceEnd, element); | ||
| while (true) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
I have moved it between append and break, and introduced a new test to cover this case.
What is the purpose of the change
Problem
RecordsWindowBuffer.addElement()catchesEOFExceptionand retries recursively, causing StackOverflowError when retry keeps failing.Root Cause
The original code assumes "flush will always free enough space for retry". This assumption fails when unrecoverable errors occur, leading to infinite recursion.
Solution
Use
numKeys == 0as the termination condition:Why This Works
Key point:
flush()clears the buffer, makingnumKeys == 0. If retry still fails after flush, entering the catch block again withnumKeys == 0indicates the problem is not caused by a full buffer, but an unrecoverable error. In this case, an exception should be thrown instead of continuing to retry.Benefits
Brief change log
Verifying this change
RecordsWindowBufferTestDoes this pull request potentially affect one of the following parts:
@Public(Evolving): noDocumentation