[fix] Write stuck due to pending add callback by multiple threads #4557
Motivation
Background: the normal steps of adding an entry
PendingAddOp.writeComplete after receiving the response from BK servers.
Background: the steps of disconnection
PendingAddOp.writeComplete. You can reproduce this flow with the new test testAddEntriesCallbackWithBKClientThread.
Issue: write stuck due to pending add callback by multiple threads
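[Diagram: with a 3/3/2 configuration, the client writes to three bookies (client->BK1, client->BK2, client->BK3). After ack 1/3 and ack 2/3 the add is complete, since the ack quorum is 2/3. PendingAddOp.writeComplete can be triggered from two threads: the bookkeeper workers and the client-server io thread.]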
Since multiple threads can trigger all successful callbacks in the pending queue, the following race condition may occur [2]. thread-1 and thread-2 may be triggered by different PendingAddOps.
| thread-1 | thread-2 |
| --- | --- |
| success | success |
| queue.pop | queue.pop |
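The interleaving above can be replayed by hand on a single thread, which makes the outcome deterministic. The sketch below is an illustration only, not the BookKeeper code: `PendingQueueRace`, `Op` and `replayInterleaving` are hypothetical names, and it assumes the drain logic checks the head of the queue and then removes the head in two separate steps.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class PendingQueueRace {
    /** Hypothetical stand-in for a pending add operation. */
    static final class Op {
        final long entryId;
        final boolean completed;

        Op(long entryId, boolean completed) {
            this.entryId = entryId;
            this.completed = completed;
        }
    }

    /**
     * Replays the two-thread interleaving step by step on one thread.
     * Returns the entry ids removed from the pending queue, in order.
     */
    static List<Long> replayInterleaving() {
        Deque<Op> pending = new ArrayDeque<>();
        pending.add(new Op(0, true));   // entry 0 has reached the ack quorum
        pending.add(new Op(1, false));  // entry 1 is still waiting for acks
        List<Long> removed = new ArrayList<>();

        // thread-1: peek the head, entry 0 is completed ("success").
        Op seenByThread1 = pending.peek();
        // thread-2: peek the same head, entry 0 is completed ("success").
        Op seenByThread2 = pending.peek();

        // thread-1: queue.pop removes entry 0, as intended.
        if (seenByThread1.completed) {
            removed.add(pending.remove().entryId);
        }
        // thread-2: queue.pop acts on a stale check and removes entry 1,
        // which was never verified as completed.
        if (seenByThread2.completed) {
            removed.add(pending.remove().entryId);
        }
        return removed;
    }

    public static void main(String[] args) {
        System.out.println(replayInterleaving()); // prints [0, 1]
    }
}
```

In this replay, entry 1 leaves the queue although it never completed. Under the stated assumption, no later drain would find it at the head, which is one way a write could end up stuck.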
[1] code link: https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/PendingAddOp.java#L307
[2] code link: https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L2092-L2124
The issue we encountered
A Pulsar topic is stuck in the ClosingLedger state.
pulsar topic stats
logs
Changes
Switch the thread to the BookKeeper workers if the connection is broken.
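The general idea behind this kind of change can be sketched as follows: whichever thread reports a completion, the work of marking the entry and draining the pending queue is handed to one dedicated worker, so the check and the removal never interleave. This is a minimal sketch under assumptions, not the diff in this PR: `SerializedDrain`, `Op`, `writeComplete` and `demo` are hypothetical names, and a plain single-threaded `ExecutorService` stands in for the BookKeeper worker pool.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class SerializedDrain {
    /** Hypothetical stand-in for a pending add operation. */
    static final class Op {
        final long entryId;
        boolean completed;

        Op(long entryId) {
            this.entryId = entryId;
        }
    }

    // Both collections are touched only from the worker thread.
    private final Deque<Op> pending = new ArrayDeque<>();
    private final List<Long> callbacks = new ArrayList<>();
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    void add(long entryId) {
        worker.execute(() -> pending.add(new Op(entryId)));
    }

    /** Safe to call from any thread, for example an IO thread. */
    void writeComplete(long entryId) {
        worker.execute(() -> {
            for (Op op : pending) {
                if (op.entryId == entryId) {
                    op.completed = true;
                }
            }
            drain();
        });
    }

    /** Runs on the worker only, so peek and remove see the same head. */
    private void drain() {
        Op head;
        while ((head = pending.peek()) != null && head.completed) {
            pending.remove();
            callbacks.add(head.entryId);
        }
    }

    /** Reads the callback order on the worker, then stops the worker. */
    List<Long> snapshotAndShutdown() {
        Callable<List<Long>> snapshot = () -> new ArrayList<>(callbacks);
        try {
            return worker.submit(snapshot).get(10, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            worker.shutdown();
        }
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Two threads report completions out of order; callbacks stay in order. */
    static List<Long> demo() {
        SerializedDrain d = new SerializedDrain();
        for (long i = 0; i < 4; i++) {
            d.add(i);
        }
        Thread t1 = new Thread(() -> { d.writeComplete(1); d.writeComplete(3); });
        Thread t2 = new Thread(() -> { d.writeComplete(0); d.writeComplete(2); });
        t1.start();
        t2.start();
        join(t1);
        join(t2);
        return d.snapshotAndShutdown();
    }
}
```

Calling `SerializedDrain.demo()` returns the callback order once both reporting threads have finished. Because every drain runs on the same worker, each entry is called back exactly once and in entry order, whichever thread reported its completion.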