Revert "Fix possible endless wait in stop() after AUTH_FAILED error (… #744
Here is a test script that will show the behavior in both async and …
In addition to the reasoning in the commit message, I have also tested with ZooKeeper 3.8.3; perhaps that is a behavior change from …
Thank you for the PR, I will look into it. (FWIW, the test failures seem unrelated to the change.)
Looking at the original PR, it seems the change was not addressing the issue correctly. And quite honestly, I don't understand how the test case provided in the gist triggered the issue that was described.
The proper and expected behavior is that everything ahead of the close in the send queue be flushed before closing. So the reversal looks proper to me.
In any case, LGTM. Thanks!
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #744      +/-   ##
==========================================
- Coverage   96.84%   96.81%   -0.03%
==========================================
  Files          27       27
  Lines        3549     3549
==========================================
- Hits         3437     3436       -1
- Misses        112      113       +1
Thank you again for the PR.
You mentioned this might be an issue with a specific ZK version; I was not able to reproduce it using ZK 3.6.4 or ZK 3.5.10.
Also, stop() and close() are called after every test method (during tearDown()), so I think we would have detected it before if the issue was there. Unfortunately, the job logs have expired, so I can't check that.
FWIW, the initial PR mentioned issue #582, which seems more likely to be fixed by adding a timeout in the thread join().
Anyway, I agree with both of you and also think this should be reversed. Would it be possible to make sure the commit title matches our guidelines?
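For readers unfamiliar with the join() timeout idea mentioned above, here is a minimal standard-library sketch. It does not use kazoo and is not the project's code; the helper name join_with_timeout and the stuck worker are hypothetical stand-ins, and the 0.2 second timeout is arbitrary.

```python
import threading


def join_with_timeout(worker, timeout):
    """Wait up to `timeout` seconds; return True if the thread finished.

    Hypothetical helper for illustration only, not part of kazoo.
    """
    worker.join(timeout)
    return not worker.is_alive()


# A worker that never finishes stands in for a stuck connection thread.
never_set = threading.Event()
stuck = threading.Thread(target=never_set.wait, daemon=True)
stuck.start()

# Without a timeout, join() would block forever here.
finished = join_with_timeout(stuck, timeout=0.2)
print("finished" if finished else "still running after timeout")
```

The point of the sketch is only that a bounded join() lets the caller regain control and decide what to do next, rather than waiting indefinitely on a thread that may never exit.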
Thanks!
Revert "Fix possible endless wait in stop() after AUTH_FAILED error (#688)"
This reverts commit 5225b3e.
The commit being reverted here caused kazoo not to empty the send
queue before disconnecting. This means that if a client submitted
asynchronous requests and then called client.stop(), the connection
would be closed immediately, usually after only one (but possibly
more) of the submitted requests were sent. Prior to this, Kazoo
would empty the queue of submitted requests all the way up to and
including the Close request when client.stop() was called.
Another area where this caused problems is in a busy multi-threaded
system. One thread might decide to gracefully close the connection,
but if there is any traffic generated by another thread, then the
connection would end up terminating without ever sending the Close
request.
Failure to gracefully shut down a ZooKeeper connection can mean that
other system components need to wait for ephemeral node timeouts to
detect that a component has shut down.
Given that this behavior is easily reproducible and can have serious
consequences in production, 5225b3e is reverted.
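
The difference described in the commit message can be illustrated with a toy model. This is a sketch under stated assumptions, not kazoo's actual writer loop: run_writer, the CLOSE sentinel, and the drain_before_close flag are hypothetical names, and the model assumes stop() has already queued the Close request and flagged the connection as stopped.

```python
from collections import deque

CLOSE = "Close"  # hypothetical stand-in for the Close request


def run_writer(pending, drain_before_close):
    """Toy model of a connection writer loop; not kazoo's real code.

    Assumes stop() has already appended Close to the send queue and
    flagged the connection as stopped. Returns the requests that were
    actually sent, in order.
    """
    send_queue = deque(pending)
    send_queue.append(CLOSE)
    stopped = True
    sent = []
    while send_queue:
        request = send_queue.popleft()
        sent.append(request)
        if request == CLOSE:
            break  # graceful shutdown: Close went out last
        if stopped and not drain_before_close:
            break  # early exit: remaining requests and Close are dropped
    return sent


requests = ["create /a", "create /b", "create /c"]
print(run_writer(requests, drain_before_close=True))
print(run_writer(requests, drain_before_close=False))
```

With drain_before_close=True the model sends every queued request followed by Close; with it set to False the loop exits after the first request, so Close is never sent and the server would have to wait for the session to time out.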