fix: Improve cluster connection pool logic when disconnecting #5
Force-pushed from 3d6cc5a to 158c64c.
lgtm with a green CI
Force-pushed from cb3509e to d3f83c5.
Now that I saw the results from CI, I also figured out how to run the unit tests locally. 😅 The tests caught a bug that I then fixed in 788361c and 0deaeba. 😬

Furthermore, some of the existing functional tests were assuming that responding with […]. In my view, this change in behaviour is desirable since it prevents node clients which are no longer tracked in the connection pool from emitting additional events that could mess with the state of the cluster client (which could easily have moved on, e.g. attempting to reconnect). So for the time being, I adjusted the tests by adding valid slots tables to the mock server in d5d85e9 (a sketch of such a slots reply is included at the end of this comment).

Finally, one of the tests was assuming that when a node disappears from the cluster, the node removal (the […]).

Does this approach sound kinda reasonable?
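For reference, here is a minimal sketch of the kind of slots table I mean. The `createMockHandler` name and the handler signature are illustrative assumptions rather than the exact helpers used in the test suite; only the shape of the `CLUSTER SLOTS` reply itself comes from the protocol.

```js
// Illustrative sketch only: a mock-server handler that answers CLUSTER SLOTS
// with a minimal but valid slots table (one master owning slots 0-16383).
// The handler signature is an assumption; the actual test helpers may differ.
function createMockHandler(host, port) {
  return (argv) => {
    if (argv[0] === "cluster" && argv[1] === "SLOTS") {
      return [[0, 16383, [host, port]]];
    }
    // Simplification: acknowledge everything else.
    return "OK";
  };
}
```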
Yes.
Still LGTM
As far as I can tell, this should be ready to get merged. I cannot merge it myself.
Will ship a release asap.
This PR is a port of redis/ioredis#1864 and what follows below is a verbatim copy of that PR's description. A reproduction of the bug described here that uses `valkey_server` and `iovalkey` instead of `redis_server` and `ioredis`, respectively, can be found in this repository.

Motivation and Background

This is an attempt to fix errors occurring when a `connect()` call is made shortly after a `disconnect()`, which is something that the Bull library does when pausing a queue. Here's a relatively minimal way to reproduce an error:
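A sketch of the kind of sequence involved, assuming a cluster node listening on 127.0.0.1:30001 and the `Cluster` client from `iovalkey`, looks roughly like this (it illustrates the connect-after-disconnect pattern rather than reproducing the exact script):

```js
// Sketch of the connect-after-disconnect pattern described above, not the
// exact reproduction script. Assumes a cluster node on 127.0.0.1:30001.
const { Cluster } = require("iovalkey");

async function main() {
  const cluster = new Cluster([{ host: "127.0.0.1", port: 30001 }], {
    lazyConnect: true,
  });

  await cluster.connect();
  await cluster.set("foo", "bar");

  // Roughly what Bull does when pausing a queue: tear the connection down
  // and bring it back up again shortly afterwards.
  cluster.disconnect();
  await new Promise((resolve) => cluster.once("end", resolve));
  await cluster.connect();

  await cluster.quit();
}

main().catch((error) => {
  console.error(error);
  process.exit(1);
});
```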
Running that script in a loop against the `main` branch of `ioredis` quickly results in an error.

My debugging led me to believe that the existing node cleanup logic in the `ConnectionPool` class leads to race conditions: upon `disconnect()`, the `this.connectionPool.reset()` call will remove nodes from the pool without cleaning up the event listeners, which may then subsequently issue more than one `drain` event. Depending on timing, one of the extra `drain` events may fire after `connect()` and change the status to `close`, interfering with the connection attempt and leading to the error above.

Changes
- Keep track of node event listeners in the `ConnectionPool` class and remove them from the nodes whenever they are removed from the pool (a simplified sketch of this idea follows after this list).
- Emit `-node`/`drain` events regardless of whether nodes disconnected or were removed through a `reset()` call.
- Upon `reset()`, add nodes before removing old ones to avoid unwanted `drain` events.
- Make `this` point to the connection pool instance.
- The error on `main` is seemingly different from the error shown above, but it still seems related to the disconnection logic and still gets fixed by the changes in this PR.
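To make the first three points a bit more concrete, here is a simplified, self-contained sketch of the listener-tracking idea. The class and method names are invented for illustration; this is not the actual `ConnectionPool` implementation.

```js
const { EventEmitter } = require("events");

// Simplified sketch of the listener-tracking idea, not the real ConnectionPool.
class PoolSketch extends EventEmitter {
  constructor() {
    super();
    this.nodes = new Map(); // node key -> client
    this.nodeListeners = new Map(); // node key -> tracked "end" listener
  }

  addNode(key, client) {
    const onEnd = () => this.removeNode(key);
    this.nodes.set(key, client);
    this.nodeListeners.set(key, onEnd);
    client.once("end", onEnd);
  }

  removeNode(key) {
    const client = this.nodes.get(key);
    if (!client) return;
    // Detach the tracked listener before dropping the node so that a later
    // "end" event from this client can no longer touch the pool's state.
    client.removeListener("end", this.nodeListeners.get(key));
    this.nodeListeners.delete(key);
    this.nodes.delete(key);
    // Emit "-node"/"drain" for every removal, whether it came from a
    // disconnect or from a reset().
    this.emit("-node", client, key);
    if (this.nodes.size === 0) {
      this.emit("drain");
    }
  }

  reset(newNodes) {
    // Add new nodes first, then remove stale ones, so the pool never looks
    // empty in between and no spurious "drain" event is emitted.
    for (const [key, client] of newNodes) {
      if (!this.nodes.has(key)) {
        this.addNode(key, client);
      }
    }
    for (const key of [...this.nodes.keys()]) {
      if (!newNodes.has(key)) {
        this.removeNode(key);
      }
    }
  }
}
```

The point of the sketch is that a node's listeners are always detached at the moment the pool forgets about it, so a client that is no longer tracked cannot trigger a late `drain`.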