Consider frequency of "/localnodes" fetches on idle clients #46

Open
nyh opened this issue Nov 24, 2024 · 0 comments
Labels
enhancement New feature or request

nyh commented Nov 24, 2024

Currently it seems that all our load balancer implementations (I checked Python, Java and Go) update their list of live nodes (via the "/localnodes" call) once every second. Updating this list frequently is important, so that we don't keep sending requests to a dead node for a long time, and so that we quickly discover nodes coming up. It's also a reasonably cheap request: if a client does 100 requests per second, then doing one more each second (and an especially cheap one like /localnodes) is negligible.

However, there is one situation where doing a "/localnodes" request every second isn't negligible: when we have a lot of idle client processes. Perhaps it's worthwhile to recognize this case and not do a "/localnodes" request every second from a client library that knows it is idle. We could lower the frequency in this case, say from once per second to once per 60 seconds, but this has the obvious downside of continuing to send requests to a dead node for a whole minute. So perhaps we can consider a different approach: do "/localnodes" requests rarely (e.g., once an hour), and additionally do a /localnodes request right after executing the first client request in a particular second. The rationale behind this proposal is:

  1. Executing a /localnodes before a client request would increase that request's latency, which is undesirable.
  2. If our list of nodes is outdated, the user request may fail and the client will retry it (the AWS driver does this automatically). Because the load balancer does a /localnodes right after the request, the retry (if not done too quickly) will use an up-to-date node list.
  3. If we never ran /localnodes and kept using a week-old node list, it is theoretically possible that all the nodes have changed since, so we wouldn't know any live node and couldn't refresh our list. This is why it makes sense to refresh the list infrequently (e.g., once an hour): to notice when the cluster is undergoing major changes, and to avoid a situation where we haven't run /localnodes for a week.
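
To make the proposal concrete, here is a minimal Python sketch of the refresh policy only. It is not taken from any of the existing load balancer implementations: the class name `LazyNodeRefresher`, its methods, and the `fetch_nodes` callable (which would wrap the actual "/localnodes" request) are all hypothetical. It also reads "the first client request in a particular second" as "a request that completes at least one second after the last refresh", which is an assumption.

```python
import time
from typing import Callable, List


class LazyNodeRefresher:
    """Hypothetical policy: refresh rarely when idle, and at most once per
    second right after a client request when active."""

    def __init__(self,
                 fetch_nodes: Callable[[], List[str]],  # wraps "/localnodes" (assumed)
                 initial_nodes: List[str],
                 idle_interval: float = 3600.0,   # "once an hour"
                 active_interval: float = 1.0,    # at most one refresh per second
                 clock: Callable[[], float] = time.monotonic):
        self._fetch_nodes = fetch_nodes
        self.nodes = list(initial_nodes)
        self._idle_interval = idle_interval
        self._active_interval = active_interval
        self._clock = clock
        self._last_refresh = clock()  # treat the initial list as fresh

    def _refresh(self) -> None:
        # Record the attempt even if it fails, so a dead node is not
        # hit with a refresh after every single request.
        self._last_refresh = self._clock()
        try:
            nodes = self._fetch_nodes()
        except Exception:
            return  # keep the old list; a later call will try again
        if nodes:
            self.nodes = list(nodes)

    def after_request(self) -> None:
        """Call right after a client request completes (never before it,
        so the request's own latency is not increased)."""
        if self._clock() - self._last_refresh >= self._active_interval:
            self._refresh()

    def on_idle_tick(self) -> None:
        """Call from a background timer; refreshes only if no refresh
        happened during the last idle_interval seconds."""
        if self._clock() - self._last_refresh >= self._idle_interval:
            self._refresh()
```

With this policy an idle client issues roughly one /localnodes request per hour instead of 3600, while an active client still refreshes about once per second. The sketch is not thread-safe; a real client library would need a lock around `_refresh`, or would run the refresh on its existing background thread.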
@nyh nyh added the enhancement New feature or request label Nov 24, 2024