-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use withscores to optmize leaders call where rank can be inferred. #64
base: main
Are you sure you want to change the base?
Conversation
Looks like we could apply a similar optimization to the For |
dd3ba73
to
d56dab4
Compare
current_rank_start = 0 | ||
for index, (member, score) in enumerate(response): | ||
if current_score is None: | ||
current_rank = self.rank_for_in(leaderboard_name, member) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the reasons for the current implementation is that once you have a page of members, the pipeline to retrieve the score and rank gives you those values for the page of members at that time. By splitting up the score and retrieving the rank separately, it's possible the rank changes here by a change in score for a member (or members).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is true, but in a different way, with the current implementation. We first fetch the members then rank/score them all at the same time. If scores change in between those two queries we could be:
- Including players that are no longer in the range.
- Missing players that are now in the range
- Showing members out of order (although I think it previously resorted in RAM to catch this edge case)
I do see that in this current implementation though it is equally possible that the first two scores and ranks change in the gap causing wrong/duplicate ranks downstream. I'll think a bit more on this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the correct path forward in either case is to use LUA scripts which can have transaction-like semantics. This would ensure that we have a consistent view of the data while we do more advanced calculations and would be even more performant.
Moving forward on that path would be a huge undertaking though, essentially moving chunks of our python to LUA scripts. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A change to using LUA scripts would essentially be a rewrite of most of the core functionality and while it's unclear if there'd be breaking changes, the changes required there feel like that'd warrant a major version bump. So I'm a big 👎 at the moment for these changes outside of LUA, but the smaller readability, variable name, and helper method changes seem worthy of a separate PR.
|
||
def _with_member_data(self, leaderboard_name, members, ranks_for_members): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do like splitting out the _with_member_data
and _sort_by
methods since those can be shared among implementations.
@GraylinKim If we're changing the base method as to how we do calculation of rank, then I'd rather see that change applied to the other methods mentioned in your second comment as opposed to applying this optimization piecemeal. |
When returning results for multiple contiguous members on the leaderboard we were calling
zrank
andzscore
on each of them individually (in a pipeline) which was giving usO(M * log(N))
performance instead of theO(log(N))
performance that we might expect.We can use
withscores=True
to avoid the extrazscore
commands. Because we know that these are contiguous members of the leaderboard we can infer the ranks of all members from the rank of the first member and avoid (almost) all of the additionalzrank
calls. This ranking logic has been implemented for each of the 3 types of leaderboards.This should improve performance for
all_leaders
,leaders
,around_me
,members_from_rank_range
, andtop
methods; all of which deal with contiguous chunks of the leaderboard.Also fixes test builds on python3 by specifying
decode_responses
.