Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use withscores to optmize leaders call where rank can be inferred. #64

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

GraylinKim
Copy link
Member

@GraylinKim GraylinKim commented May 21, 2019

When returning results for multiple contiguous members on the leaderboard we were calling zrank and zscore on each of them individually (in a pipeline) which was giving us O(M * log(N)) performance instead of the O(log(N)) performance that we might expect.

We can use withscores=True to avoid the extra zscore commands. Because we know that these are contiguous members of the leaderboard we can infer the ranks of all members from the rank of the first member and avoid (almost) all of the additional zrank calls. This ranking logic has been implemented for each of the 3 types of leaderboards.

This should improve performance for all_leaders, leaders, around_me, members_from_rank_range, and top methods; all of which deal with contiguous chunks of the leaderboard.

Also fixes test builds on python3 by specifying decode_responses.

@GraylinKim GraylinKim requested a review from czarneckid May 21, 2019 20:20
@GraylinKim GraylinKim self-assigned this May 21, 2019
@GraylinKim
Copy link
Member Author

GraylinKim commented May 21, 2019

Looks like we could apply a similar optimization to the around_me, all_members, and members_from_rank_range call.

For members_from_score_range and other similar queries we could do a single zrank call and extrapolate from there since we know we are selecting contiguous chunks of members. I don't use all_members, members_from_rank_range, or members_from_score_range though so I'm not going to fix them here.

@GraylinKim GraylinKim force-pushed the more-efficient-leaders-impl branch from dd3ba73 to d56dab4 Compare May 22, 2019 17:06
leaderboard/tie_ranking_leaderboard.py Outdated Show resolved Hide resolved
current_rank_start = 0
for index, (member, score) in enumerate(response):
if current_score is None:
current_rank = self.rank_for_in(leaderboard_name, member)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the reasons for the current implementation is that once you have a page of members, the pipeline to retrieve the score and rank gives you those values for the page of members at that time. By splitting up the score and retrieving the rank separately, it's possible the rank changes here by a change in score for a member (or members).

Copy link
Member Author

@GraylinKim GraylinKim Jun 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is true, but in a different way, with the current implementation. We first fetch the members then rank/score them all at the same time. If scores change in between those two queries we could be:

  • Including players that are no longer in the range.
  • Missing players that are now in the range
  • Showing members out of order (although I think it previously resorted in RAM to catch this edge case)

I do see that in this current implementation though it is equally possible that the first two scores and ranks change in the gap causing wrong/duplicate ranks downstream. I'll think a bit more on this point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the correct path forward in either case is to use LUA scripts which can have transaction-like semantics. This would ensure that we have a consistent view of the data while we do more advanced calculations and would be even more performant.

Moving forward on that path would be a huge undertaking though, essentially moving chunks of our python to LUA scripts. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A change to using LUA scripts would essentially be a rewrite of most of the core functionality and while it's unclear if there'd be breaking changes, the changes required there feel like that'd warrant a major version bump. So I'm a big 👎 at the moment for these changes outside of LUA, but the smaller readability, variable name, and helper method changes seem worthy of a separate PR.


def _with_member_data(self, leaderboard_name, members, ranks_for_members):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do like splitting out the _with_member_data and _sort_by methods since those can be shared among implementations.

leaderboard/leaderboard.py Show resolved Hide resolved
leaderboard/tie_ranking_leaderboard.py Outdated Show resolved Hide resolved
@czarneckid
Copy link
Member

@GraylinKim If we're changing the base method as to how we do calculation of rank, then I'd rather see that change applied to the other methods mentioned in your second comment as opposed to applying this optimization piecemeal.

Base automatically changed from master to main March 11, 2021 16:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants