Use withscores to optimize leaders call where rank can be inferred. #64

GraylinKim · 2019-05-21T20:20:27Z

When returning results for multiple contiguous members of the leaderboard, we were calling zrank and zscore on each member individually (in a pipeline), which gave us O(M * log(N)) performance instead of the O(log(N) + M) performance we might expect from a single range query.
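As a rough cost breakdown, take M as the number of members returned and N as the size of the sorted set, and use the complexities Redis documents for these commands (range reads O(log(N) + M), ZRANK O(log(N)), ZSCORE O(1)):

```latex
\begin{aligned}
T_{\text{old}} &= \underbrace{O(\log N + M)}_{\text{range read}}
  + M \cdot \underbrace{O(\log N)}_{\text{ZRANK}}
  + M \cdot \underbrace{O(1)}_{\text{ZSCORE}} = O(M \log N) \\
T_{\text{new}} &= \underbrace{O(\log N + M)}_{\text{range read WITHSCORES}}
  + \underbrace{O(\log N)}_{\text{one ZRANK}} = O(\log N + M)
\end{aligned}
```

Pipelining saves round trips but not server-side work, which is why the per-member commands still dominate.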

We can use withscores=True to avoid the extra zscore commands. Because we know these are contiguous members of the leaderboard, we can infer the ranks of all members from the rank of the first member and avoid (almost) all of the additional zrank calls. This ranking logic has been implemented for each of the three leaderboard types.
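A minimal sketch of the rank inference for a page already read with `withscores=True` (best score first, as a reverse range read returns it). The function names are hypothetical, the page offset `start` is 0-based, and `first_rank` stands for the one lookup still needed for the first member; everything else follows from the page itself. It assumes the usual conventions for the three leaderboard types: distinct ranks for the default board, dense ranks for tie ranking, and "1224"-style competition ranks.

```python
from typing import List, Sequence, Tuple

Page = Sequence[Tuple[str, float]]  # (member, score) pairs, best score first


def default_ranks(page: Page, start: int) -> List[int]:
    # Default board: every member has a distinct rank, so a contiguous page
    # gets consecutive ranks right after its 0-based offset.
    return [start + index + 1 for index in range(len(page))]


def tie_ranks(page: Page, first_rank: int) -> List[int]:
    # Dense ("tie") ranking: equal scores share a rank and the next distinct
    # score gets the next integer. Only first_rank comes from Redis.
    ranks: List[int] = []
    rank = first_rank
    previous_score = None
    for index, (_member, score) in enumerate(page):
        if index > 0 and score != previous_score:
            rank += 1
        ranks.append(rank)
        previous_score = score
    return ranks


def competition_ranks(page: Page, start: int, first_rank: int) -> List[int]:
    # Competition ("1224") ranking: rank = 1 + number of strictly better scores.
    # Members tied with the first member may have ties above the page, so they
    # reuse first_rank; a later score group starts at its own page position.
    ranks: List[int] = []
    current_score = None
    current_rank = first_rank
    for index, (_member, score) in enumerate(page):
        if index > 0 and score != current_score:
            current_rank = start + index + 1
        current_score = score
        ranks.append(current_rank)
    return ranks
```

For example, with scores alice 50, bob 40, carol 40, dave 30, the page at offset 2 is carol and dave; competition ranking gives them 2 and 4, and tie ranking gives 2 and 3.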

This should improve performance for the all_leaders, leaders, around_me, members_from_rank_range, and top methods, all of which deal with contiguous chunks of the leaderboard.
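To illustrate why a method like around_me qualifies, here is a minimal sketch of an around_me-style window under default (distinct) ranking. A plain Python list stands in for the sorted set, and `around_me_sketch` and its centering rule are hypothetical, not the library's implementation: one position lookup locates the member, and every rank in the window then follows from the window's offset.

```python
from typing import List, Tuple


def around_me_sketch(sorted_members: List[str], member: str, page_size: int) -> List[Tuple[int, str]]:
    # sorted_members stands in for the sorted set, best score first.
    position = sorted_members.index(member)  # one ZREVRANK-style lookup (0-based)
    # Center the member in the page, but never start above the top of the board.
    start = max(position - page_size // 2, 0)
    window = sorted_members[start:start + page_size]  # one contiguous range read
    # Contiguous window under default ranking: ranks follow from the offset.
    return [(start + index + 1, name) for index, name in enumerate(window)]
```

With members a through f and a page size of 3, the window around d is c, d, e with ranks 3, 4, 5, and no per-member rank lookups are needed.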

It also fixes test builds on Python 3 by specifying decode_responses.
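On Python 3, redis-py returns member names as `bytes` unless the client is created with `decode_responses=True`, so comparisons against `str` literals in tests fail. A minimal sketch of that mismatch, with `decode_page` as a hypothetical hand-rolled equivalent of what the client option does for you:

```python
from typing import Iterable, List, Tuple, Union

Member = Union[bytes, str]


def decode_page(page: Iterable[Tuple[Member, float]], encoding: str = "utf-8") -> List[Tuple[str, float]]:
    # Decode bytes member names to str, leaving already-decoded names alone.
    return [
        (member.decode(encoding) if isinstance(member, bytes) else member, score)
        for member, score in page
    ]


# Shaped like a withscores reply from a client created WITHOUT decode_responses=True.
raw_page = [(b"alice", 50.0), (b"bob", 40.0)]
```

Here `raw_page[0][0] == "alice"` is `False` on Python 3, while `decode_page(raw_page)` yields `[("alice", 50.0), ("bob", 40.0)]`.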

GraylinKim · 2019-05-21T20:37:53Z

Looks like we could apply a similar optimization to the around_me, all_members, and members_from_rank_range calls.

For members_from_score_range and similar queries, we could make a single zrank call and extrapolate from there, since we know we are selecting contiguous chunks of members. I don't use all_members, members_from_rank_range, or members_from_score_range, though, so I'm not going to fix them here.
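A sketch of that single-lookup idea for a score range under default ranking, assuming a redis-py-style client that exposes `zrevrangebyscore(name, max, min, withscores=True)` and `zrevrank(name, member)`. `members_from_score_range_sketch` and `FakeSortedSetClient` are hypothetical; the fake exists only so the command count can be checked without a Redis server.

```python
from typing import Dict, List, Optional, Tuple


def members_from_score_range_sketch(client, key: str, min_score: float, max_score: float) -> List[Dict]:
    # One range read (scores included) plus one rank lookup for the first
    # member, instead of a zrank/zscore pair per member.
    page = client.zrevrangebyscore(key, max_score, min_score, withscores=True)
    if not page:
        return []
    first_position = client.zrevrank(key, page[0][0])  # 0-based
    return [
        {"member": member, "score": score, "rank": first_position + index + 1}
        for index, (member, score) in enumerate(page)
    ]


class FakeSortedSetClient:
    """Tiny in-memory stand-in for the two client calls used above."""

    def __init__(self, scores: Dict[str, float]):
        self.scores = scores
        self.calls: List[str] = []

    def _ordered(self) -> List[Tuple[str, float]]:
        # Highest score first; ties by member name descending, like ZREVRANGE.
        return sorted(self.scores.items(), key=lambda item: (item[1], item[0]), reverse=True)

    def zrevrangebyscore(self, key, max_score, min_score, withscores=False):
        self.calls.append("zrevrangebyscore")
        rows = [(m, s) for m, s in self._ordered() if min_score <= s <= max_score]
        return rows if withscores else [m for m, _ in rows]

    def zrevrank(self, key, member) -> Optional[int]:
        self.calls.append("zrevrank")
        members = [m for m, _ in self._ordered()]
        return members.index(member) if member in members else None
```

However many members fall in the range, the sketch issues exactly two commands.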

leaderboard/tie_ranking_leaderboard.py

czarneckid · 2019-05-30T18:57:56Z

leaderboard/competition_ranking_leaderboard.py

+        current_rank_start = 0
+        for index, (member, score) in enumerate(response):
+            if current_score is None:
+                current_rank = self.rank_for_in(leaderboard_name, member)


One of the reasons for the current implementation is that once you have a page of members, the pipeline that retrieves the score and rank gives you those values for that page of members as of the same moment. By splitting up the score and retrieving the rank separately, it's possible for the rank to change in between because of a change in score for a member (or members).

This is also true of the current implementation, just in a different way. We first fetch the members, then rank and score them all at the same time. If scores change between those two queries, we could be:

- Including players that are no longer in the range.
- Missing players that are now in the range.
- Showing members out of order (although I think it previously re-sorted in RAM to catch this edge case).

I do see, though, that with this implementation it is equally possible for the first two scores and ranks to change in the gap, causing wrong or duplicate ranks downstream. I'll think a bit more on this point.
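A pure-Python simulation of that gap, assuming a dict of scores stands in for the sorted set and the hypothetical helpers `ordered` and `position` mimic ZREVRANGE ordering and ZREVRANK. A write lands between the page read and the single rank lookup:

```python
from typing import Dict, List, Tuple


def ordered(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    # Best score first, ties by member name descending (ZREVRANGE order).
    return sorted(scores.items(), key=lambda item: (item[1], item[0]), reverse=True)


def position(scores: Dict[str, float], member: str) -> int:
    # 0-based position, like ZREVRANK.
    return [m for m, _ in ordered(scores)].index(member)


scores = {"alice": 50.0, "bob": 40.0, "carol": 30.0, "dave": 20.0}

# Round trip 1: read positions 1..2 with scores (bob, carol).
page = ordered(scores)[1:3]

# A concurrent write lands between the two round trips.
scores["carol"] = 60.0

# Round trip 2: one rank lookup for the first member, then extrapolate.
first = position(scores, page[0][0])
inferred = {member: first + index + 1 for index, (member, _) in enumerate(page)}

# Per-member ranks against the data as it is now, for comparison.
actual = {member: position(scores, member) + 1 for member, _ in page}
```

Carol's inferred rank of 4 now collides with dave's current rank of 4, while a per-member lookup after the write reports carol at rank 1 even though she is no longer in the requested range, which is the current implementation's version of the same problem.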

I think the correct path forward in either case is to use Lua scripts, which Redis executes atomically and which therefore give us transaction-like semantics. This would ensure that we have a consistent view of the data while we do the more advanced calculations, and it would be even more performant.
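A minimal sketch of what one such script could look like, assuming a redis-py client (its `register_script` method returns a callable `Script` that runs via EVALSHA). The script text, `fetch_page`, and `parse_reply` are hypothetical. Because the script runs atomically, the page and the count of strictly better scores come from one snapshot; under competition ranking, the first member's rank would be that count plus one.

```python
from typing import List, Sequence, Tuple, Union

# Lua executed inside Redis: read a page WITH SCORES, then count the members
# with a strictly higher score than the first one, all in one atomic step.
PAGE_WITH_BETTER_COUNT = """
local page = redis.call('ZREVRANGE', KEYS[1], ARGV[1], ARGV[2], 'WITHSCORES')
if #page == 0 then
  return {}
end
-- page is flat: member1, score1, member2, score2, ...
local better = redis.call('ZCOUNT', KEYS[1], '(' .. page[2], '+inf')
table.insert(page, better)
return page
"""


def fetch_page(client, key: str, start: int, stop: int) -> list:
    # client is assumed to be a redis-py client.
    script = client.register_script(PAGE_WITH_BETTER_COUNT)
    return script(keys=[key], args=[start, stop])


def _text(value: Union[bytes, str]) -> str:
    # Replies are bytes unless the client uses decode_responses=True.
    return value.decode() if isinstance(value, bytes) else value


def parse_reply(reply: Sequence[Union[bytes, str, int]]) -> Tuple[List[Tuple[str, float]], int]:
    # Split the flat reply into (member, score) pairs and the trailing count.
    if not reply:
        return [], 0
    *flat, better = reply
    pairs = [(_text(flat[i]), float(_text(flat[i + 1]))) for i in range(0, len(flat), 2)]
    return pairs, int(better)
```

In practice the `Script` object would be registered once and reused; it is created per call here only to keep the sketch short.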

Moving forward on that path would be a huge undertaking, though: it essentially means moving chunks of our Python into Lua scripts. Thoughts?

A change to Lua scripts would essentially be a rewrite of most of the core functionality, and while it's unclear whether there would be breaking changes, the scope of those changes feels like it would warrant a major version bump. So I'm a big 👎 at the moment on these changes outside of Lua, but the smaller readability, variable-name, and helper-method changes seem worthy of a separate PR.

czarneckid · 2019-05-30T18:59:00Z

leaderboard/leaderboard.py


+    def _with_member_data(self, leaderboard_name, members, ranks_for_members):


I do like splitting out the _with_member_data and _sort_by methods since those can be shared among implementations.
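A minimal sketch of why such helpers can be shared: they only need rows that already carry member, score, and rank, however those ranks were computed. `with_member_data` and `sort_rows` are hypothetical standalone functions, not the PR's `_with_member_data` and `_sort_by`. The member data is assumed to be aligned with the rows (for example, one HMGET reply, which returns `None` for missing fields), and sorting by score highest-first is an assumption.

```python
from typing import Dict, List, Optional, Sequence


def with_member_data(rows: List[Dict], member_data: Sequence[Optional[str]]) -> List[Dict]:
    # Attach each row's member data without mutating the input rows.
    return [dict(row, member_data=data) for row, data in zip(rows, member_data)]


def sort_rows(rows: List[Dict], sort_by: Optional[str]) -> List[Dict]:
    # Independent of how ranks were computed, so every leaderboard type can use it.
    if sort_by == "rank":
        return sorted(rows, key=lambda row: row["rank"])
    if sort_by == "score":
        return sorted(rows, key=lambda row: row["score"], reverse=True)
    return list(rows)
```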

leaderboard/leaderboard.py

leaderboard/tie_ranking_leaderboard.py

czarneckid · 2019-05-30T20:16:44Z

@GraylinKim If we're changing the base method for how we calculate rank, then I'd rather see that change applied to the other methods mentioned in your second comment, as opposed to applying this optimization piecemeal.

test/leaderboard/tie_ranking_leaderboard_test.py

GraylinKim requested a review from czarneckid May 21, 2019 20:20

GraylinKim self-assigned this May 21, 2019

GraylinKim added 2 commits May 22, 2019 13:06

Use withscores to optimize leaders call where rank can be inferred.

b353bad

Leaderboard specific implementations.

d56dab4

GraylinKim force-pushed the more-efficient-leaders-impl branch from dd3ba73 to d56dab4 May 22, 2019 17:06

czarneckid requested changes May 30, 2019


GraylinKim mentioned this pull request Jun 10, 2019

Fix TieRankingLeaderboard.rank_member_across #65

Merged

GraylinKim commented Jun 10, 2019


test/leaderboard/tie_ranking_leaderboard_test.py (outdated, resolved)

GraylinKim added 2 commits June 10, 2019 14:11

Revert changes pulled out into separate patches.

6354405

Improve variable naming for clarity.

452a214

Base automatically changed from master to main March 11, 2021 16:12
