recalculate statistics using SQL #696
Conversation
    return cast(Stat | None, stat)

...

async def sql_recalculate_mode(player_id: int, mode: int) -> None:
While I agree this is objectively better from a performance and atomicity perspective, I feel that most developers will have more trouble maintaining this: it's relatively complex SQL, and I suspect procedural code (in Python) is a more common skillset for the devs who work on b.py (and maintainability is the #1 goal for b.py).
We could have a scaled-down version (which only calculates pp and acc), or split the query into smaller ones.
async def sql_recalculate_mode_pp(player_id: int, mode: int) -> None:
    sql = """
        WITH
        ordered_pp AS (
            SELECT
                s1.id scoreId,
                s1.userid,
                s1.mode,
                s1.pp,
                s1.acc
            FROM
                scores s1
                INNER JOIN maps m ON s1.map_md5 = m.md5
            WHERE
                s1.status = 2
                AND m.status IN (2, 3)
                AND s1.pp > 0
                AND s1.mode = :mode
                AND s1.userid = :user_id
            ORDER BY
                s1.pp DESC,
                s1.acc DESC,
                s1.id DESC
        ),
        bests AS (
            SELECT
                scoreId,
                pp,
                acc,
                ROW_NUMBER() OVER (ORDER BY pp DESC) AS global_rank
            FROM
                ordered_pp
        ),
        user_calc AS (
            SELECT
                COUNT(*) AS count,
                SUM(POW(0.95, global_rank - 1) * pp) AS weightedPP,
                (1 - POW(0.9994, COUNT(*))) * 416.6667 AS bnsPP,
                SUM(POW(0.95, global_rank - 1) * acc) / SUM(POW(0.95, global_rank - 1)) AS acc
            FROM
                bests
        ),
        calculated AS (
            SELECT
                *,
                weightedPP + bnsPP AS pp
            FROM
                user_calc
        )
        UPDATE stats s
        INNER JOIN calculated c ON 1
        SET
            s.pp = c.pp,
            s.acc = c.acc
        WHERE s.id = :user_id AND s.mode = :mode
    """
    _ = await app.state.services.database.execute(
        sql, {"user_id": player_id, "mode": mode}
    )

async def sql_recalculate_mode_statistics(player_id: int, mode: int) -> None:
    sql = """
        WITH
        concrete_stats AS (
            SELECT
                COUNT(*) AS plays,
                SUM(s2.score) AS total_score,
                SUM(IF(s2.status IN (2, 3), s2.score, 0)) AS ranked_score,
                SUM(s2.n300 + s2.n100 + s2.n50 + IF(s2.mode IN (1, 3, 5), s2.ngeki + s2.nkatu, 0)) AS total_hits,
                SUM(s2.time_elapsed) AS play_time,
                MAX(s2.max_combo) AS max_combo,
                SUM(s2.grade = "XH") AS xh_count,
                SUM(s2.grade = "X") AS x_count,
                SUM(s2.grade = "SH") AS sh_count,
                SUM(s2.grade = "S") AS s_count,
                SUM(s2.grade = "A") AS a_count
            FROM
                scores s2
                INNER JOIN maps m2 ON s2.map_md5 = m2.md5
            WHERE
                s2.mode = :mode
                AND s2.userid = :user_id
        )
        UPDATE stats s
        INNER JOIN concrete_stats cs ON 1
        SET
            s.tscore = cs.total_score,
            s.rscore = cs.ranked_score,
            s.plays = cs.plays,
            s.playtime = cs.play_time,
            s.max_combo = cs.max_combo,
            s.total_hits = cs.total_hits,
            s.xh_count = cs.xh_count,
            s.x_count = cs.x_count,
            s.sh_count = cs.sh_count,
            s.s_count = cs.s_count,
            s.a_count = cs.a_count
        WHERE s.id = :user_id AND s.mode = :mode
    """
    _ = await app.state.services.database.execute(
        sql, {"user_id": player_id, "mode": mode}
    )

If this is acceptable, we could strike a balance between procedural code & performance.
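For reference, the user_calc CTE above encodes the standard osu! performance weighting; written out from the SQL, it computes:

$$pp = \sum_{i=1}^{N} 0.95^{\,i-1} \cdot pp_i \;+\; 416.6667\left(1 - 0.9994^{N}\right)$$

$$acc = \frac{\sum_{i=1}^{N} 0.95^{\,i-1} \cdot acc_i}{\sum_{i=1}^{N} 0.95^{\,i-1}}$$

where i is a score's global_rank among the player's bests ordered by pp descending, and N is the number of bests.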
Our server ran into performance/memory issues recently. With this update, our score submission speed went from ~5s for a heavy user to ~600ms for a single mode, and a few orders of magnitude better when users play multiplayer (don't forget that we have a global submission thread lock). Not to mention that we also saved a bunch of I/O and memory.
This also comes with added benefits: for example, when a map changes status, rather than setting up complex triggers and handlers and taking care of every potential edge case, statistics will simply be corrected the next time the user submits a score.
my feedback is more that i don't think many devs are familiar with sql, and this uses more advanced features like window functions, CTEs, etc. -- i suspect having it as pure-python is more familiar for most
> This also comes with added benefits: for example, when a map changes status, rather than setting up complex triggers and handlers and taking care of every potential edge case, statistics will simply be corrected the next time the user submits a score.
i don't disagree w/ this, but it can be implemented in pure py as well
i think the most important issue in recalc to fix would be recalculation of score statuses when scores of a user shift around
> my feedback is more that i don't think many devs are familiar with sql, and this uses more advanced features like window functions, CTEs, etc. -- i suspect having it as pure-python is more familiar for most
> > This also comes with added benefits: for example, when a map changes status, rather than setting up complex triggers and handlers and taking care of every potential edge case, statistics will simply be corrected the next time the user submits a score.
> i don't disagree w/ this, but it can be implemented in pure py as well
imo it's too cluttered for something which can be implemented in like 20-50 lines of python, and yeah sql can be confusing at times
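For illustration, a minimal sketch of what that pure-Python version could look like, assuming the same databases-style fetch_all/execute API used elsewhere in this PR; the query mirrors the SQL above, but the function name and structure are illustrative, not bancho.py's actual implementation:

async def py_recalculate_mode_pp(player_id: int, mode: int) -> None:
    # fetch the player's best scores on ranked/approved maps, highest pp first
    rows = await app.state.services.database.fetch_all(
        "SELECT s.pp, s.acc FROM scores s "
        "INNER JOIN maps m ON s.map_md5 = m.md5 "
        "WHERE s.userid = :user_id AND s.mode = :mode "
        "AND s.status = 2 AND m.status IN (2, 3) AND s.pp > 0 "
        "ORDER BY s.pp DESC",
        {"user_id": player_id, "mode": mode},
    )
    # standard osu! weighting: each successive best counts 5% less
    weighted_pp = sum(row["pp"] * 0.95**i for i, row in enumerate(rows))
    bonus_pp = 416.6667 * (1 - 0.9994 ** len(rows))
    weight_sum = sum(0.95**i for i in range(len(rows)))
    acc = (
        sum(row["acc"] * 0.95**i for i, row in enumerate(rows)) / weight_sum
        if rows
        else 0.0
    )
    _ = await app.state.services.database.execute(
        "UPDATE stats SET pp = :pp, acc = :acc "
        "WHERE id = :user_id AND mode = :mode",
        {"pp": weighted_pp + bonus_pp, "acc": acc,
         "user_id": player_id, "mode": mode},
    )

This trades the single atomic statement for two round trips and loading all bests into memory, which is exactly the trade-off being discussed in this thread.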
> But the perf & scalability trade-off is worth the effort, I believe.

it's a good point, let me consider it & review more deeply soon
the issue with memory usage you're having here is because it loads all best scores for a given mode at once; this can be fixed with OFFSET

anyway, pagination is a good idea, as time-to-execute is less of a problem than resource utilization at large scale imo (from akatsuki experience) -- i'm not entirely remembering the technical considerations of this process (e.g. row/table locking w.r.t. transactions, requirements of upfront processing, etc.), so i'll have to refresh myself
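For concreteness, a minimal sketch of the LIMIT/OFFSET batching suggested here (again assuming the databases-style fetch_all API; names are illustrative), which bounds memory to one batch at a time:

async def iter_best_scores(player_id: int, mode: int):
    # hypothetical: stream a player's best scores in fixed-size batches
    batch_size = 1000
    offset = 0
    while True:
        rows = await app.state.services.database.fetch_all(
            "SELECT s.pp, s.acc FROM scores s "
            "WHERE s.userid = :user_id AND s.mode = :mode AND s.status = 2 "
            "ORDER BY s.pp DESC LIMIT :limit OFFSET :offset",
            {"user_id": player_id, "mode": mode,
             "limit": batch_size, "offset": offset},
        )
        if not rows:
            break
        for row in rows:
            yield row
        offset += batch_size

A caller would consume this with async for row in iter_best_scores(player_id, mode) and accumulate the weighted sums incrementally instead of materializing every row up front.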
> i think the most important issue in recalc to fix would be recalculation of score statuses when scores of a user shift around

A PR for that has already existed for quite some time: #573
I believe the simplicity of the current pure-Python way easily outweighs any performance gains from this. As someone who has run bancho.py at its limits, score submission was the least of my issues.
> i think the most important issue in recalc to fix would be recalculation of score statuses when scores of a user shift around

We have a solution for this one as well: https://github.com/ppy-sb/nyamatrix/blob/main/nyamatrix/qb/update_score_status.py

It was built to reduce memory usage on our server during recalc. With the built-in recalc script, memory usage goes above 4GB while running prod bancho.py, and our server (configured with 4 cores / 8GB) doesn't have enough memory; the new one uses 113MB doing the same work. It uses SQL as well, so it might not be your choice of tech stack.

> the issue with memory usage you're having here is because it loads all best scores for a given mode at once; this can be fixed with OFFSET

There are several technical difficulties with this approach, if I understand the OFFSET suggestion correctly:
- you still have to maintain user-mode-beatmap-scores and user-mode-beatmap-best maps in order to decide which score should be set as best.
- a large OFFSET in SQL is VERY SLOW, which is why modern apps and websites often no longer let you jump to a precise page and have chosen cursor-based infinite scrolling instead.

There's, however, a better way imo: chunk user IDs, as users don't interfere with each other (a sketch follows below).
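For illustration, a minimal sketch of that chunking idea using keyset pagination (seeking past the last processed id instead of using OFFSET), so every chunk starts from an index seek; the users table query and loop shape are assumptions here, not the actual nyamatrix code:

async def recalculate_all_users(mode: int) -> None:
    chunk_size = 500
    last_id = 0
    while True:
        # keyset pagination: cheap index seek per chunk, unlike large OFFSETs
        rows = await app.state.services.database.fetch_all(
            "SELECT id FROM users WHERE id > :last_id "
            "ORDER BY id LIMIT :limit",
            {"last_id": last_id, "limit": chunk_size},
        )
        if not rows:
            break
        for row in rows:
            # users are independent, so chunks could also run concurrently
            await sql_recalculate_mode_pp(row["id"], mode)
        last_id = rows[-1]["id"]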