Skip to content

Conversation

@arily
Copy link
Contributor

@arily arily commented Apr 27, 2025

Describe your changes

  • recalculate user statistics using SQL, instead of pulling all scores from db to do the calculations, speeds up score submission
  • use incr operations on map statistics to avoid race condition

Related Issues / Projects

Checklist

  • I've manually tested my code

@arily arily force-pushed the akat-master-0428-update-stat-sql branch 2 times, most recently from edd2956 to 37b84f2 Compare April 29, 2025 20:12
return cast(Stat | None, stat)


async def sql_recalculate_mode(player_id: int, mode: int) -> None:
Copy link
Member

@cmyui cmyui Apr 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I agree this is objectively better from a performance and atomicity perspective, I feel that most developers will have more trouble maintaining this, it's relatively complex sql and the procedural code (in python) I suspect is a more common of a skillset for the devs who work on b.py (and maintainability is the #1 goal for b.py)

Copy link
Contributor Author

@arily arily Apr 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can have a scaled down version (which only calculates pp and pp acc), or split query into smaller ones.

async def sql_recalculate_mode_pp(player_id: int, mode: int) -> None:
    sql = f"""
    WITH 
    ordered_pp AS (
        SELECT
            s1.id scoreId,
            s1.userid,
            s1.mode,
            s1.pp,
            s1.acc
        FROM
            scores s1
        INNER JOIN maps m ON s1.map_md5 = m.md5
        WHERE
            s1.status = 2
            AND m.status IN (2, 3)
            AND s1.pp > 0
            AND s1.mode = :mode
            AND s1.userid = :user_id
        ORDER BY
            s1.pp DESC,
            s1.acc DESC,
            s1.id DESC
    ),
    bests AS (
        SELECT
            scoreId,
            pp,
            acc,
            ROW_NUMBER() OVER (
                ORDER BY
                    pp DESC
            ) AS global_rank
        FROM
            ordered_pp
    ),
    user_calc AS (
        SELECT
            COUNT(*) AS count,
            SUM(POW (0.95, global_rank - 1) * pp) AS weightedPP,
            (1 - POW (0.9994, COUNT(*))) * 416.6667 AS bnsPP,
            SUM(POW (0.95, global_rank - 1) * acc) / SUM(POW (0.95, global_rank - 1)) AS acc
        FROM
            bests
    ),
    calculated AS (
        SELECT
            *,
            weightedPP + bnsPP AS pp
        FROM
            user_calc
    )
UPDATE stats s
INNER JOIN calculated c ON 1
SET
    s.pp = c.pp,
    s.acc = c.acc
WHERE s.id = :user_id AND s.mode = :mode
    """

    _ = await app.state.services.database.execute(
        sql, {"user_id": player_id, "mode": mode}
    )

async def sql_recalculate_mode_statistics(player_id: int, mode: int) -> None:
    sql = f"""
    WITH 
    concrete_stats AS (
        SELECT
            SUM(s2.score) AS total_score,
            SUM(IF(s2.status IN (2, 3), s2.score, 0)) AS ranked_score,
            SUM(s2.n300 + s2.n100 + s2.n50 + (IF(s2.mode IN (1, 3, 5), s2.ngeki + s2.nkatu, 0))) AS total_hits,
            SUM(s2.time_elapsed) AS play_time,
            SUM(s2.grade = "XH") AS xh_count,
            MAX(s2.max_combo) AS max_combo,  
            SUM(s2.grade = "X") AS x_count,
            SUM(s2.grade = "S") AS s_count,
            SUM(s2.grade = "A") AS a_count
        FROM
            scores s2
        INNER JOIN maps m2 ON s2.map_md5 = m2.md5
        WHERE s2.mode = :mode
        AND s2.userid = :user_id
    )
UPDATE stats s
INNER JOIN concrete_stats cs ON 1
SET
    s.tscore = cs.total_score,
    s.rscore = cs.ranked_score,
    s.plays = c.count,
    s.playtime = cs.play_time,
    s.max_combo = cs.max_combo,
    s.total_hits = cs.total_hits,
    s.xh_count = cs.xh_count,
    s.x_count = cs.x_count,
    s.sh_count = cs.s_count,
    s.s_count = cs.a_count
WHERE s.id = :user_id AND s.mode = :mode
    """

    _ = await app.state.services.database.execute(
        sql, {"user_id": player_id, "mode": mode}
    )

if this is acceptable we could strike a balance between procedural code & performance.

Our server ran into performance / memory issue recently. With this update our score submission speed went from ~5s for a heavy user, into ~600ms for single mode, and few orders better when users MP (don't forget that we have global submission thread lock). Not to mention that we also saved bunch of IO and memory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this also comes with added benefits for example when a map changes states, rather than setup complex trigger and handlers, taking care of any potential edge cases, statistics will be corrected next time this user submits a score.

Copy link
Member

@cmyui cmyui Apr 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my feedback is more that i don't think many devs are familiar with sql, and this uses more advanced features like window functions, CTEs, etc. -- i suspect having it as pure-python is more familiar for most

this also comes with added benefits for example when a map changes states, rather than setup complex trigger and handlers, taking care of any potential edge cases, statistics will be corrected next time this user submits a score.

i don't disagree w/ this, but it can be implemented in pure py as well

Copy link
Member

@cmyui cmyui Apr 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the most important issue in recalc to fix would be recalculation of score statuses when scores of a user shift around

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my feedback is more that i don't think many devs are familiar with sql, and this uses more advanced features like window functions, CTEs, etc. -- i suspect having it as pure-python is more familiar for most

this also comes with added benefits for example when a map changes states, rather than setup complex trigger and handlers, taking care of any potential edge cases, statistics will be corrected next time this user submits a score.

i don't disagree w/ this, but it can be implemented in pure py as well

imo it's too cluttered for something which can be implemented in like 20-50 lines of python, and yeah sql can be confusing at times

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the perf & scalability trade off is worth the effort I believe.

it's a good point, let me consider it & review more deeply soon

Copy link
Member

@cmyui cmyui May 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the issue with memory usage you're having here is because it loads all best scores for a given mode at once, this can be fixed with OFFSET anyway

pagination is a good idea, as time-to-execute is less of a problem than resource utilization at large scale imo (from akatsuki experience) -- not sure but i'm not entirely remembering the technical considerations of this process (e.g. row/table locking w.r.t. transactions, requirements of upfront processing, etc.), so i'll have to refresh myself

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the most important issue in recalc to fix would be recalculation of score statuses when scores of a user shift around

A PR for that already exists for quite some time. #573

I believe the simplicity of the current pure python way easily outweighs any performance gains from this. As someone that ran bancho.py at its limits, score submission was rather the least of issues.

Copy link
Contributor Author

@arily arily May 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the most important issue in recalc to fix would be recalculation of score statuses when scores of a user shift around

we also have solution for this one as well. https://github.com/ppy-sb/nyamatrix/blob/main/nyamatrix/qb/update_score_status.py
This was built to reduce memory usage on our server during recalc. Now with built-in recalc script memory usage will go above 4GB, while running prod bpy. Our server (configured with 4c8g) don't have enough memory. New one uses 113M doing so.
This one uses sql as well so it might not be your choice of tech stack.

the issue with memory usage you're having here is because it loads all best scores for a given mode at once, this can be fixed with OFFSET anyway

There are several technical difficulties with this approach If I understand the OFFSET correctly.:

  • you still have to maintain a user-mode-beatmap-scores and user-mode-beatmap-best maps in order to decide which score should be set to best.
  • large offset in SQL is VERY SLOW thus modern apps and websites don't often give you ability to jump to precise pages anymore, and chosen cursor based infinity scrolling.

There's however a better way imo, to chunk user ids, as they don't interference each other.

@cmyui cmyui self-requested a review May 1, 2025 06:36
@cmyui cmyui assigned cmyui and arily May 1, 2025
@cmyui cmyui added performance Improvements to resource usage without changing functionality developer tooling labels May 1, 2025
@arily arily requested a review from infernalfire72 as a code owner May 2, 2025 07:07
@arily arily force-pushed the akat-master-0428-update-stat-sql branch from 24e064c to 57a5a0c Compare May 18, 2025 16:13
@arily arily force-pushed the akat-master-0428-update-stat-sql branch 3 times, most recently from 5333606 to b97c348 Compare August 3, 2025 18:43
@arily arily force-pushed the akat-master-0428-update-stat-sql branch from b97c348 to 2072268 Compare August 3, 2025 18:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

developer tooling performance Improvements to resource usage without changing functionality

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants