
Shard jar map #919

Open · ibraheemdev wants to merge 2 commits into master from ibraheem/ingredient-cache

Conversation

@ibraheemdev (Contributor) commented Jun 18, 2025

Replaces the jar map with a DashMap. This depends on ibraheemdev/boxcar#35, which allows us to claim multiple sequential ingredient indices without holding a global lock.

This should also help with #918, because the slow path for accessing an uncached ingredient now goes through a sharded DashMap instead of a single Mutex, but I don't think we have benchmarks that use multiple databases yet (I'll add some). The ideal fix for that issue might also be to extend the IngredientCache, but this should at least help. We could also look into using papaya here (or more extreme, ArcSwap?), because the jar map is almost purely read-heavy after the table is initially filled.
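
For illustration, here is a rough sketch of the sharded lookup described above; it is not the PR's actual code, and the types are hypothetical stand-ins for salsa's internals:

```rust
// Hedged sketch of the sharded slow path described above (hypothetical
// types, not salsa's actual code). Lookups for already-registered jars only
// lock the DashMap shard that owns the TypeId, whereas a single global
// Mutex<HashMap<...>> would serialize every database's slow path.
use std::any::TypeId;

use dashmap::DashMap;

#[derive(Clone, Copy)]
struct IngredientIndex(u32); // placeholder for salsa's ingredient index type

#[derive(Default)]
struct JarMap {
    map: DashMap<TypeId, IngredientIndex>,
}

impl JarMap {
    fn get_or_create(
        &self,
        jar: TypeId,
        create: impl FnOnce() -> IngredientIndex,
    ) -> IngredientIndex {
        // Common path: a read lock on one shard only.
        if let Some(index) = self.map.get(&jar) {
            return *index;
        }
        // Slow path: take the shard's write lock and insert if still absent.
        *self.map.entry(jar).or_insert_with(create)
    }
}
```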

netlify bot commented Jun 18, 2025

Deploy Preview for salsa-rs canceled.

🔨 Latest commit: 6ff8f84
🔍 Latest deploy log: https://app.netlify.com/projects/salsa-rs/deploys/6854d3a80f405200089cbccb

@ibraheemdev force-pushed the ibraheem/ingredient-cache branch 2 times, most recently from 2de993f to 5d6cafe on June 18, 2025 at 22:32

codspeed-hq bot commented Jun 18, 2025

CodSpeed Performance Report

Merging #919 will degrade performance by 4.37%

Comparing ibraheemdev:ibraheem/ingredient-cache (6ff8f84) with master (87a730f)

Summary

❌ 1 regressions
✅ 11 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark           BASE    HEAD    Change
new[InternedInput]  5.3 µs  5.6 µs  -4.37%

@ibraheemdev (Contributor, Author)

The benchmarks are probably misleading here because they're single threaded.

@ibraheemdev force-pushed the ibraheem/ingredient-cache branch from 5d6cafe to dd31360 on June 18, 2025 at 23:38
@ibraheemdev force-pushed the ibraheem/ingredient-cache branch from dd31360 to b21f69f on June 18, 2025 at 23:38
@MichaReiser (Contributor) commented Jun 19, 2025

> This should also help with #918, because the slow path for accessing an uncached ingredient now goes through a sharded DashMap instead of a single Mutex, but I don't think we have benchmarks that use multiple databases yet (I'll add some). The ideal fix for that issue might also be to extend the IngredientCache, but this should at least help. We could also look into using papaya here (or more extreme, ArcSwap?), because the jar map is almost purely read-heavy after the table is initially filled.

Did you try changing the ty benchmark by removing the thread pool override in the setup here:

https://github.com/astral-sh/ruff/blob/e352a50b74329268589cdc18eafab123832559ac/crates/ruff_benchmark/benches/ty_walltime.rs#L240-L250

This should be a very good benchmark to show the impact of this change.
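
For context, this is the kind of override being referred to, as a hedged sketch only (illustrative names; the real setup lives in the linked ruff source and may differ):

```rust
// Hedged sketch of a typical thread-pool override in a benchmark setup
// (illustrative only; the actual ty benchmark differs, see the linked ruff
// source). Removing the `num_threads(1)` call lets the benchmark exercise
// the multi-threaded path this PR is aimed at.
fn build_benchmark_pool() -> rayon::ThreadPool {
    rayon::ThreadPoolBuilder::new()
        .num_threads(1) // the override: pin the benchmark to a single thread
        .build()
        .expect("failed to build rayon thread pool")
}
```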

Edit 1

I made the change real quick and here are the results:

main

single threaded

ty_walltime     fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ small                      │               │               │               │         │
   ╰─ pydantic  304.6 ms      │ 327 ms        │ 309.6 ms      │ 313.7 ms      │ 3       │ 6

multi threaded

ty_walltime     fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ small                      │               │               │               │         │
   ╰─ pydantic  342 ms        │ 592.5 ms      │ 584.8 ms      │ 506.4 ms      │ 3       │ 6

PR

single threaded

ty_walltime     fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ small                      │               │               │               │         │
   ╰─ pydantic  298.9 ms      │ 315.7 ms      │ 303.1 ms      │ 305.9 ms      │ 3       │ 6

multi threaded

ty_walltime     fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ small                      │               │               │               │         │
   ╰─ pydantic  140 ms        │ 181.8 ms      │ 176.1 ms      │ 166 ms        │ 3       │ 6

The first run is still significantly faster (which is expected), but it's at least no longer the case that multithreaded is slower than single-threaded! I'm not quite sure why I can't observe the 10x slowdown anymore on main.

Edit 2

Wait, it now makes sense. I need to reduce the sample size to 1, or divan takes the median from two runs.
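
A hedged sketch of what that looks like with divan's bench attribute (the benchmark body is a hypothetical placeholder; only the sample_size/sample_count parameters are the point here):

```rust
// Hedged sketch of pinning divan's sample size (the benchmark body is a
// hypothetical placeholder, not the actual ty benchmark). With
// `sample_size = 1`, every sample is a single run, so one fast warm run can
// no longer dominate the time reported for its sample.
#[divan::bench(sample_count = 3, sample_size = 1)]
fn pydantic(bencher: divan::Bencher) {
    bencher.bench(|| {
        // ... run the workload under test ...
    });
}

fn main() {
    divan::main();
}
```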

Main

single

ty_walltime     fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ small                      │               │               │               │         │
   ╰─ pydantic  295.9 ms      │ 341.8 ms      │ 304.1 ms      │ 313.9 ms      │ 3       │ 3

multi

ty_walltime     fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ small                      │               │               │               │         │
   ╰─ pydantic  92.33 ms      │ 582 ms        │ 581.3 ms      │ 418.5 ms      │ 3       │ 3

This PR

single

ty_walltime     fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ small                      │               │               │               │         │
   ╰─ pydantic  325.7 ms      │ 342.1 ms      │ 326.9 ms      │ 331.6 ms      │ 3       │ 3

multi

ty_walltime     fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ small                      │               │               │               │         │
   ╰─ pydantic  96.58 ms      │ 173.9 ms      │ 161.7 ms      │ 144.1 ms      │ 3       │ 3

So this is pretty good: it reduces the 5-6x slowdown to a 1.5-2x slowdown. I'm still not sure whether that's good enough for us to enable multithreading in the benchmarks, because it probably adds too much overhead, but this is a huge improvement in production when running with more than one db.

@MichaReiser (Contributor) left a comment

Excellent! It would be nice to reduce this even further (e.g., could we use a lock-free map that's optimized for reads?), but this is a great first step.
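
As a hedged illustration of that direction (not part of this PR), a read-optimized lock-free map such as papaya could back the jar map; the types below are hypothetical stand-ins, not salsa's actual definitions:

```rust
// Hedged sketch of the read-optimized follow-up suggested above, using
// papaya, a lock-free hash map whose reads never block (hypothetical types,
// not salsa's actual code).
use std::any::TypeId;

#[derive(Clone, Copy)]
struct IngredientIndex(u32); // placeholder for salsa's ingredient index type

struct JarMap {
    map: papaya::HashMap<TypeId, IngredientIndex>,
}

impl JarMap {
    fn new() -> Self {
        Self { map: papaya::HashMap::new() }
    }

    // Reads take a lightweight guard and never contend with other readers,
    // which fits a map that is almost purely read-heavy once filled.
    fn get(&self, jar: TypeId) -> Option<IngredientIndex> {
        self.map.pin().get(&jar).copied()
    }

    fn insert(&self, jar: TypeId, index: IngredientIndex) {
        self.map.pin().insert(jar, index);
    }
}
```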
