Replace ingredient cache with faster ingredient map #921

Open
ibraheemdev wants to merge 1 commit into master from ibraheemdev:ibraheem/remove-ingredient-cache

Conversation

ibraheemdev (Contributor)

An alternative to #919, this removes the ingredient cache entirely, closing the gap between the single- and multi-database use cases. A `TypeId` is already a hash internally, so the ingredient lookup can become a very fast lookup in a lock-free map like papaya. The one annoying part is that we now have to run `zalsa_register_downcaster` every time a tracked function is called, because we don't have a good way of caching that call independently of the database. I'm interested in the benchmark results here.
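A minimal sketch of the idea, assuming only papaya's `pin`/`get`/`insert` API; `IngredientMap`, `IngredientIndex`, and `get_or_register` are invented names for illustration, not salsa's actual internals:

```rust
use std::any::TypeId;

// Stand-in for whatever per-ingredient data salsa actually stores.
#[derive(Copy, Clone, Debug)]
pub struct IngredientIndex(u32);

pub struct IngredientMap {
    // Lock-free concurrent map keyed by TypeId. A TypeId already contains a
    // high-quality hash, so a lookup is essentially a single probe.
    map: papaya::HashMap<TypeId, IngredientIndex>,
}

impl IngredientMap {
    pub fn new() -> Self {
        Self { map: papaya::HashMap::new() }
    }

    /// Returns the ingredient index for `T`, registering it on first use.
    pub fn get_or_register<T: 'static>(
        &self,
        register: impl FnOnce() -> IngredientIndex,
    ) -> IngredientIndex {
        let key = TypeId::of::<T>();
        let map = self.map.pin(); // pin a guard for lock-free access
        if let Some(&index) = map.get(&key) {
            return index;
        }
        // Two threads may race past the `get`; as long as `register` is
        // deterministic for `T`, whichever insert wins stores the same index.
        let index = register();
        map.insert(key, index);
        index
    }
}
```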

netlify bot commented Jun 20, 2025

Deploy Preview for salsa-rs canceled.

🔨 Latest commit: 4f627b2
🔍 Latest deploy log: https://app.netlify.com/projects/salsa-rs/deploys/6854d3216ab1710008ebb64e

codspeed-hq bot commented Jun 20, 2025

CodSpeed Performance Report

Merging #921 will degrade performance by 10.4%

Comparing ibraheemdev:ibraheem/remove-ingredient-cache (4f627b2) with master (87a730f)

Summary

❌ 4 regressions
✅ 8 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
| --- | --- | --- | --- |
| amortized[Input] | 3.2 µs | 3.6 µs | -9.49% |
| amortized[InternedInput] | 3.2 µs | 3.5 µs | -10.4% |
| amortized[SupertypeInput] | 3.7 µs | 4 µs | -8.04% |
| new[InternedInput] | 5.3 µs | 5.8 µs | -8.56% |

ibraheemdev (Contributor, Author) commented Jun 20, 2025

Hmm, looks like this may not be good enough... I guess we could keep the IngredientCache and take the 10% hit for the multiple-database path. 10% seems a lot better than #919, at least (which was ~50%).
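For reference, a rough sketch of the hybrid being discussed, with the packing scheme and names invented for illustration: a per-call-site cache validated against the database, with the lock-free map as the slow path.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

const EMPTY: u64 = u64::MAX;

/// Illustrative per-call-site cache: packs a database nonce and an ingredient
/// index into one atomic word, falling back to the shared map on a miss.
/// (The edge case where both halves are u32::MAX is ignored for brevity.)
pub struct IngredientCache {
    cached: AtomicU64,
}

impl IngredientCache {
    pub const fn new() -> Self {
        Self { cached: AtomicU64::new(EMPTY) }
    }

    /// Fast path when the caller's database matches the cached nonce; any
    /// other database takes the slow lookup (e.g. the lock-free TypeId map)
    /// and overwrites the cache.
    pub fn get_or_create(&self, db_nonce: u32, slow: impl FnOnce() -> u32) -> u32 {
        let packed = self.cached.load(Ordering::Relaxed);
        if packed != EMPTY && (packed >> 32) as u32 == db_nonce {
            return packed as u32; // low 32 bits hold the ingredient index
        }
        let index = slow();
        self.cached
            .store(((db_nonce as u64) << 32) | index as u64, Ordering::Relaxed);
        index
    }
}
```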

MichaReiser (Contributor) left a comment

This is great, and I prefer it over the other PR not just because it's much faster but also because it doesn't require using raw-api.

I do think it makes sense to keep the secondary cache. A 10% regression for the most common case seems like a lot.

Main

single

ty_walltime     fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ small                      │               │               │               │         │
   ╰─ pydantic  296.6 ms      │ 305.5 ms      │ 299.5 ms      │ 300.5 ms      │ 3       │ 3

multi

ty_walltime     fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ small                      │               │               │               │         │
   ╰─ pydantic  69.42 ms      │ 612.2 ms      │ 605.8 ms      │ 429.1 ms      │ 3       │ 3

This PR

single

ty_walltime     fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ small                      │               │               │               │         │
   ╰─ pydantic  303.8 ms      │ 322.4 ms      │ 305.9 ms      │ 310.7 ms      │ 3       │ 3

multi

ty_walltime     fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ small                      │               │               │               │         │
   ╰─ pydantic  55.83 ms      │ 79.95 ms      │ 57.73 ms      │ 64.5 ms       │ 3       │ 3

Overall: the multi-threading regression is now about the same (~10%) as the single-threaded regression when using multiple databases.

MichaReiser (Contributor)

Hmm, it does seem that shuttle got stuck somewhere...

ibraheemdev (Contributor, Author)

> because it doesn't require using raw-api

The other PR doesn't actually need the raw-api; that just made it easier to write the shuttle shim. It looks like we'll need a shuttle shim for this one as well. I now realize the shim can just return a copy of the value instead of having to mimic the guard API.
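For what it's worth, a sketch of that shim idea (names invented; only shuttle's std-mirroring `RwLock` is assumed): under the shuttle cfg, back the map with a plain `HashMap` behind a `shuttle::sync::RwLock` and return copies, so no guard type has to be reproduced.

```rust
#[cfg(feature = "shuttle")]
mod shim {
    use std::collections::HashMap;
    use std::hash::Hash;

    /// Drop-in stand-in for the lock-free map when running under shuttle.
    pub struct IngredientMap<K, V> {
        inner: shuttle::sync::RwLock<HashMap<K, V>>,
    }

    impl<K: Eq + Hash, V: Copy> IngredientMap<K, V> {
        pub fn new() -> Self {
            Self { inner: shuttle::sync::RwLock::new(HashMap::new()) }
        }

        /// Hands back a copy of the value, so no lock guard escapes the call
        /// and the real map's guard API never needs to be mimicked.
        pub fn get(&self, key: &K) -> Option<V> {
            self.inner.read().unwrap().get(key).copied()
        }

        pub fn insert(&self, key: K, value: V) {
            self.inner.write().unwrap().insert(key, value);
        }
    }
}
```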

ibraheemdev (Contributor, Author)

It might also be worth adding specific benchmarks that compare running with a second database, because right now (for ty_walltime) we are assuming that the fastest run is the one with the first database. At least on my machine it wasn't clear that was the case; the gap was a lot closer and lost in the noise.
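Purely illustrative, using a toy query and criterion rather than the ty_walltime harness: warm one database outside the measurement loop, then time the same query against a freshly created second database.

```rust
use criterion::{criterion_group, criterion_main, Criterion};

#[salsa::input]
struct Input {
    value: u32,
}

#[salsa::tracked]
fn double(db: &dyn salsa::Database, input: Input) -> u32 {
    input.value(db) * 2
}

fn second_database(c: &mut Criterion) {
    c.bench_function("second_database[Input]", |b| {
        // Prime any per-call-site caches with a first database.
        let db1 = salsa::DatabaseImpl::default();
        let _ = double(&db1, Input::new(&db1, 1));

        b.iter(|| {
            // Measure the path a *second* database takes for the same query
            // (database construction is included, so treat this as relative only).
            let db2 = salsa::DatabaseImpl::default();
            double(&db2, Input::new(&db2, 2))
        });
    });
}

criterion_group!(benches, second_database);
criterion_main!(benches);
```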

MichaReiser (Contributor)

Did you change the sample count to 1?
