perf: Relax atomic allocated ops in Table and shrink page size #766


Closed · Veykril wants to merge 2 commits into master from veykril/push-qryuyllyzoll

Conversation

Veykril (Member) commented Mar 20, 2025

These loads are already paired with a lock access on the write path. We also never store more than 10 bits' worth of entries per page, so we can swap the usize for a u16 to shrink the per-page overhead.
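
For illustration, here is a minimal sketch of the scheme (the Page, allocation_lock, and allocated names are illustrative only, not the actual Table code). A page never holds more than 2^10 = 1024 slots, so a u16 counter is plenty, and because every increment happens under the allocation lock, the counter itself can use Relaxed ordering:

```rust
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;
use std::sync::atomic::{AtomicU16, Ordering};
use std::sync::Mutex;

const PAGE_LEN: usize = 1 << 10; // at most 2^10 = 1024 slots per page, so a u16 counter suffices

struct Page<T> {
    /// Writers take this lock before bumping `allocated`, so allocations are
    /// already serialized; the counter does not need to carry extra ordering.
    allocation_lock: Mutex<()>,
    /// Number of initialized slots; only ever increases. Shrunk from usize to u16,
    /// saving one word of per-page overhead.
    allocated: AtomicU16,
    /// Slot storage; never reallocated, so indices stay stable.
    data: [UnsafeCell<MaybeUninit<T>>; PAGE_LEN],
}

impl<T> Page<T> {
    fn allocate(&self, value: T) -> Option<usize> {
        let _guard = self.allocation_lock.lock().unwrap();
        let index = self.allocated.load(Ordering::Relaxed) as usize;
        if index == PAGE_LEN {
            return None; // page full
        }
        // Safety (sketch): writers are serialized by the lock, and `allocated`
        // has not yet been bumped past `index`, so no one else touches this slot.
        unsafe { (*self.data[index].get()).write(value) };
        self.allocated.store(index as u16 + 1, Ordering::Relaxed);
        Some(index)
    }
}
```

The open question, discussed below, is whether readers on other threads can rely on a Relaxed load of allocated before dereferencing a slot.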

Veykril (Member, Author) commented Mar 20, 2025

@davidbarsky here is your reminder for this for your loom work

netlify bot commented Mar 20, 2025

Deploy Preview for salsa-rs canceled.

🔨 Latest commit: f6cdb3d
🔍 Latest deploy log: https://app.netlify.com/projects/salsa-rs/deploys/684bb464675aa80008829a60

codspeed-hq bot commented Mar 20, 2025

CodSpeed Performance Report

Merging #766 will not alter performance

Comparing Veykril:veykril/push-qryuyllyzoll (f6cdb3d) with master (6ced42b)

Summary

✅ 12 untouched benchmarks

@Veykril Veykril force-pushed the veykril/push-qryuyllyzoll branch 2 times, most recently from eeb899a to 0b2a614, on March 26, 2025 06:29
@Veykril Veykril force-pushed the veykril/push-qryuyllyzoll branch from 0b2a614 to 45eaab3 on May 26, 2025 11:09
@Veykril Veykril marked this pull request as ready for review May 26, 2025 11:09
ibraheemdev (Contributor) commented May 27, 2025

> just shuttle passes

I don't think shuttle actually models weak atomics. We could run Miri with --many-seeds to get more coverage of this.

Veykril (Member, Author) commented Jun 12, 2025

Ah I see, that's a bummer. According to the Miri docs the default is already a high (and slow) 64. I'm not sure increasing it is worth it. Is my reasoning for relaxing the loads here correct either way? I think it is, but I'd rather get a second opinion of course.

ibraheemdev (Contributor) commented:

> Ah I see, that's a bummer. According to the Miri docs the default is already a high (and slow) 64.

I think that's the default if you pass the flag without a value. A plain cargo miri run will only use one seed.
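
(For concreteness, something like MIRIFLAGS="-Zmiri-many-seeds=0..64" cargo miri test should exercise 64 seeds per test; treat the exact flag spelling as an approximation based on the Miri docs rather than a verified invocation.)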

ibraheemdev (Contributor) left a comment


I'm not sure this is correct if something like slots_of is used concurrently? I know that's only exposed under salsa_unstable, but the code is currently written such that slots can be accessed across threads (the functions return a slice, not just the specific ID). If we wanted to relax the loads I think we should be clearer about only supporting point-access for previously synchronized slots.

The allocation lock already enforces a happens-before relationship for the initialized length
Veykril (Member, Author) commented Jun 13, 2025

Hmm, is that a problem? The allocated count only ever increments and we never reallocate the data. allocated here is a bit of a misnomer, I think; what it actually means is "slots initialized", so a slice here will never point to uninitialized elements.

I believe you mean the slots_of function here, is that right?

We do not use more than 10 bits for counts per page anyway, so this shrinks the page overhead by 1 word
@Veykril Veykril force-pushed the veykril/push-qryuyllyzoll branch from 45eaab3 to f6cdb3d on June 13, 2025 05:17
@Veykril Veykril changed the title from "Relax atomic loads and stores in Table" to "perf: Relax atomic allocated ops in Table and shrink page size" on Jun 13, 2025
ibraheemdev (Contributor) commented Jun 13, 2025

allocated is the flag for a slot being initialized and establishes happens-before, e.g. if i < allocated.load(Acquire) { *slot.get(i) }. This matters if you are reading a slot that was initialized by a different thread, which I believe can only happen with the slots_of function. Otherwise, most slots are tied to a specific ingredient Id which must be allocated by the current thread. However, even apart from slots_of, this invariant is not clear in the code, so I would be wary of relaxing the load.
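
To make the happens-before pattern concrete, here is a minimal, self-contained sketch (the OneSlotTable type and its names are purely illustrative, not Salsa's actual Table API):

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicU16, Ordering};
use std::thread;

// One slot plus an "initialized length" counter, mirroring the pattern above.
struct OneSlotTable {
    allocated: AtomicU16,
    slot: UnsafeCell<Option<String>>,
}

// Safety (sketch only): soundness of sharing `slot` rests entirely on the
// Release/Acquire pair below; with Relaxed ordering this would be a data race.
unsafe impl Sync for OneSlotTable {}

fn main() {
    let table = OneSlotTable {
        allocated: AtomicU16::new(0),
        slot: UnsafeCell::new(None),
    };
    thread::scope(|s| {
        // Writer: initialize the slot, then publish the new length with Release.
        s.spawn(|| {
            unsafe { *table.slot.get() = Some("hello".to_owned()) };
            table.allocated.store(1, Ordering::Release);
        });
        // Reader (a slots_of-style accessor on another thread): the Acquire load
        // synchronizes with the Release store, so the slot contents are visible.
        s.spawn(|| {
            if table.allocated.load(Ordering::Acquire) > 0 {
                let value = unsafe { (*table.slot.get()).as_ref().unwrap() };
                assert_eq!(value.as_str(), "hello");
            }
        });
    });
}
```

With Relaxed ordering on both sides, the length check could succeed while the slot write is not yet guaranteed to be visible to the reader, which is exactly the hazard for a cross-thread accessor like slots_of.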

Veykril (Member, Author) commented Jun 13, 2025

fair enough

@Veykril Veykril closed this Jun 13, 2025