Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(maitake-sync): mutex-traits integration #482

Merged
merged 41 commits into from
Aug 10, 2024
Merged

feat(maitake-sync): mutex-traits integration #482

merged 41 commits into from
Aug 10, 2024

Conversation

hawkw
Copy link
Owner

@hawkw hawkw commented Jul 25, 2024

Currently, maitake-sync's blocking Mutex and RwLock use spinlocks
to wait for the lock to become available. This works on (almost) all
platforms,1 but there are a number of reasons it might be less than
desirable. For example:

  • You might want an IRQ-safe mutex so that data protected by the mutex
    can also be shared with interrupt handlers. In that case, the mutex
    must also disable IRQs when locked, and re-enable them when unlocked.
  • You might want to use an alternative mechanism for waiting, such as
    hardware lock elision, an OS mutex or something when not on
    #![no_std], or an existing library RTOS mutex or similar.
  • You might be on a strictly single-threaded platform where any
    "waiting" is inherently a deadlock, and you would prefer to make any
    attempt to lock an already-locked mutex into a panic instead.

Therefore, @jamesmunns and I have written the mutex-traits crate, to
abstract over mutex implementations. This trait is similar to
lock_api, but with a ScopedRawMutex trait to support mutexen based
on critical-section, which cannot be locked and unlocked at
arbitrary times and instead requires a closure.

This branch updates maitake-sync to use mutex-traits. In
particular:

  • The blocking Mutex type is generic over a Lock type parameter,
    which may implement mutex_traits::RawMutex or
    mutex_traits::ScopedRawMutex.
  • Mutex has scoped lock methods when Lock: ScopedRawMutex, and guard
    methods require Lock: RawMutex.
  • All async primitives that use a blocking Mutex are now generic over
    a Lock type for the underlying mutex. Everything except for
    Semaphore and RwLock only requires ScopedRawMutex, but
    Semaphore and RwLock require RawMutex.
  • By default, Mutexes are constructed with a DefaultMutex as the
    ScopedRawMutex impl, which tries to pick the right behavior based on
    features/target config. This only implements ScopedRawMutex.

BREAKING CHANGE:

Renamed spin::Mutex and spin::RwLock to blocking::Mutex and
blocking::RwLock.

blocking::Mutex::new now constructs a blocking::Mutex<T, DefaultMutex>,
which only provides scoped lock methods. To construct a
mutex that also has RAII guard lock methods, use
blocking::Mutex::new_with_raw_mutex to provide a RawMutex
implementation.

Footnotes

  1. Provided they have atomic CAS...

@hawkw
Copy link
Owner Author

hawkw commented Jul 25, 2024

@jamesmunns this is just about done, i basically just need to finish docs.

@jamesmunns
Copy link
Collaborator

Do you know what's up with the loom test failures? Is that just needing a rename?

@hawkw
Copy link
Owner Author

hawkw commented Jul 26, 2024

Do you know what's up with the loom test failures? Is that just needing a rename?

Yeah, I haven't actually updated the test-only code yet, I'll need to do that as well.

@@ -175,7 +177,7 @@ mod tests;
/// [mutex]: crate::Mutex
/// [2]: https://www.1024cores.net/home/lock-free-algorithms/queues/intrusive-mpsc-node-based-queue
#[derive(Debug)]
pub struct WaitQueue {
pub struct WaitQueue<Lock: ScopedRawMutex = Spinlock> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Misc thought: would it be possible to use a type alias and some feature detection to pick different default locks?

The reason I ask: on embedded, if someone doesn't KNOW to switch out the spinlock, they have a pretty hidden footgun: It will work unless they share stuff with interrupts AND have an interrupt occur while the lock is held.

I can see this being a MAJOR heisenbug for someone to run in to. My suggestion would be to default to the Critical Section mutex (this might be made tricky because it's a feature!) on target-os = "none" platforms, and a spinlock on others. Or maybe even not pick a default on those platforms? (this might mean duplicating the struct defn)

I don't know how to make the UX of this not suck, but "it works 99.9% of the time until it doesn't" for embedded platforms is rough.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jamesmunns I have conflicted feelings about this. There are some competing goals where, unfortunately, I think we have to make some tradeoffs...

In particular, I have a moderate preference for maitake-sync not containing or depending on platform-specific code. My goal thus far is that maitake-sync just be "plain old Rust" code and any platform-specific behaviors would be provided by the top-level application.

Furthermore, I have a strong preference that maitake-sync not do its own build-time platform detection. In my experience, this is tricky to get right, and I don't want to be responsible for maintaining it, especially when more robust and more maintained implementations exist elsewhere in the ecosystem. Additionally, there's the issue that sometimes it's not possible to detect a platform without additional user input --- for instance, there are single-core Cortex-M parts, and multi-core Cortex-M parts, and (to my knowledge), Rust can't determine which it's targeting based on just the target spec: instead, the user must set some additional RUSTFLAGS cfg or environment variable to indicate it. From what I've seen, there are a bunch of crates that all provide their own mechanism for indicating stuff like this, so users sometimes have to set a bunch of different env vars to indicate to various parts of their dependency graph that they are targeting a single-core platform. I'd really rather not add another flag you have to set.

Therefore, I'd much prefer the top-level application build to depend on the crate that does platform detection, and pass its types into maitake-sync. Or, if that's unworkable, I'd rather at least have maitake-sync depend on some other crate that does the platform detection.

Regarding critical-section support by default, hmm. I'm open to that, but I would really prefer not to make the default mutex type parameter for everything be the mutex::raw_impls::cs::CriticalSectionRawMutex type, since it only implements ScopedRawMutex, meaning that by default, the RAII mutex APIs won't be available, which seems sad.

An alternative option could be to bake the critical_section::acquire and critical_section::release functions into maitake_sync::blocking::Mutex itself, rather than the RawMutex type. That way, we can always ensure that IRQ disabling and enabling behaviors happen. But, this actually causes a problem when the critical section impl isn't just IRQ-disable/IRQ-restore, but also implements a real mutex, because now you have two mutices. I guess critical-section does require reentrancy, so it's not a deadlock, it just means you're doing extra stuff... It also kind of makes the critical-section impl in mutex unnecessary with maitake-sync, which is not the end of the world, but we might have to document that. I'd be open to consider this approach, as long as the critical-section dep is feature-flagged.

What do you think?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, I definitely would prefer to avoid footgun-y default behavior. However, I also kind of feel like platform-specific behaviors like this are usually also application-specific behavior, and it's hard to implement any default that really does what you want without some additional input from the end user. I dunno.

Another concern that I forgot to mention above is that I wanted to see if I could add mutex-traits support without a breaking change --- which means that the default new constructors need to behave identically to how they did before. They can't depend on critical-section in that case, since IIUC critical-section will fail to link if the user doesn't depend some critical section implementation. So, using critical-section by default would be a breaking change, and not having default parameters would also be a breaking change.

With that said, I'm open to releasing a v0.2 if it's necessary. I just wanted to see if I could get away with not doing that. This is not nearly as strong a preference as my desire to avoid having to maintain my own platform detection code!

@hawkw hawkw marked this pull request as ready for review July 27, 2024 19:03
@hawkw hawkw changed the title [WIP] feat(maitake-sync): mutex-traits integration feat(maitake-sync): mutex-traits integration Jul 27, 2024
@hawkw
Copy link
Owner Author

hawkw commented Jul 27, 2024

BTW @jamesmunns 146be31 adds a huge pile of documentation, which I would love your take on if you have time.

@jamesmunns
Copy link
Collaborator

jamesmunns commented Jul 27, 2024

@hawkw I reviewed 146be31 and it is very clear and well written!

I need to look at the total diff - but IMO we should have "chef's recommendations" for the mutexes you should use, or point to mutex/mutex-traits to recommend it.

These docs cover:

  • What spinlocks are and why you might want or not want to use them
  • The fact that the mutexes can be customized

But we might want to answer the question "which would should I use (in the most common use cases)".

@hawkw
Copy link
Owner Author

hawkw commented Jul 27, 2024

I need to look at the total diff - but IMO we should have "chef's recommendations" for the mutexes you should use, or point to mutex/mutex-traits to recommend it.

Yeah, I totally agree with this — but, ideally, I'd like to add that in the mutex crate, where the actual implementations live, and link to it here. That way, more people can benefit from that documentation, even if they aren't using maitake-sync.

Honesty, some of these docs should maybe be copy-pasted into Mutex, with some minor tweaks...

@hawkw
Copy link
Owner Author

hawkw commented Jul 27, 2024

Also, since I think we've agreed that maitake-sync should have a "best effort default ScopedRawMutex", the "chef's choice" docs will basically be "Use the default mutex, unless you know why you need to use something else."

@hawkw
Copy link
Owner Author

hawkw commented Jul 28, 2024

@jamesmunns when you have a chance, I'd love your take on the current DefaultMutex design!

/// ensures that bare-metal users who have enabled `critical-section` get a
/// mutex that disables IRQs when locked.
///
/// - **Otherwise, the `DefaultMutex` is a [`Spinlock`]**. This is the default
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could poison this with a !Sync bound in this case? That would be enough to inhibit IRQ usage, but would also probably ruin your inter-core use case :/

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole point of a spinlock is to make things Sync...if that wasn't the goal, we could just make it panic. I wondered about possibly having some kind of IrqSafe marker trait, analogously to Send and Sync, but...there's no way to actually enforce it...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right. whoops.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the problem is that, although ISRs are "just another kind of concurrency" and can race in all the same ways as threads/multicore code, the things you have to do to prevent races are different.

Copy link
Collaborator

@jamesmunns jamesmunns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all super good, and I am very excited for it!

maitake-sync/README.md Show resolved Hide resolved
maitake-sync/README.md Outdated Show resolved Hide resolved
maitake-sync/src/blocking.rs Show resolved Hide resolved
maitake-sync/src/blocking.rs Show resolved Hide resolved
maitake-sync/src/blocking/mutex.rs Show resolved Hide resolved
maitake-sync/src/semaphore.rs Outdated Show resolved Hide resolved
@hawkw
Copy link
Owner Author

hawkw commented Jul 29, 2024

@jamesmunns by the way, regarding constructors, do you have a strong opinion on Mutex::with_raw_mutex versus Mutex::new_with_raw_mutex? I think with_raw_mutex is more consistent with the stdlib (e.g. Vec::with_capacity, HashMap::with_hasher, etc), but new_with_... makes it a bit clearer that it's a constructor...

@jamesmunns
Copy link
Collaborator

No strong opinions! I think if we have some cross linking between the constructors, it's all okay.

Weakly, I prefer new_with_..., as we are already using with_lock for example to mean "within a scope" (like python's with context managers)

@hawkw
Copy link
Owner Author

hawkw commented Jul 29, 2024

Weakly, I prefer new_with_..., as we are already using with_lock for example to mean "within a scope" (like python's with context managers)

Yeah, that's a big part of why I was considering switching to new_with....

@hawkw
Copy link
Owner Author

hawkw commented Jul 29, 2024

Another thing I'm trying to make up my mind on is whether the Mutex::new constructor should always use DefaultMutex (which is what it does currently), or if it should instead be a type-inferred constructor that uses whatever the Lock type param is when it implements ConstInit.

Currently, Mutex::new is always DefaultMutex, which I did because if it's instead made generic, a Mutex::new without type annotations won't compile: for instance,

            let mutex = Arc::new(Mutex::new(String::new()));

gets you an error like:

error[E0283]: type annotations needed
   --> maitake-sync/src/blocking/mutex.rs:418:34
    |
418 |             let mutex = Arc::new(Mutex::new(String::new()));
    |                                  ^^^^^^^^^^ cannot infer type of the type parameter `Lock` declared on the struct `Mutex`
    |
    = note: cannot satisfy `_: ConstInit`
    = help: the following types implement trait `ConstInit`:
              DefaultMutex
              RwSpinlock
              Spinlock
note: required by a bound in `blocking::mutex::Mutex::<T, Lock>::new`
   --> maitake-sync/src/blocking/mutex.rs:87:15
    |
85  |     pub const fn new(data: T) -> Self
    |                  --- required by a bound in this associated function
86  |     where
87  |         Lock: mutex_traits::ConstInit,
    |               ^^^^^^^^^^^^^^^^^^^^^^^ required by this bound in `Mutex::<T, Lock>::new`
help: consider specifying the generic arguments
    |
418 |             let mutex = Arc::new(Mutex::<std::string::String, Lock>::new(String::new()));
    |          

Which is pretty unfortunate.

On the other hand, a Mutex often lives in a struct field or a static, where it has an implcit type annotation; if i write

struct Foo {
    t: Mutex<String>,
}

I am implicitly providing a type for Lock, the default. So, in that case, it will work.

Basically what I'm trying to decide is whether Mutex::new should be usable with other mutex types, and whether that's worth making it require a type annotation. I uncovered this after I started updating mycelium's other crates to use the new Mutex API in 2074ae9. In particular, I noticed that the spin::Mutex type alias I added for Mutex<T, Spinlock> can't be constructed with spin::Mutex::new, and requires an explicit Mutex::with_raw_mutex(t, Spinlock::new()), which...kinda defeats the purpose of the type alias.

The alternative might be to just remove the alias.

@jamesmunns
Copy link
Collaborator

maybe don't have a new, and have new_default() and new_custom::<T>()?

I don't have strong preferences either! I personally avoid default types (and default lifetimes!), because I find they can be confusing for people less familiar with the API surface. It's less clean, but maybe more "honest"?

hawkw added a commit that referenced this pull request Jul 29, 2024
As mentioned in [this comment][1], `new_with_raw_mutex` is probably a
better name for the constructor, because we also have `with_lock`, which
isn't a constructor. Also, it has the side benefit of being
autocomplete-friendly; it will be suggested if you type `new`, which
seems good.

[1]: #482 (comment)
@hawkw
Copy link
Owner Author

hawkw commented Jul 29, 2024

maybe don't have a new, and have new_default() and new_custom::<T>()?

I don't have strong preferences either! I personally avoid default types (and default lifetimes!), because I find they can be confusing for people less familiar with the API surface. It's less clean, but maybe more "honest"?

Hmm, I think that for all the async types, a new which uses DefaultMutex is almost always correct, so my preference is to keep that. For the blocking::Mutex type, it's a bit more complex, because the issue is that blocking::Mutex::new will never have the RAII interface, just the scoped one.

But, maybe what I should do is either remove the spin::Mutex alias, or make it a wrapper type instead, so that it can have a new. I think i'm just going to get rid of it for now; since this is already going to be a breaking change, we don't need to keep a Mutex type in that module. If we want one later, we can always add something.

hawkw and others added 18 commits August 10, 2024 08:27
maybe `new` should infer the lock type idk...
Co-authored-by: James Munns <[email protected]>
As mentioned in [this comment][1], `new_with_raw_mutex` is probably a
better name for the constructor, because we also have `with_lock`, which
isn't a constructor. Also, it has the side benefit of being
autocomplete-friendly; it will be suggested if you type `new`, which
seems good.

[1]: #482 (comment)
Loom depends on `tracing` anyway, so `tracing` is always in the
dependency tree when `cfg(loom)` is set. So, there isn't really any
sense in disabling `tracing` in `maitake-sync` when `cfg(loom)` is set,
even if the tracing feature isn't explicitly enabled.
@hawkw hawkw enabled auto-merge (squash) August 10, 2024 17:50
@hawkw hawkw merged commit 99da7e1 into main Aug 10, 2024
18 of 19 checks passed
@hawkw hawkw deleted the eliza/lock-api branch August 10, 2024 18:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants