
Stacked Borrows violation in macOS RwLock #121626

Closed
@RalfJung

Description


Today's run of the std test suite in Miri failed on macOS:

  3.314536   test sync::rwlock::tests::frob ... error: Undefined Behavior: trying to retag from <42368726> for SharedReadOnly permission at alloc9742021[0x0], but that tag does not exist in the borrow stack for this location
  0.000053      --> /home/runner/work/miri-test-libstd/miri-test-libstd/rust-src-patched/library/core/src/ptr/non_null.rs:401:18
  0.000015       |
  0.000009   401 |         unsafe { &*self.as_ptr().cast_const() }
  0.000010       |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  0.000009       |                  |
  0.000010       |                  trying to retag from <42368726> for SharedReadOnly permission at alloc9742021[0x0], but that tag does not exist in the borrow stack for this location
  0.000008       |                  this error occurs as part of retag at alloc9742021[0x0..0x30]
  0.000009       |
  0.000009       = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
  0.000009       = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
  0.000010   help: <42368726> was created by a SharedReadOnly retag at offsets [0x0..0x8]
  0.000009      --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:348:32
  0.000009       |
  0.000009   348 |                 let mut next = ptr::from_ref(&node)
  0.000009       |                                ^^^^^^^^^^^^^^^^^^^^
  0.000011   help: <42368726> was later invalidated at offsets [0x8..0x10] by a write access
  0.000010      --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:347:17
  0.000008       |
  0.000009   347 |                 node.prev = AtomicLink::new(None);
  0.000009       |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  0.000009       = note: BACKTRACE (of the first span):
  0.000009       = note: inside `core::ptr::NonNull::<sys::locks::rwlock::queue::Node>::as_ref::<'_>` at /home/runner/work/miri-test-libstd/miri-test-libstd/rust-src-patched/library/core/src/ptr/non_null.rs:401:18: 401:46
  0.000009   note: inside `sys::locks::rwlock::queue::add_backlinks_and_find_tail`
  0.000009      --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:265:17
  0.000009       |
  0.000009   265 |                 next.as_ref().prev.set(Some(current));
  0.000009       |                 ^^^^^^^^^^^^^
  0.000008   note: inside `sys::locks::rwlock::queue::RwLock::unlock_queue`
  0.000009      --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:487:33
  0.000009       |
  0.000008   487 |             let tail = unsafe { add_backlinks_and_find_tail(to_node(state)) };
  0.000009       |                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  0.000009   note: inside `sys::locks::rwlock::queue::RwLock::lock_contended`
  0.000009      --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:383:25
  0.000009       |
  0.000008   383 |                         self.unlock_queue(next);
  0.000009       |                         ^^^^^^^^^^^^^^^^^^^^^^^
  0.000009   note: inside `sys::locks::rwlock::queue::RwLock::write`
  0.000010      --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:312:13
  0.000009       |
  0.000009   312 |             self.lock_contended(true)
  0.000009       |             ^^^^^^^^^^^^^^^^^^^^^^^^^
  0.000009   note: inside `sync::rwlock::RwLock::<()>::write`
  0.000009      --> std_miri_test/../library/std/src/sync/rwlock.rs:361:13
  0.000009       |
  0.000009   361 |             self.inner.write();
  0.000009       |             ^^^^^^^^^^^^^^^^^^
  0.000010   note: inside closure
  0.000011      --> std_miri_test/../library/std/src/sync/rwlock/tests.rs:37:26
  0.000009       |
  0.000010   37  |                     drop(r.write().unwrap());
  0.000008       |                          ^^^^^^^^^

The diff to the last successful run is this, but since this is a concurrency test, there's a high chance that the bug already existed before and only surfaces randomly. So #110211 is the most likely culprit.

So far I haven't managed to reproduce this outside the context of the standard library test suite.

Activity

added
needs-triage (This issue may need triage. Remove it if it has been sufficiently triaged.)
on Feb 26, 2024
RalfJung commented on Feb 26, 2024

Looking at the code, what seems to be happening is that in one iteration through the loop we create a shared reference here:

```rust
let mut next = ptr::from_ref(&node)
    .map_addr(|addr| addr | QUEUED | (state.addr() & LOCKED))
    as State;
```

I am surprised that the diagnostic says "created by a SharedReadOnly retag at offsets [0x0..0x8]". This type does have interior mutability, so parts of the retag should be SharedReadWrite. This is the type in question:

```rust
#[repr(align(8))]
struct Node {
    next: AtomicLink,
    prev: AtomicLink,
    tail: AtomicLink,
    write: bool,
    thread: OnceCell<Thread>,
    completed: AtomicBool,
}
```

Everything except the bool here is in an UnsafeCell (the atomics and the OnceCell are interior mutable). So how can there be 8 bytes of SharedReadOnly memory (i.e., memory outside any UnsafeCell)?

miri-test-libstd uses -Zrandomize-layout. Maybe that has something to do with it. Maybe that's why we couldn't reproduce it yet.

RalfJung commented on Feb 26, 2024

So with random layout, I guess the bool field comes first and then an 8-aligned field follows, meaning we have 8 bytes of "not interior mutable" memory at the start. So far, so good.

Then apparently the tag later got invalidated at offsets 8 to 16 here:

```rust
node.prev = AtomicLink::new(None);
```

So `prev` must be the second field in the randomized layout.
Then later we use the original shared reference again and 💥.

I think this is morally equivalent to the following:

use std::cell::UnsafeCell;

fn main() {
    let mut c = UnsafeCell::new(42);
    let ptr = c.get();
    c = UnsafeCell::new(13);
    unsafe { ptr.read(); }
}

This is rejected by Miri with Stacked Borrows but accepted with Tree Borrows.

The "blessed" way to do this is to never mix direct accesses to a local with accesses through derived pointers. Fixing that in queue.rs shouldn't be too hard.

added a commit that references this issue on Feb 26, 2024

Auto merge of #3327 - RalfJung:tree-interior_mut_reborrow, r=RalfJung

RalfJung commented on Feb 26, 2024

From what @joboet said, the actual bug is that anyone is even still using the old reference at the point where we do the write that causes the invalidation.

According to the Miri log, we have these events in this order:

1. A reference gets created and turned into a raw pointer:

   ```rust
   let mut next = ptr::from_ref(&node)
       .map_addr(|addr| addr | QUEUED | (state.addr() & LOCKED))
       as State;
   ```

2. The memory that reference points to is written via a direct access to the local variable:

   ```rust
   node.prev = AtomicLink::new(None);
   ```

3. Someone uses the pointer derived from the reference:

   ```rust
   next.as_ref().prev.set(Some(current));
   ```

We have a full backtrace only for the last event.

Backtrace
  0.000009   note: inside `sys::locks::rwlock::queue::add_backlinks_and_find_tail`
  0.000009      --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:265:17
  0.000009       |
  0.000009   265 |                 next.as_ref().prev.set(Some(current));
  0.000009       |                 ^^^^^^^^^^^^^
  0.000008   note: inside `sys::locks::rwlock::queue::RwLock::unlock_queue`
  0.000009      --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:487:33
  0.000009       |
  0.000008   487 |             let tail = unsafe { add_backlinks_and_find_tail(to_node(state)) };
  0.000009       |                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  0.000009   note: inside `sys::locks::rwlock::queue::RwLock::lock_contended`
  0.000009      --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:383:25
  0.000009       |
  0.000008   383 |                         self.unlock_queue(next);
  0.000009       |                         ^^^^^^^^^^^^^^^^^^^^^^^
  0.000009   note: inside `sys::locks::rwlock::queue::RwLock::write`
  0.000010      --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:312:13
  0.000009       |
  0.000009   312 |             self.lock_contended(true)
  0.000009       |             ^^^^^^^^^^^^^^^^^^^^^^^^^
  0.000009   note: inside `sync::rwlock::RwLock::<()>::write`
  0.000009      --> std_miri_test/../library/std/src/sync/rwlock.rs:361:13
  0.000009       |
  0.000009   361 |             self.inner.write();
  0.000009       |             ^^^^^^^^^^^^^^^^^^
  0.000010   note: inside closure
  0.000011      --> std_miri_test/../library/std/src/sync/rwlock/tests.rs:37:26
  0.000009       |
  0.000010   37  |                     drop(r.write().unwrap());
  0.000008       |                          ^^^^^^^^^
added
I-unsound (Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness)
T-libs (Relevant to the library team, which will review and decide on the PR/issue.)
and removed
needs-triage (This issue may need triage. Remove it if it has been sufficiently triaged.)
on Feb 26, 2024

added
I-prioritize (Issue: Indicates that prioritization has been requested for this issue.)
on Feb 26, 2024
apiraino commented on Feb 26, 2024

AFAIK T-libs doesn't use the I-prioritize label, so probably fine to remove it.

@rustbot label -I-prioritize

removed
I-prioritize (Issue: Indicates that prioritization has been requested for this issue.)
on Feb 26, 2024
RalfJung commented on Feb 26, 2024

I did manage to reproduce this on my machine once with that flag (with seed 2852 on Miri bbabee144ea12b210a7be9488f0eeed9efe3e24d), but when I run the testcase again with that seed it doesn't happen again.

This is the test:

use rand::Rng;
use std::sync::mpsc::channel;
use std::sync::{Arc, RwLock};
use std::thread;

fn main() {
    const N: u32 = 10;
    const M: usize = if cfg!(miri) { 100 } else { 1000 };

    let r = Arc::new(RwLock::new(()));

    let (tx, rx) = channel::<()>();
    for _ in 0..N {
        let tx = tx.clone();
        let r = r.clone();
        thread::spawn(move || {
            let mut rng = rand::thread_rng();
            for _ in 0..M {
                if rng.gen_bool(1.0 / (N as f64)) {
                    drop(r.write().unwrap());
                } else {
                    drop(r.read().unwrap());
                }
            }
            drop(tx);
        });
    }
    drop(tx);
    let _ = rx.recv();
}

Full command I used:

```shell
CARGO_EXTRA_FLAGS="--profile=release" MIRI_SEED_START=2189 MIRI_SEED_END=4096 MIRIFLAGS="-Zrandomize-layout -Zmiri-strict-provenance -Zmiri-disable-isolation" ./miri many-seeds ./miri run --dep rwlock.rs --target aarch64-apple-darwin
```

So... it uses actual randomness, and thus of course it does not reproduce even with the same seed. :/ (Disabling isolation means we give the program access to host randomness, which will be used to initialize the thread_rng.)

15 remaining items
