Closed
Description
Today's run of the std test suite in Miri failed on macOS:
3.314536 test sync::rwlock::tests::frob ... error: Undefined Behavior: trying to retag from <42368726> for SharedReadOnly permission at alloc9742021[0x0], but that tag does not exist in the borrow stack for this location
0.000053 --> /home/runner/work/miri-test-libstd/miri-test-libstd/rust-src-patched/library/core/src/ptr/non_null.rs:401:18
0.000015 |
0.000009 401 | unsafe { &*self.as_ptr().cast_const() }
0.000010 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
0.000009 | |
0.000010 | trying to retag from <42368726> for SharedReadOnly permission at alloc9742021[0x0], but that tag does not exist in the borrow stack for this location
0.000008 | this error occurs as part of retag at alloc9742021[0x0..0x30]
0.000009 |
0.000009 = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
0.000009 = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
0.000010 help: <42368726> was created by a SharedReadOnly retag at offsets [0x0..0x8]
0.000009 --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:348:32
0.000009 |
0.000009 348 | let mut next = ptr::from_ref(&node)
0.000009 | ^^^^^^^^^^^^^^^^^^^^
0.000011 help: <42368726> was later invalidated at offsets [0x8..0x10] by a write access
0.000010 --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:347:17
0.000008 |
0.000009 347 | node.prev = AtomicLink::new(None);
0.000009 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
0.000009 = note: BACKTRACE (of the first span):
0.000009 = note: inside `core::ptr::NonNull::<sys::locks::rwlock::queue::Node>::as_ref::<'_>` at /home/runner/work/miri-test-libstd/miri-test-libstd/rust-src-patched/library/core/src/ptr/non_null.rs:401:18: 401:46
0.000009 note: inside `sys::locks::rwlock::queue::add_backlinks_and_find_tail`
0.000009 --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:265:17
0.000009 |
0.000009 265 | next.as_ref().prev.set(Some(current));
0.000009 | ^^^^^^^^^^^^^
0.000008 note: inside `sys::locks::rwlock::queue::RwLock::unlock_queue`
0.000009 --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:487:33
0.000009 |
0.000008 487 | let tail = unsafe { add_backlinks_and_find_tail(to_node(state)) };
0.000009 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
0.000009 note: inside `sys::locks::rwlock::queue::RwLock::lock_contended`
0.000009 --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:383:25
0.000009 |
0.000008 383 | self.unlock_queue(next);
0.000009 | ^^^^^^^^^^^^^^^^^^^^^^^
0.000009 note: inside `sys::locks::rwlock::queue::RwLock::write`
0.000010 --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:312:13
0.000009 |
0.000009 312 | self.lock_contended(true)
0.000009 | ^^^^^^^^^^^^^^^^^^^^^^^^^
0.000009 note: inside `sync::rwlock::RwLock::<()>::write`
0.000009 --> std_miri_test/../library/std/src/sync/rwlock.rs:361:13
0.000009 |
0.000009 361 | self.inner.write();
0.000009 | ^^^^^^^^^^^^^^^^^^
0.000010 note: inside closure
0.000011 --> std_miri_test/../library/std/src/sync/rwlock/tests.rs:37:26
0.000009 |
0.000010 37 | drop(r.write().unwrap());
0.000008 | ^^^^^^^^^
The diff to the last successful run is this, but since this is a concurrency test, there's a high chance that the bug already existed before and only surfaces randomly. So #110211 is the most likely culprit.
So far I haven't managed to reproduce this outside the context of the standard library test suite.
Activity
RalfJung commentedon Feb 26, 2024
Looking at the code, what seems to be happening is that one iteration through the loop we create a shared reference here
rust/library/std/src/sys/locks/rwlock/queue.rs
Lines 348 to 350 in b58f647
I am surprised that the diagnostic says "created by a SharedReadOnly retag at offsets [0x0..0x8]". This type does have interior mutability, so parts of the retag should be SharedReadWrite. This is the type in question:
rust/library/std/src/sys/locks/rwlock/queue.rs
Lines 179 to 187 in b58f647
Everything except the
bool
here is anUnsafeCell
. So how can there be 8 bytes of SharedReadOnly (i.e., outsideUnsafeCell
)?miri-test-libstd uses
-Zrandomize-layout
. Maybe that has something to do with it. Maybe that's why we couldn't reproduce it yet.RalfJung commentedon Feb 26, 2024
So with random layout, I guess the
bool
field comes first and then an 8-aligned field follows, meaning we have 8 bytes of "not interior mutable" memory at the start. So far, so good.Then apparently later it got invalidated at offsets 8 to 16 here
rust/library/std/src/sys/locks/rwlock/queue.rs
Line 347 in b58f647
So that pointer must be the second field.
Then later we are using the original shared reference again and 💥 .
I think this is morally equivalent to the following:
This is rejected by Miri with Stacked Borrows but accepted with Tree Borrows.
The "blessed" way to do this is to never mix direct accesses to a local with accesses through derived pointers. Fixing that in queue.rs shouldn't be too hard.
Auto merge of #3327 - RalfJung:tree-interior_mut_reborrow, r=RalfJung
RalfJung commentedon Feb 26, 2024
From what @joboet said, the actual bug is that there's anyone even still using the old reference when we are doing the write that does the invalidation.
According to the Miri log, we have these events in this order:
A reference gets created and turned into a raw pointer
rust/library/std/src/sys/locks/rwlock/queue.rs
Lines 348 to 350 in b58f647
The memory that reference points to is written to via direct access to the local variable
rust/library/std/src/sys/locks/rwlock/queue.rs
Line 347 in b58f647
Someone uses the pointer derived from the reference
rust/library/std/src/sys/locks/rwlock/queue.rs
Line 265 in b58f647
We have a full backtrace only for the last event.
Backtrace
apiraino commentedon Feb 26, 2024
AFAIK T-libs doesn't use the I-prioritize label, so probably fine to remove it.
@rustbot label -I-prioritize
RalfJung commentedon Feb 26, 2024
I did managed to reproduce this on my machine once with that flag (with seed 2852 on Miri bbabee144ea12b210a7be9488f0eeed9efe3e24d), but when I run the testcase again with that seed it doesn't happen again.
This is the test:
Full command I used:
So... it uses actual randomness, and thus of course it does not reproduce even with the same seed. :/ (Disabling isolation means we give the program access to host randomness, which will be used to initialize the
thread_rng
.)15 remaining items