What are the preconditions for (overlapping) `ptr::copy`? #549

steffahn · 2024-12-17T22:32:30Z

From the docs:

Safety

Behavior is undefined if any of the following conditions are violated:

src must be valid for reads of count * size_of::<T>()> bytes, and must remain valid even when dst is written for count * size_of::<T>() bytes. (This means if the memory ranges overlap, the two pointers must not be subject to aliasing restrictions relative to each other.)

dst must be valid for writes of count * size_of::<T>()> bytes, and must remain valid even when src is read for count * size_of::<T>() bytes.

Both src and dst must be properly aligned.

I read this first requirement of saying that after writes to dst, reading from src must still be possible, even in the positions that have been written to… it says “when dst is written for count * size_of::<T>() bytes”, which should mean “when the whole range of dst has [possibly] been written to”, right?

Of course, the docs also say:

Copying takes place as if the bytes were copied from src to a temporary array and then copied from the array to dst.

Which when taken literally could be interpreted to say - in particular - that each place is only ever written to via dst after it has been read via src. But I - as a reader - would actually understand this particular sentence to be only about the question of "which bytes are in what places at the end of this" (because I guess otherwise it's also reasonable to assume that a copy operation might only 'technically' allow overlap to reduce UB, but still mix up the order of the data/elements in that case).

As far as what I can (currently) observe in miri, the documented precondition is not checked.

fn f1() {
    let mut_data: *mut [u8] = &mut [0_u8; 100];
    let borrow: &[u8] = unsafe { &(*mut_data)[..] };
    let const_data: *const [u8] = borrow;

    let (mut_data_ptr, const_data_ptr) = (mut_data as *mut u8, const_data as *const u8);

    // now, writes to `mut_data_ptr` CAN make `const_data_ptr` invalid for reads:

    unsafe {
        mut_data_ptr.add(42).write(1);
        const_data_ptr.add(32).read(); // different position is okay,
        const_data_ptr.add(42).read(); // same position: miri says UB
    }
}

fn f2() {
    let mut_data: *mut [u8] = &mut [0_u8; 100];
    let borrow: &[u8] = unsafe { &(*mut_data)[..] };
    let const_data: *const [u8] = borrow;

    let (mut_data_ptr, const_data_ptr) = (mut_data as *mut u8, const_data as *const u8);

    // but if we call `ptr::copy`, there's no issues?

    unsafe {
        std::ptr::copy(const_data_ptr.add(10), mut_data_ptr, 30);
        std::ptr::copy(const_data_ptr.add(40), mut_data_ptr.add(50), 30);
    }

    // Note that `ptr::copy` documents the following:
    // > `src` must be valid for reads of `count * size_of::<T>()` bytes,
    // > and must remain valid even when `dst` is written
    // > for `count * size_of::<T>()` bytes
}

fn main() {
    f2();
    // f1();
}

(playground)

So, either the precondition is wrong, or miri is wrong, or I'm misinterpreting something, or the behavior isn't fully defined yet and std docs make conservative restrictions for future compat, whilst miri conservatively allows stuff that isn't "actual" UB today.

Perhaps someone here knows more context?

AFAICT, given that the copy itself actually accesses all the memory/bytes in question anyways, this can't be a case where miri is simply unable to enforce the documented precondition efficiently. [Unlike many cases of documented library-UB for instance.]

The text was updated successfully, but these errors were encountered:

CAD97 · 2024-12-18T06:28:56Z

It's perhaps somewhat relevant to note that TB's "3 phase" mutable borrows would make the question moot, as reading would still be permitted.

IIRC, miri implements ptr::copy as literally the fallback spec of copying to a temporary. Adding a "fake" read of src would implement the more conservative spec.

RalfJung · 2024-12-18T07:03:39Z

Miri currently indeed first triggers a full read of the src range, then a full write of the dst range. We could probably trigger another read of the src range afterwards and hopefully that is enough to cover all permutations?

It's perhaps somewhat relevant to note that TB's "3 phase" mutable borrows would make the question moot, as reading would still be permitted.

The writes to dst can invalidate reading from src, though.

RalfJung · 2024-12-18T10:07:19Z

Read-write-read is actually not enough, since dst may still be "reserved" during the first read, then it becomes "unique" during the write, and then "frozen" during the read... so we may need read-write-read-write to ensure we detect all these cases. "write-read-write" might also do it, but sounds like an odd sequence of operations.

@JoJoDeveloping is there some theorem for how often we have to alternative to be sure TB has reached a steady state?

JoJoDeveloping · 2024-12-18T11:59:30Z

a safe upper bound is as many as there are states in the TB machine. But here probably it's write, followed by foreign actions, followed by write. This is because a Write makes you either Unique or it's UB.

On a more general question, why is this necessary? The specification here seems overly restrictive, there is no sensible implementation that reads a memory cell via src after writing to it via dst, because why would you do that? Similarly why would you write to dst before reading from src, the point of ptr::copy is precisely not to do that.

RalfJung · 2024-12-18T12:21:18Z

If you copy naively in a loop, and the ranges overlap partially, you might first write via dst and then read via src, no? For instance consider src to be at offset 0, dst at offset 5, and the size is 10. The cell at offset 5 is first written to via dst (when we copy element 0), and then later read via src (when we copy element 5).

Lokathor · 2024-12-18T12:26:24Z

yeah if they overlap you have to copy the spans in reverse, depending on the relative positions of src and dst. that's memmove basics.

JoJoDeveloping · 2024-12-18T12:30:04Z

The cell at offset 5 is first written to via dst (when we copy element 0), and then later read via src (when we copy element 5).

No, because then you would read the new value, which is precisely what memmove (ptr::copy) is not supposed to do. It's not memcpy (ptr::copy_nonoverlapping) after all, where overlaps are UB because such things would occur. As mentioned, there's a check for this overlap in which case things happen in reverse.

RalfJung · 2024-12-18T12:30:39Z

Ah right I guess we are specifying that we don't see the odd copying artifacts, fair.

Copying takes place as if the bytes were copied from src to a temporary array and then copied from the array to dst.

I mixed this up with ptr::swap which can have odd artifacts like that.

JoJoDeveloping · 2024-12-18T12:35:41Z

Even for ptr::swap, the specification tells you pretty precisely which copying artifacts you can have (it's still wrong for the implementation to naively read from one pointer after having written to the other). And it also much more strict about the aliasing preconditions, even though miri also does not go searching for a fixed point there.

RalfJung · 2024-12-21T07:35:47Z

rust-lang/rust#134606 fixes the copy docs.

RalfJung linked a pull request Dec 21, 2024 that will close this issue

ptr::copy: fix docs for the overlapping case rust-lang/rust#134606

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

What are the preconditions for (overlapping) `ptr::copy`? #549

What are the preconditions for (overlapping) `ptr::copy`? #549

steffahn commented Dec 17, 2024 •

edited

Loading

Safety

CAD97 commented Dec 18, 2024

RalfJung commented Dec 18, 2024

RalfJung commented Dec 18, 2024

JoJoDeveloping commented Dec 18, 2024 •

edited

Loading

RalfJung commented Dec 18, 2024

Lokathor commented Dec 18, 2024

JoJoDeveloping commented Dec 18, 2024 •

edited

Loading

RalfJung commented Dec 18, 2024

JoJoDeveloping commented Dec 18, 2024

RalfJung commented Dec 21, 2024

What are the preconditions for (overlapping) ptr::copy? #549

What are the preconditions for (overlapping) ptr::copy? #549

Comments

steffahn commented Dec 17, 2024 • edited Loading

Safety

CAD97 commented Dec 18, 2024

RalfJung commented Dec 18, 2024

RalfJung commented Dec 18, 2024

JoJoDeveloping commented Dec 18, 2024 • edited Loading

RalfJung commented Dec 18, 2024

Lokathor commented Dec 18, 2024

JoJoDeveloping commented Dec 18, 2024 • edited Loading

RalfJung commented Dec 18, 2024

JoJoDeveloping commented Dec 18, 2024

RalfJung commented Dec 21, 2024

What are the preconditions for (overlapping) `ptr::copy`? #549

What are the preconditions for (overlapping) `ptr::copy`? #549

steffahn commented Dec 17, 2024 •

edited

Loading

JoJoDeveloping commented Dec 18, 2024 •

edited

Loading

JoJoDeveloping commented Dec 18, 2024 •

edited

Loading