Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What are the preconditions for (overlapping) ptr::copy? #549

Open
steffahn opened this issue Dec 17, 2024 · 10 comments · May be fixed by rust-lang/rust#134606
Open

What are the preconditions for (overlapping) ptr::copy? #549

steffahn opened this issue Dec 17, 2024 · 10 comments · May be fixed by rust-lang/rust#134606

Comments

@steffahn
Copy link
Member

steffahn commented Dec 17, 2024

From the docs:

Safety

Behavior is undefined if any of the following conditions are violated:

  • src must be valid for reads of count * size_of::<T>()> bytes, and must remain valid even when dst is written for count * size_of::<T>() bytes. (This means if the memory ranges overlap, the two pointers must not be subject to aliasing restrictions relative to each other.)
  • dst must be valid for writes of count * size_of::<T>()> bytes, and must remain valid even when src is read for count * size_of::<T>() bytes.
  • Both src and dst must be properly aligned.

I read this first requirement of saying that after writes to dst, reading from src must still be possible, even in the positions that have been written to… it says “when dst is written for count * size_of::<T>() bytes”, which should mean “when the whole range of dst has [possibly] been written to”, right?

Of course, the docs also say:

Copying takes place as if the bytes were copied from src to a temporary array and then copied from the array to dst.

Which when taken literally could be interpreted to say - in particular - that each place is only ever written to via dst after it has been read via src. But I - as a reader - would actually understand this particular sentence to be only about the question of "which bytes are in what places at the end of this" (because I guess otherwise it's also reasonable to assume that a copy operation might only 'technically' allow overlap to reduce UB, but still mix up the order of the data/elements in that case).

As far as what I can (currently) observe in miri, the documented precondition is not checked.

fn f1() {
    let mut_data: *mut [u8] = &mut [0_u8; 100];
    let borrow: &[u8] = unsafe { &(*mut_data)[..] };
    let const_data: *const [u8] = borrow;

    let (mut_data_ptr, const_data_ptr) = (mut_data as *mut u8, const_data as *const u8);

    // now, writes to `mut_data_ptr` CAN make `const_data_ptr` invalid for reads:

    unsafe {
        mut_data_ptr.add(42).write(1);
        const_data_ptr.add(32).read(); // different position is okay,
        const_data_ptr.add(42).read(); // same position: miri says UB
    }
}

fn f2() {
    let mut_data: *mut [u8] = &mut [0_u8; 100];
    let borrow: &[u8] = unsafe { &(*mut_data)[..] };
    let const_data: *const [u8] = borrow;

    let (mut_data_ptr, const_data_ptr) = (mut_data as *mut u8, const_data as *const u8);

    // but if we call `ptr::copy`, there's no issues?

    unsafe {
        std::ptr::copy(const_data_ptr.add(10), mut_data_ptr, 30);
        std::ptr::copy(const_data_ptr.add(40), mut_data_ptr.add(50), 30);
    }

    // Note that `ptr::copy` documents the following:
    // > `src` must be valid for reads of `count * size_of::<T>()` bytes,
    // > and must remain valid even when `dst` is written
    // > for `count * size_of::<T>()` bytes
}

fn main() {
    f2();
    // f1();
}

(playground)

So, either the precondition is wrong, or miri is wrong, or I'm misinterpreting something, or the behavior isn't fully defined yet and std docs make conservative restrictions for future compat, whilst miri conservatively allows stuff that isn't "actual" UB today.

Perhaps someone here knows more context?

AFAICT, given that the copy itself actually accesses all the memory/bytes in question anyways, this can't be a case where miri is simply unable to enforce the documented precondition efficiently. [Unlike many cases of documented library-UB for instance.]

@CAD97
Copy link

CAD97 commented Dec 18, 2024

It's perhaps somewhat relevant to note that TB's "3 phase" mutable borrows would make the question moot, as reading would still be permitted.

IIRC, miri implements ptr::copy as literally the fallback spec of copying to a temporary. Adding a "fake" read of src would implement the more conservative spec.

@RalfJung
Copy link
Member

Miri currently indeed first triggers a full read of the src range, then a full write of the dst range. We could probably trigger another read of the src range afterwards and hopefully that is enough to cover all permutations?

It's perhaps somewhat relevant to note that TB's "3 phase" mutable borrows would make the question moot, as reading would still be permitted.

The writes to dst can invalidate reading from src, though.

@RalfJung
Copy link
Member

Read-write-read is actually not enough, since dst may still be "reserved" during the first read, then it becomes "unique" during the write, and then "frozen" during the read... so we may need read-write-read-write to ensure we detect all these cases. "write-read-write" might also do it, but sounds like an odd sequence of operations.

@JoJoDeveloping is there some theorem for how often we have to alternative to be sure TB has reached a steady state?

@JoJoDeveloping
Copy link

JoJoDeveloping commented Dec 18, 2024

a safe upper bound is as many as there are states in the TB machine. But here probably it's write, followed by foreign actions, followed by write. This is because a Write makes you either Unique or it's UB.

On a more general question, why is this necessary? The specification here seems overly restrictive, there is no sensible implementation that reads a memory cell via src after writing to it via dst, because why would you do that? Similarly why would you write to dst before reading from src, the point of ptr::copy is precisely not to do that.

@RalfJung
Copy link
Member

If you copy naively in a loop, and the ranges overlap partially, you might first write via dst and then read via src, no? For instance consider src to be at offset 0, dst at offset 5, and the size is 10. The cell at offset 5 is first written to via dst (when we copy element 0), and then later read via src (when we copy element 5).

@Lokathor
Copy link
Contributor

yeah if they overlap you have to copy the spans in reverse, depending on the relative positions of src and dst. that's memmove basics.

@JoJoDeveloping
Copy link

JoJoDeveloping commented Dec 18, 2024

The cell at offset 5 is first written to via dst (when we copy element 0), and then later read via src (when we copy element 5).

No, because then you would read the new value, which is precisely what memmove (ptr::copy) is not supposed to do. It's not memcpy (ptr::copy_nonoverlapping) after all, where overlaps are UB because such things would occur. As mentioned, there's a check for this overlap in which case things happen in reverse.

@RalfJung
Copy link
Member

Ah right I guess we are specifying that we don't see the odd copying artifacts, fair.

Copying takes place as if the bytes were copied from src to a temporary array and then copied from the array to dst.

I mixed this up with ptr::swap which can have odd artifacts like that.

@JoJoDeveloping
Copy link

Even for ptr::swap, the specification tells you pretty precisely which copying artifacts you can have (it's still wrong for the implementation to naively read from one pointer after having written to the other). And it also much more strict about the aliasing preconditions, even though miri also does not go searching for a fixed point there.

@RalfJung
Copy link
Member

rust-lang/rust#134606 fixes the copy docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants