Skip to content

Add copy_file_range syscall for Linux #2592

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Add copy_file_range syscall for Linux #2592

wants to merge 1 commit into from

Conversation

ghost
Copy link

@ghost ghost commented May 30, 2019

Here is the code I used to test this:

const std = @import("std");
const os = std.os;
const File = std.fs.File;
const assert = std.debug.assert;

test "copy_file_range using offsets" {
    const src_file = try createInputFile();
    defer src_file.close();
    const dst_file = try File.openWrite("/tmp/out_file_1");
    defer dst_file.close();

    var off_in: isize = 200;
    var off_out: isize = 15;
    const len: usize = 100;
    const flags: u32 = 0;
    const result = os.linux.copy_file_range(src_file.handle, &off_in, dst_file.handle, &off_out, len, flags);
    const errno = os.linux.getErrno(result);

    assert(errno == 0);
    assert(result == 256 - 200);
    assert(off_in == 256);
    assert(off_out == 15 + (256 - 200));
}

test "copy_file_range using seek positions" {
    const src_file = try createInputFile();
    defer src_file.close();

    const dst_file = try File.openWrite("/tmp/out_file_2");
    defer dst_file.close();

    try src_file.seekTo(100);
    try dst_file.seekTo(5);

    const len: usize = 25;
    const flags: u32 = 0;
    const result = os.linux.copy_file_range(src_file.handle, null, dst_file.handle, null, len, flags);
    const errno = os.linux.getErrno(result);

    assert(errno == 0);
    assert(result == 25);
    assert((try src_file.getPos()) == 100 + 25);
    assert((try dst_file.getPos()) == 5 + 25);
}

/// Create a test file containing bytes 0-255 and open for reading.
fn createInputFile() !File {
    {
        const src_file = try File.openWrite("/tmp/in_file");
        defer src_file.close();

        var buf: [256]u8 = []u8{0} ** 256;
        for (buf) |*b, i| b.* = @intCast(u8, i);
        try src_file.write(buf);
    }

    return File.openRead("/tmp/in_file");
}

Closes #2563.

std/os/linux.zig Outdated
@@ -259,6 +259,10 @@ pub fn pwrite(fd: i32, buf: [*]const u8, count: usize, offset: usize) usize {
return syscall4(SYS_pwrite, @bitCast(usize, isize(fd)), @ptrToInt(buf), count, offset);
}

pub fn copy_file_range(fd_in: i32, off_in: *allowzero isize, fd_out: i32, off_out: *allowzero isize, len: usize, flags: u32) usize {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why allowzero? are you looking for ?*isize to make it optional?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @daurnimator, that is exactly what I was looking for.

std/os/linux.zig Outdated
@@ -259,6 +259,10 @@ pub fn pwrite(fd: i32, buf: [*]const u8, count: usize, offset: usize) usize {
return syscall4(SYS_pwrite, @bitCast(usize, isize(fd)), @ptrToInt(buf), count, offset);
}

pub fn copy_file_range(fd_in: i32, off_in: ?*isize, fd_out: i32, off_out: ?*isize, len: usize, flags: u32) usize {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of isize for the offsets is wrong, the kernel expects a loff_t (a unsigned long).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that unsigned long in linux-land is equivalent to usize

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that unsigned long in linux-land is equivalent to usize

Have you ever heard of the so-called 32bit systems?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean like i386? I'm pretty sure sizeof(long) is 4 in that case, right? or is this logic incorrect:

return target_arch_pointer_bit_width(target->arch);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh well, I meant off_in and off_out and not isize heh. The kernel definition indeed suggests that the size is a size_t but the offsets are loff_t (defined here) and are unsigned long long. Again, I've misquoted the source code.

Sorry about that, I shouldn't write anything before my morning coffee.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a off_t, so it is always 64-bit (except with legacy syscalls).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the kernel definition is unsigned long long then off_in and off_out should be *u64.

@LemonBoy
Copy link
Contributor

A test is needed (in os/linux/test.zig), make sure you also fail gracefully (by returning error.SkipZigTest) if ENOSYS is returned since this syscall is quite new.

It should also be used in copyFile (and copyFileMode, ugh, the more the merrier isn't it?), that's why adding this syscall was suggested in the first place.

Copy link
Contributor

@shawnl shawnl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have a 32-bit tier-one/tier-two architecture (we need one, for precisely this reason) so you can't really test this, but this patch is broken on 32-bit Linux, because the 64-bit arguments mean that this puts arguments in the wrong registers on 32-bit arches.

Also, it would be nice if you updated os/file.zig's copyFile to use this syscall (on Linux).

An ideal test for this would probably use memfd_create(), but alas, we don't have that syscall either..

std/os/linux.zig Outdated
@@ -259,6 +259,10 @@ pub fn pwrite(fd: i32, buf: [*]const u8, count: usize, offset: usize) usize {
return syscall4(SYS_pwrite, @bitCast(usize, isize(fd)), @ptrToInt(buf), count, offset);
}

pub fn copy_file_range(fd_in: i32, off_in: ?*isize, fd_out: i32, off_out: ?*isize, len: usize, flags: u32) usize {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a off_t, so it is always 64-bit (except with legacy syscalls).

@LemonBoy
Copy link
Contributor

but this patch is broken on 32-bit Linux, because the 64-bit arguments mean that this puts arguments in the wrong registers on 32-bit arches.

off_in and off_out are pointers.

An ideal test for this would probably use memfd_create(), but alas, we don't have that syscall either..

memfd_create appeared quite late, in 3.17. Using plain old files is fine.

@ghost
Copy link
Author

ghost commented Jun 2, 2019

Thank you everyone for your feedback and help on this. I'm going to try to understand the signed/unsigned and 32-bit/64-bit situation better and prepare a more complete patch with tests and standard library integration.

@shawnl
Copy link
Contributor

shawnl commented Jun 2, 2019

So I found out that sometimes copy_file_range fails because the filesystem does not support it, so you always need a backup code path. Maybe the syscall itsself should have compat code...that is how glibc does it.

@ghost
Copy link
Author

ghost commented Jun 8, 2019

I’m closing this pull request because this system call had more implications than I had expected and I can’t commit to working through them all at the moment.

I have updated the function signature to use *u64 for off_in and off_out.

Here is a summary of what is needed for the next person who wants to attempt working on this:

  • Standard library integration: Update the copyFile and copyFileMode functions in std/fs.zig to use this system call, but only on Linux and only if the system call is available.

  • Testing: Figure out which aspect of this system call could meaningfully be tested in a unit test. Is it even possible to test that the types of off_in and off_out etc. are correct, or would that require impractically large files to test the edge cases of a u64 offset? Make sure to handle temporary files correctly or else implement the memfd_create system call too.

  • Handling ENOSYS: If the system call is not available, where should this error be handled? In the system call itself or at the places that use this system call? Check whether Zig is moving away having logic contained in system call wrappers to decide.

  • Fallback code: Consider adding fallback code to implement this system call in userland if the version of Linux is too old to support it. Here is glibc’s fallback implementation for reference.

@ghost ghost closed this Jun 8, 2019
@shawnl
Copy link
Contributor

shawnl commented Jun 8, 2019

Thank you for your clear write-up.

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

copy_file_range syscall support on Linux
5 participants