child_recvmsg: check if the control message was not truncated before trying to extract the child fd

In various places in the codebase, file descriptors are sent to the
tracee. Those file descriptors are often used in remote syscalls
like mmap. The file descriptors are received using the system
call recvmsg. The fd is sent/received using SCM_RIGHTS functionality
in the control message part of the message payload.

The kernel may NOT deliver the file descriptor to the receiving process
if that process has already reached its per-process limit on the number
of open files.

What is interesting is that the recvmsg system call still succeeds:
the single byte sent as the "data plane" of the message is received
successfully. For all practical purposes this is a failure, because
the fd in the control message is the real payload, and it will *not*
be present if the max open files limit has been reached.

To detect this failure, check the msg_flags field of the message.
This commit does so by ASSERT()ing in child_recvmsg() that msg_flags does
not have MSG_CTRUNC set (control message has been truncated). Without this
ASSERT(), the fd obtained is just whatever junk is held in memory. That
could result in EBADF later on in a remote syscall or, if that fd happens
to exist, in similarly strange issues such as the wrong file being
remotely mmapped.
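
As a minimal userspace sketch of the same check (not the rr code itself,
which performs the recvmsg remotely in the tracee), the following shows a
sender attaching one fd via SCM_RIGHTS and a receiver that refuses to read
the fd when MSG_CTRUNC is set. The helper names send_fd() and
recv_fd_checked() are hypothetical and exist only for this illustration.

```cpp
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>
#include <cassert>
#include <cstring>

// Hypothetical helper: send one data byte plus `fd` as SCM_RIGHTS payload.
bool send_fd(int sock, int fd) {
  char byte = 'x';
  iovec iov{&byte, 1};
  alignas(cmsghdr) char control[CMSG_SPACE(sizeof(int))] = {};
  msghdr msg{};
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = control;
  msg.msg_controllen = sizeof(control);
  cmsghdr* cmsg = CMSG_FIRSTHDR(&msg);
  assert(cmsg != nullptr);  // the buffer is sized for exactly one header
  cmsg->cmsg_level = SOL_SOCKET;
  cmsg->cmsg_type = SCM_RIGHTS;
  cmsg->cmsg_len = CMSG_LEN(sizeof(int));
  std::memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
  return sendmsg(sock, &msg, 0) == 1;
}

// Hypothetical helper: returns the received fd, or -1 if the data byte or
// the control message did not arrive intact.
int recv_fd_checked(int sock) {
  char byte = 0;
  iovec iov{&byte, 1};
  alignas(cmsghdr) char control[CMSG_SPACE(sizeof(int))] = {};
  msghdr msg{};
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = control;
  msg.msg_controllen = sizeof(control);
  if (recvmsg(sock, &msg, 0) != 1) {
    return -1;
  }
  // recvmsg "succeeded", but the fd may still be missing: check the flag
  // before touching the control buffer.
  if (msg.msg_flags & MSG_CTRUNC) {
    return -1;
  }
  cmsghdr* cmsg = CMSG_FIRSTHDR(&msg);
  if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
      cmsg->cmsg_type != SCM_RIGHTS ||
      cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
    return -1;
  }
  int fd = -1;
  std::memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
  return fd;
}
```

A caller would create a connected pair with socketpair(AF_UNIX,
SOCK_STREAM, 0, sv), call send_fd(sv[0], some_fd) and then
recv_fd_checked(sv[1]). Returning -1 here stands in for the ASSERT() that
the commit adds.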

To see how/why MSG_CTRUNC can happen in the Linux kernel:
(1) See alloc_fd() in the Linux kernel
    https://github.com/torvalds/linux/blob/59b723cd2adbac2a34fc8e12c74ae26ae45bf230/fs/file.c#L506-L508
    Here -EMFILE will be returned if the number of open files limit is exceeded
(2) Now if (1) happened:
    ```
    invoke_syscall()
      __arm64_sys_recvmsg()
        __sys_recvmsg()
          ___sys_recvmsg()
            ____sys_recvmsg()
              sock_recvmsg()
                unix_stream_recvmsg()
                  unix_stream_read_generic()
                    scm_detach_fds() <--- -EMFILE -----------\
                      scm_recv_one_fd()                      |
                        receive_fd()                         |
                          get_unused_fd_flags()              |
                            __get_unused_fd_flags()          |
                              alloc_fd() --------------------/
    ```
    In scm_detach_fds() https://github.com/torvalds/linux/blob/59b723cd2adbac2a34fc8e12c74ae26ae45bf230/net/core/scm.c#L338-L359
    due to -EMFILE for the single fd that was to be received, the fd will not be added to the control message.

    In scm_detach_fds() https://github.com/torvalds/linux/blob/59b723cd2adbac2a34fc8e12c74ae26ae45bf230/net/core/scm.c#L361-L362
    due to the earlier -EMFILE, MSG_CTRUNC will be added to msg_flags.
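
The kernel path above can be exercised from userspace. The sketch below is
an illustration under stated assumptions, not part of the commit: it
assumes a Linux kernel that behaves as described above, it temporarily sets
the soft RLIMIT_NOFILE to 0 so that alloc_fd() cannot hand out any
descriptor, and the names TruncationDemo and recv_with_no_free_fd_slots()
are hypothetical.

```cpp
#include <sys/resource.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>
#include <cassert>
#include <cstring>

// Hypothetical result type: recvmsg return value and resulting msg_flags.
struct TruncationDemo {
  ssize_t bytes;
  int flags;
};

TruncationDemo recv_with_no_free_fd_slots() {
  TruncationDemo result{-1, 0};
  int sv[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
    return result;
  }

  // Send one data byte plus an fd (sv[0] itself) as SCM_RIGHTS payload.
  char byte = 'x';
  iovec iov{&byte, 1};
  alignas(cmsghdr) char control[CMSG_SPACE(sizeof(int))] = {};
  msghdr out{};
  out.msg_iov = &iov;
  out.msg_iovlen = 1;
  out.msg_control = control;
  out.msg_controllen = sizeof(control);
  cmsghdr* cmsg = CMSG_FIRSTHDR(&out);
  assert(cmsg != nullptr);
  cmsg->cmsg_level = SOL_SOCKET;
  cmsg->cmsg_type = SCM_RIGHTS;
  cmsg->cmsg_len = CMSG_LEN(sizeof(int));
  std::memcpy(CMSG_DATA(cmsg), &sv[0], sizeof(int));
  bool sent = sendmsg(sv[0], &out, 0) == 1;

  struct rlimit saved {};
  if (sent && getrlimit(RLIMIT_NOFILE, &saved) == 0) {
    // Lower only the soft limit; already-open fds stay usable, but no new
    // descriptor number can be allocated for the incoming fd.
    struct rlimit none = saved;
    none.rlim_cur = 0;
    if (setrlimit(RLIMIT_NOFILE, &none) == 0) {
      char in_byte = 0;
      iovec in_iov{&in_byte, 1};
      alignas(cmsghdr) char in_control[CMSG_SPACE(sizeof(int))] = {};
      msghdr in{};
      in.msg_iov = &in_iov;
      in.msg_iovlen = 1;
      in.msg_control = in_control;
      in.msg_controllen = sizeof(in_control);
      result.bytes = recvmsg(sv[1], &in, 0);
      result.flags = in.msg_flags;
      setrlimit(RLIMIT_NOFILE, &saved);  // restore the original limit
    }
  }
  close(sv[0]);
  close(sv[1]);
  return result;
}
```

If the kernel follows the call chain shown above, the data byte should
still be received while MSG_CTRUNC is set in the returned flags. Changing
RLIMIT_NOFILE affects the whole process, so this is only suitable for a
throwaway test program.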
sidkshatriya authored and rocallahan committed Nov 8, 2024
1 parent 8b78784 commit 26c68fe
Showing 1 changed file with 25 additions and 0 deletions.
25 changes: 25 additions & 0 deletions src/AutoRemoteSyscalls.cc
@@ -619,6 +619,31 @@ static long child_recvmsg(AutoRemoteSyscalls& remote, int child_sock) {
    LOG(debug) << "Failed to recvmsg " << ret;
    return ret;
  }

  typename Arch::msghdr msghdr =
      remote.task()->read_mem(msg.remote_msg(), &ok);
  if (!ok) {
    ASSERT(remote.task(), errno == ESRCH || errno == EIO);
    LOG(debug) << "Failed to read msghdr";
    return -ESRCH;
  }
  ASSERT(remote.task(), !(msghdr.msg_flags & MSG_CTRUNC))
      << "Control message was truncated; error in receiving fd in "
         "AutoRemoteSyscalls::child_recvmsg(). msghdr.msg_flags: "
      << HEX(msghdr.msg_flags) << "\n"
      << "This error has been most likely caused by a process\n"
      << "exceeding the max allowed open files limit set by\n"
      << "Linux. Please consult `man 1 ulimit' and `man 1 prlimit' to\n"
      << "learn how the max open files limit may be changed/checked.\n"
      << "As usual, always carefully think through all implications of\n"
      << "changing the process limits on your programs before making any\n"
      << "changes.\n\n"
      << "If the above Assertion still fails, then (a) The limit you set was\n"
      << "not high enough, or (b) the program could be opening files in an\n"
      << "unbounded fashion, or (c) there is some other reason why socket\n"
      << "control messages are being truncated and file descriptors cannot be\n"
      << "received via SCM_RIGHTS.";

  int their_fd = remote.task()->read_mem(msg.remote_cmsgdata(), &ok);
  if (!ok) {
    ASSERT(remote.task(), errno == ESRCH || errno == EIO);