Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

child_recvmsg: check if the control message was not truncated before trying to extract the child fd #3870

Merged
merged 1 commit into from
Nov 8, 2024

Conversation

sidkshatriya
Copy link
Contributor

@sidkshatriya sidkshatriya commented Nov 5, 2024

In various places in the codebase, file descriptors are sent to the tracee. Those file descriptors are often used in remote syscalls like mmap. The file descriptors are received using the system call recvmsg. The fd is sent/received using SCM_RIGHTS functionality in the control message part of the message payload.

It is possible for the kernel to NOT send the file descriptor to the receiving process if the per process max number of open files limit has been currently reached.

What is interesting is that the recvmsg system call will be successful: the single byte sent as part of the "data plane" of the message is received successfully but for all purposes this is a failure because the fd in the control message is the real payload but it will not be present if the max open files limit has currently been reached.

In this situation, to check if there is a failure, check message msg_flags field. Do this by ASSERT()ing in AutoRemoteSyscalls::child_recvmsg() method if msg_flags has MSG_CTRUNC (control message has been truncated). Without this ASSERT() the fd obtained is just whatever junk that is held in the memory and it could result in EBADF later on on a remote syscall or if that fd by chance exists then it could result in similarly strange issues like a wrong file being remotely mmaped and so on...

In the Linux kernel to see how/why, MSG_CTRUNC could happen:

  1. See alloc_fd() in the Linux kernel

    https://github.com/torvalds/linux/blob/59b723cd2adbac2a34fc8e12c74ae26ae45bf230/fs/file.c#L506-L508
    Here -EMFILE will be returned if the number of open files limit is exceeded

  2. Now if (1) happened:

    invoke_syscall()
      __arm64_sys_recvmsg()
        __sys_recvmsg()
          ___sys_recvmsg()
            ____sys_recvmsg()
              sock_recvmsg()
                unix_stream_recvmsg()
                  unix_stream_read_generic()
                    scm_detach_fds() <--- -EMFILE -----------\
                      scm_recv_one_fd()                      |
                        receive_fd()                         |
                          get_unused_fd_flags()              |
                            __get_unused_fd_flags()          |
                              alloc_fd() --------------------/

In scm_detach_fds() https://github.com/torvalds/linux/blob/59b723cd2adbac2a34fc8e12c74ae26ae45bf230/net/core/scm.c#L338-L359
due to -EMFILE in the single fd (that is sought to be received), the fd will not be added to the control message.

In scm_detach_fds() https://github.com/torvalds/linux/blob/59b723cd2adbac2a34fc8e12c74ae26ae45bf230/net/core/scm.c#L361-L362
due to -EMFILE earlier, MSG_CTRUNC will be added msg_flags

@sidkshatriya
Copy link
Contributor Author

There is a single CI failure on x86-64 and no failures on arm64. It does not look related to this PR.

The following tests FAILED:
	1703 - exec_from_other_thread-32 (Failed)

@khuey
Copy link
Collaborator

khuey commented Nov 5, 2024

Did you find this by code inspection or did you actually hit it in the wild?

@sidkshatriya
Copy link
Contributor Author

sidkshatriya commented Nov 5, 2024

I hit this in the wild. I was able to debug this using ftrace_helper btw.

@sidkshatriya
Copy link
Contributor Author

sidkshatriya commented Nov 5, 2024

(my setup is somewhat custom but the bug and fix is valid for for normal rr. The situation is actually simple: what if the max fd limit is hit and at that point rr decides to send a fd across ? )

@KJTsanaktsidis
Copy link
Contributor

what if the max fd limit is hit and at that point rr decides to send a fd across ?

It was a while ago so I don’t have all the details, but I found that when running some Ruby test cases which deliberately exhaust the FD limit under rr, I had to disable syscallbuf, probably for exactly this reason.

…trying to extract the child fd

In various places in the codebase, file descriptors are sent to the
tracee. Those file descriptors are often used in remote syscalls
like mmap. The file descriptors are received using the system
call recvmsg. The fd is sent/received using SCM_RIGHTS functionality
in the control message part of the message payload.

It is possible for the kernel to NOT send the file descriptor to the
receiving process if the per process max number of open files limit has been
currently reached.

What is interesting is that the recvmsg system call will be successful:
the single byte sent as part of the "data plane" of the message is
received successfully but for all purposes this is a failure because
the fd in the control message is the real payload but it will *not* be present
if the max open files limit has currently been reached.

In this situation, to check if there is a failure, check message msg_flags field.
Do this by ASSERT()ing in child_recvmsg() method if msg_flags has MSG_CTRUNC
(control message has been truncated). Without this ASSERT() the fd
obtained is just whatever junk that is held in the memory and it could result
in EBADF later on on a remote syscall or if that fd by chance exists then it
could result in similarly strange issues like a wrong file being remotely mmaped
and so on...

In the Linux kernel to see how/why, MSG_CTRUNC could happen:
(1) See alloc_fd() in the Linux kernel
    https://github.com/torvalds/linux/blob/59b723cd2adbac2a34fc8e12c74ae26ae45bf230/fs/file.c#L506-L508
    Here -EMFILE will be returned if the number of open files limit is exceeded
(2) Now if (1) happened:
    ```
    invoke_syscall()
      __arm64_sys_recvmsg()
        __sys_recvmsg()
          ___sys_recvmsg()
            ____sys_recvmsg()
              sock_recvmsg()
                unix_stream_recvmsg()
                  unix_stream_read_generic()
                    scm_detach_fds() <--- -EMFILE -----------\
                      scm_recv_one_fd()                      |
                        receive_fd()                         |
                          get_unused_fd_flags()              |
                            __get_unused_fd_flags()          |
                              alloc_fd() --------------------/
    ```
    In scm_detach_fds() https://github.com/torvalds/linux/blob/59b723cd2adbac2a34fc8e12c74ae26ae45bf230/net/core/scm.c#L338-L359
    due to -EMFILE in the single fd (that is sought to be received), the fd will not be added to the control message.

    In scm_detach_fds() https://github.com/torvalds/linux/blob/59b723cd2adbac2a34fc8e12c74ae26ae45bf230/net/core/scm.c#L361-L362
    due to -EMFILE earlier, MSG_CTRUNC will be added msg_flags
@khuey
Copy link
Collaborator

khuey commented Nov 7, 2024

(my setup is somewhat custom but the bug and fix is valid for for normal rr. The situation is actually simple: what if the max fd limit is hit and at that point rr decides to send a fd across ? )

To be clear I'm not at all opposed to fixing this I just was curious how you found it.

@sidkshatriya
Copy link
Contributor Author

CI is all green after I force pushed a minor change (and this caused the CI to run again).

(The single failure on x86 last time must have been a transient thing)

@khuey
Copy link
Collaborator

khuey commented Nov 7, 2024

This seems fine to me but I think @rocallahan should look at it before merging it.

@rocallahan rocallahan merged commit 26c68fe into rr-debugger:master Nov 8, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants