Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

child_recvmsg: check if the control message was not truncated before trying to extract the child fd #3870

Merged
merged 1 commit into from
Nov 8, 2024

Commits on Nov 7, 2024

  1. child_recvmsg: check if the control message was not truncated before …

    …trying to extract the child fd
    
    In various places in the codebase, file descriptors are sent to the
    tracee. Those file descriptors are often used in remote syscalls
    like mmap. The file descriptors are received using the system
    call recvmsg. The fd is sent/received using SCM_RIGHTS functionality
    in the control message part of the message payload.
    
    It is possible for the kernel to NOT send the file descriptor to the
    receiving process if the per process max number of open files limit has been
    currently reached.
    
    What is interesting is that the recvmsg system call will be successful:
    the single byte sent as part of the "data plane" of the message is
    received successfully but for all purposes this is a failure because
    the fd in the control message is the real payload but it will *not* be present
    if the max open files limit has currently been reached.
    
    In this situation, to check if there is a failure, check message msg_flags field.
    Do this by ASSERT()ing in child_recvmsg() method if msg_flags has MSG_CTRUNC
    (control message has been truncated). Without this ASSERT() the fd
    obtained is just whatever junk that is held in the memory and it could result
    in EBADF later on on a remote syscall or if that fd by chance exists then it
    could result in similarly strange issues like a wrong file being remotely mmaped
    and so on...
    
    In the Linux kernel to see how/why, MSG_CTRUNC could happen:
    (1) See alloc_fd() in the Linux kernel
        https://github.com/torvalds/linux/blob/59b723cd2adbac2a34fc8e12c74ae26ae45bf230/fs/file.c#L506-L508
        Here -EMFILE will be returned if the number of open files limit is exceeded
    (2) Now if (1) happened:
        ```
        invoke_syscall()
          __arm64_sys_recvmsg()
            __sys_recvmsg()
              ___sys_recvmsg()
                ____sys_recvmsg()
                  sock_recvmsg()
                    unix_stream_recvmsg()
                      unix_stream_read_generic()
                        scm_detach_fds() <--- -EMFILE -----------\
                          scm_recv_one_fd()                      |
                            receive_fd()                         |
                              get_unused_fd_flags()              |
                                __get_unused_fd_flags()          |
                                  alloc_fd() --------------------/
        ```
        In scm_detach_fds() https://github.com/torvalds/linux/blob/59b723cd2adbac2a34fc8e12c74ae26ae45bf230/net/core/scm.c#L338-L359
        due to -EMFILE in the single fd (that is sought to be received), the fd will not be added to the control message.
    
        In scm_detach_fds() https://github.com/torvalds/linux/blob/59b723cd2adbac2a34fc8e12c74ae26ae45bf230/net/core/scm.c#L361-L362
        due to -EMFILE earlier, MSG_CTRUNC will be added msg_flags
    sidkshatriya committed Nov 7, 2024
    Configuration menu
    Copy the full SHA
    c021aa9 View commit details
    Browse the repository at this point in the history