Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix --preserve-fds, eliminate stray FD being passed into container #2893

Merged
merged 6 commits into from
Dec 30, 2024

Conversation

aidanhs
Copy link
Contributor

@aidanhs aidanhs commented Aug 24, 2024

#2663 seems to have broken preserve-fds by unconditionally closing everything aside from stdio in the container main process.

I've fixed this by effectively reverting part of that PR. Typically I'd expect it to be a bad idea to revert security fixes, but unfortunately I don't really understand much of the diagnosis at #2663 and can't get it to stack up against my observations of previous and current behavior of youki.

Specifically:

youki does leak a directory file descriptor from the host mount namespace, but it just so happens that the directory is the rootfs of the container

I checked out 04f8f2d (the commit before the fix PR was merged) and couldn't reproduce this. With a default config generated by youki spec and a command of "ls", "-al", "/proc/self/fd" I get the following output:

dr-x------    2 root     root             0 Aug 24 20:17 .
dr-xr-xr-x    9 root     root             0 Aug 24 20:17 ..
lrwx------    1 root     root            64 Aug 24 20:17 0 -> /dev/pts/9
lrwx------    1 root     root            64 Aug 24 20:17 1 -> /dev/pts/9
lrwx------    1 root     root            64 Aug 24 20:17 2 -> /dev/pts/9
ls: /proc/self/fd/3: cannot read link: No such file or directory
lr-x------    1 root     root            64 Aug 24 20:17 3

This is what I'd expect - preserve fds (#177) already closes all the FDs in the init process before execution. FD 3 here is ls opening the directory.

I can reproduce a leak by passing --preserve-fds 10 (I get something at FD 7, which seems to be the rootfs being referenced) - but the fix solves this by just making the preserve-fds flag useless. I fix this by actually closing the culprit directory, and then additionally verifying by hand that passing --preserve-fds 100 does not list any other stray FDs that are being passed down.

In addition, no care is taken to make sure all non-stdio files are O_CLOEXEC

Again I don't understand this - the preserve-fds PR does exactly that by marking everything as O_CLOEXEC before process execution? It is not ideal that passing --preserve-fds leaks something, but I've fixed that in this PR in a more precise way.

In addition, any file descriptors passed to youki are not closed until the container process is executed, meaning that easily-overlooked programming errors by users of youki can lead to these attacks becoming exploitable.

I also don't understand this - the preserve fds PR sets the O_CLOEXEC flag in the init process, before execing the user process. It seems better to do this as late as possible, so fewer file descriptors can slip in? The vulnerability fix puts it in the main process, giving ample opportunity for other FDs to be opened and leaked.

Perhaps there's more detail that would help my confusion in the review in the private repository that would help? Per:

This PR has already been approved by jprendes rumpl yihuaf in private repository.

Edit: oops, sorry for the mentions, those were unintentional

@aidanhs aidanhs force-pushed the aphs-fix-preserve-fd branch from 00bfd41 to 97574b1 Compare August 24, 2024 21:01
@utam0k utam0k requested review from utam0k and a team August 28, 2024 13:15
@YJDoc2
Copy link
Collaborator

YJDoc2 commented Aug 28, 2024

@aidanhs , failing oci CI -

Running default/default.t
failed to create the container
ERROR libcontainer::process::container_init_process: failed to cleanup extra fds err=Nix(EPERM)
ERROR libcontainer::process::container_intermediate_process: failed to initialize container process: failed syscall
ERROR libcontainer::process::container_main_process: failed to wait for init ready: exec process failed with error error in executing process : failed syscall
ERROR libcontainer::container::builder_impl: failed to run container process exec process failed with error error in executing process : failed syscall
ERROR youki: error in executing command: exec process failed with error error in executing process : failed syscall

Stack backtrace:
   0: <unknown>
   1: <unknown>
   2: <unknown>
   3: <unknown>
   4: <unknown>
error in executing command: exec process failed with error error in executing process : failed syscall

Stack backtrace:
   0: <unknown>
   1: <unknown>
   2: <unknown>
   3: <unknown>
   4: <unknown>
Error: exec process failed with error error in executing process : failed syscall

Stack backtrace:
   0: <unknown>
   1: <unknown>
   2: <unknown>
   3: <unknown>
   4: <unknown>
exit status 1
error: Recipe `test-oci` failed on line 54 with exit code 1

@aidanhs aidanhs force-pushed the aphs-fix-preserve-fd branch from 97574b1 to c92c2e0 Compare August 30, 2024 18:30
@aidanhs
Copy link
Contributor Author

aidanhs commented Aug 30, 2024

The failure was a problem with some changes that I made to try and make closing FDs happen later - unfortunately seccomp got in the way. I've now backed out those changes as strictly speaking they were unnecessary.

Have run the tests locally and they seem to pass.

Copy link
Member

@utam0k utam0k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your PR! Probably you are right, but I'm not sure. So, may I ask you to write an e2e test to clarify?
https://containers.github.io/youki/developer/e2e/rust_oci_test.html

Also, I've updated the original PR. Thanks for your advice. It makes sense a lot.
#2663

@aidanhs aidanhs force-pushed the aphs-fix-preserve-fd branch from c92c2e0 to fc5c131 Compare September 2, 2024 01:27
@aidanhs
Copy link
Contributor Author

aidanhs commented Sep 2, 2024

There are now three integration tests:

  1. that youki itself is not leaking FDs (i.e. guarding against a hypothetical future vulnerability)
  2. that preserve-fds actually works
  3. that open FDs are not passed down without preserve-fds

This was harder than I expected because:

  1. I needed to extend the test harness to permit customisation of arguments to container create (I did a small amount of tidying here where another test wanted to do this too, but I didn't want to get too distracted)
  2. because this testing is sensitive to open FDs, it cannot be run in parallel with other tests - I have had to extend the test harness to support non-parallel execution
  3. the --preserve-fds flag does not get passed via the spec so I needed to communicate state via the filesystem

@aidanhs aidanhs force-pushed the aphs-fix-preserve-fd branch from 3b83a72 to 20d9be4 Compare September 3, 2024 15:39
}

fn open_devnull_no_cloexec() -> Result<(fs::File, RawFd)> {
// Rust std by default sets cloexec, so we undo it
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment on lines +12 to +14
/// can the test group be executed in parallel (both the tests
/// within it, and alongside other test groups)
parallel: bool,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@YJDoc2 WDYT? I prefer to introduce this flag.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@YJDoc2 Do you want to have additional time to check it?

@@ -545,3 +546,47 @@ pub fn test_io_priority_class(spec: &Spec, io_priority_class: IOPriorityClass) {
eprintln!("error ioprio_get expected priority {expected_priority:?}, got {priority}")
}
}

pub fn validate_fd_control(_spec: &Spec) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

Copy link
Member

@utam0k utam0k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aidanhs
Sorry for the late review but your PR is 💯 for me and I've learned a lot of things from your PR.

I'm waiting to comment on one change from @YJDoc2 .

@utam0k
Copy link
Member

utam0k commented Dec 4, 2024

@aidanhs May I ask you to resolve the conflicts? Others look good to me.

@YJDoc2
Copy link
Collaborator

YJDoc2 commented Dec 5, 2024

Hey, I'll need some time to check the changes done in test functions. probably till this weekend.
Also a recent merged PR extends the test_inside_container function to allow passing extra args, I think @aidanhs you should rebase/merge main resolve conflicts and use the updated function to pass the extra args.

Signed-off-by: Aidan Hobson Sayers <[email protected]>
Test harness additionally needed to support

1. tests that cannot run in parallel
2. tests that need to customise create arguments

Signed-off-by: Aidan Hobson Sayers <[email protected]>
@aidanhs aidanhs force-pushed the aphs-fix-preserve-fd branch from 20d9be4 to 32d5dde Compare December 7, 2024 14:56
@aidanhs
Copy link
Contributor Author

aidanhs commented Dec 7, 2024

Have rebased and now use the new CreateOptions struct.

Copy link
Collaborator

@YJDoc2 YJDoc2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from test side.

@YJDoc2
Copy link
Collaborator

YJDoc2 commented Dec 9, 2024

Hey just a couple of comments here :

  1. @aidanhs , thanks for following up on this! There was a minor conflict, which I have fixed and pushed, please check once that my last two commits are fine.
  2. @utam0k I feel this is ok to merge, but given that this is related to security code we did a while back (even though I agree with the argument in the description) if you want to do a more careful scrutiny of this, we can merge this after we release 0.5.0. Right now I have done a basic check for the logic change, and checked the tests.
  3. Finally for myself - we really need to refactor the test running logic here. That is definitely out of scope for this PR, but we need to clean things up, probably remove the crossbeam completely now that rust natively supports scoped threads. Also there are a lot of extra println which clutter the logs, a couple of tests should be conditional test and are not, and a couple of tests are "leaking" resources on host system which causes issues when re-running the test. I need to do a better job at reviewing and fix these issues up.

@aidanhs
Copy link
Contributor Author

aidanhs commented Dec 20, 2024

The two commits look fine.

Personally I'd have left in the TODO at f302ea0#diff-d45900789a82e7875774e0c4bdae819850026ea66be9d25fa96a4b5d736ecdf7L21, modifying it to say "the CreateOptions argument could be used to pass the pidfile" - I added that comment because I saw the test could be cleaned up to use existing test functions, but I didn't want to spend effort on it for this PR so a note for future visitors.

But I don't mind either way!

@utam0k
Copy link
Member

utam0k commented Dec 24, 2024

I feel this is ok to merge, but given that this is related to security code we did a while back (even though I agree with the argument in the description) if you want to do a more careful scrutiny of this, we can merge this after we release 0.5.0. Right now I have done a basic check for the logic change, and checked the tests.

@yihuaf Could you mind double-checking this PR for sure?

Copy link
Collaborator

@yihuaf yihuaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch and good job on the detailed tests.

@@ -279,6 +283,11 @@ impl Syscall for LinuxSyscall {
errno
})?;

close(newroot).map_err(|errno| {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this explicit close required? Does the O_CLOEXEC take care of closing after the function returns, or there is a specific reason the fd will not be closed unless we explicitly call close?

// to ensure we don't leak any file descriptors to the intermediate process.
// Please refer to https://github.com/opencontainers/runc/security/advisories/GHSA-xr7r-f8xq-vfvv for more details.
let syscall = container_args.syscall.create_syscall();
syscall.close_range(0).map_err(|err| {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this revert is correct. Reading the security advisory, the action called out in the advisory is to set O_CLOEXEC on all FDs other than stdio. The close_range in the --preserve-fds check in the init process should take care of all this case.

@yihuaf
Copy link
Collaborator

yihuaf commented Dec 30, 2024

I also don't understand this - the preserve fds PR sets the O_CLOEXEC flag in the init process, before execing the user process. It seems better to do this as late as possible, so fewer file descriptors can slip in?

You are correct. The preserve-fds logic should be execute as late as possible, because we want to cover all the cases where fds can be opened during all the previous setup action.

Further, your analysis is correct. The preserve-fds logic works correctly before the fix. Leaking the newroot fds when pivot_root is valid and this PR fixed it. If a caller pass in preserve-fds that allows the problematic fds to be leaked, I don't think we need to guard against the situation (and am not sure if it is possible).

@utam0k
Copy link
Member

utam0k commented Dec 30, 2024

Thanks all!

@utam0k utam0k merged commit 83d87a2 into youki-dev:main Dec 30, 2024
27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants