Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use posix_spawn on macOS 12 #47

Draft
wants to merge 8 commits into
base: master
Choose a base branch
from
Draft

Conversation

tov
Copy link
Contributor

@tov tov commented Jul 26, 2022

vfork is deprecated on macOS 12. This PR uses posix_spawn on macOS 12 instead. (The standard posix_spawn does insufficient error reporting for our API, but the Mac posix_spawn's errors are more informative.)

tov added 2 commits July 26, 2022 12:14
The default macos-latest image is/was macOS 11, but vfork is deprecated in
macOS 12. So we specify testing on both macOSes 11 and 12.

Signed-off-by: Jesse Tov <[email protected]>
Mac OS 12 deprecates vfork and provides a richer posix_spawn than standard. So,
if we're on mac OS >= 12 and posix_spawn is available, we use that.

Signed-off-by: Jesse Tov <[email protected]>
@tov tov requested a review from aalekseyev July 26, 2022 17:01

#if defined(USE_POSIX_SPAWN)

CAMLprim value spawn_unix(value v_env,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if it makes sense in this case, especially since code has already been written, but:
we generally try to put logic on the OCaml side if we can.
In unix_spawn we couldn't do that (while staying sane) since forking interacts poorly with OCaml runtime.
I'm thinking posix_spawn doesn't share that problem, so we could move much of this code to OCaml, instead creating C bindings for the individual posix_spawnattr_* functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Yeah, I see how that could be good. On the other hand, I could extend the blocking section around posix_spawn to include all the related posix_spawn* calls, if you think it's helpful to release the runtime system for longer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extending the blocking section is also a liability because you can't access the OCaml heap (or raise exceptions) from the blocking section. For that reason I'm not exactly excited about extending it. (although if some of those functions are known to be blocking/slow then that may be worth it)

src/spawn_stubs.c Outdated Show resolved Hide resolved
src/spawn_stubs.c Outdated Show resolved Hide resolved
src/spawn_stubs.c Outdated Show resolved Hide resolved
}

for (int fd = 0; fd < 3; fd++) {
int tmp_fd = tmp_fds[fd] = safe_dup(info.std_fds[fd]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're dupping fds in the parent now, which I think resets close-on-exec, which means that any processes spawned concurrently (through our library or not) will inherit these fds, which seems sad.

I can't think of any way to do it race-free (dup3 doesn't seem to help because it requires us to know a free fd), but at least maybe we should set close-on-exec after dup?
(reading the doc of posix_spawn, it seems it should be able to deal with close-on-exec fds?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Crazy idea how to do it race-free (sans error handling):

int dup_close_on_exec(int fd) {
  int new_fd = open("/", O_CLOEXEC);
  dup3(fd, new_fd, O_CLOEXEC);
  return new_fd;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm. Not that crazy. What do you make of this?:

/* Same as [dup] but 1) ensures the result isn't a standard FD, 2) applies
   [flags] (via dup3) to the new FD, and 3) on error, if [culprit] is non-null,
   assigns to [*culprit] a string literal naming the function that failed.
*/
static int safe_dup(int fd, int flags, char const **culprit)
{
  int new_fd = flags? open("/", O_CLOEXEC) : dup(fd);

  if (new_fd < 0) {
    if (culprit) {
      *culprit = flags? "open" : "dup";
    }
  }

  else if (new_fd < 3) {
    int next = safe_dup(fd, flags, culprit);
    close(new_fd);
    new_fd = next;
  }

  else if (flags) {
    new_fd = dup3(fd, new_fd, flags);
    if (new_fd < 0 && culprit) {
      *culprit = "dup3";
    }
  }

  return new_fd;
}

This might be easier in OCaml, too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this function benefitted from solving every problem at the same time. I would probably still keep at least the retry loop in a separate function. Anyway, I found that F_DUPFD_CLOEXEC is a thing (a flag you can pass to fcntl). Maybe that just works, and we don't need the ugly open/dup3 trick?

src/spawn_stubs.c Outdated Show resolved Hide resolved
Copy link
Contributor

@aalekseyev aalekseyev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall (other than inline comments):

  • I think this cpp file is growing a bit too large. If you can think of a way to split it that'd be great.
  • I don't understand your comment about error handling: how can posix_spawn possibly return good/better error messages if the only thing it returns is an int?
  • I'm curious about performance. Have you tried measuring it against a fork-based version?

@aalekseyev
Copy link
Contributor

By the way, does this function just work in linux? (at least when reviewing it I saw nothing that isn't documented in linux man pages)

jtov-js added 5 commits July 27, 2022 11:27
Some error cases that were raising an out-of-memory error now raise a specific
unix error instead.

Signed-off-by: Jesse Tov <[email protected]>
The unix error should name the actual function that failed, not the function
that performs that same job in the [v]fork-based implementation.

Signed-off-by: Jesse Tov <[email protected]>
When fchdir fails, there's no arg in the unix_error,
since fchdir takes a file descriptor, but the arg field
is for holding a string. So don't put that there.

Signed-off-by: Jesse Tov <[email protected]>
We were traversing the sigprocmask argument too late, after entering the
blocking section. Now we traverse the argument earlier, when considering the
other arguments, and only do the actual sigmask operations in the blocking
section.

This costs an extra system call (to pthread_sigmask) in the non-posix_spawn
case, because previously the code was using a single pthread_sigmask call to
both get the current sigmask and block all signals. Now we use it twice: before
the blocking section to get the current mask, and then inside the blocking
section to block all signals.

Signed-off-by: Jesse Tov <[email protected]>
@tov
Copy link
Contributor Author

tov commented Jul 27, 2022

  • I think this cpp file is growing a bit too large. If you can think of a way to split it that'd be great.

Yeah, I can do that.

  • I don't understand your comment about error handling: how can posix_spawn possibly return good/better error messages if the only thing it returns is an int?

Linux's posix_spawn sets errno to EINVAL for all errors, whereas mac OS posix_spawn can also set errno to E2BIG, EACCES, EFAULT, etc., depending on what goes wrong.

  • I'm curious about performance. Have you tried measuring it against a fork-based version?

Not yet, no.

By the way, does this function just work in linux? (at least when reviewing it I saw nothing that isn't documented in linux man pages)

Linux posix_spawn doesn't have posix_spawn_file_actions_addchdir_np/posix_spawn_file_actions_addfchdir_np.

@tov
Copy link
Contributor Author

tov commented Jul 27, 2022

I'm also wondering about user-visible API questions. The current code in this PR doesn't respect the unix_backend argument when posix_spawn is enabled. Presumably, client code should be able to choose between the fork, vfork, and posix_spawn backends where those are supported. But on Linux, whether posix_spawn is supported depends on whether a non-inherited cwd argument is given to Spawn.spawn. So, we could:

  1. Add Posix_spawn as a third Unix backend, and default to it on newer mac OS when SPAWN_USE_FORK isn't set.
  2. Continue ignoring the unix_backend argument on Windows.
  3. When Posix_spawn is passed as the backend on Linux, and cwd is not Inherit, either:
    • ignore the requested backend and fall back to vfork (consistent with Windows behavior), or
    • return ENOSYS.

I was trying to keep this API as simple as a conservative extension as possible, because without knowing about use cases, it's hard to think about how else it might be used.

@tov
Copy link
Contributor Author

tov commented Jul 27, 2022

It looks like posix_spawn is kinda broken on Linux (my uid box), in that it 1) isn't checking whether the program exists before forking, and 2) doesn't seem to be setting the child's sigmask. So posix_spawn may not be suitable on (some) Linux for implementing this API.

@tov tov marked this pull request as draft July 27, 2022 21:42
This doesn't work yet.

Signed-off-by: Jesse Tov <[email protected]>
@aalekseyev
Copy link
Contributor

Oh well, it's unfortunate that it doesn't work well on Linux. I think we just shouldn't enable posix_spawn on linux at all, then. (if we add a new backend, it should probably either always fail on Linux or fallback to vfork (either is acceptable, I think))

@tov
Copy link
Contributor Author

tov commented Oct 11, 2022 via email

@rgrinberg
Copy link
Collaborator

It looks like posix_spawn is kinda broken on Linux (my uid box), in that it 1) isn't checking whether the program exists before forking, and 2) doesn't seem to be setting the child's sigmask. So posix_spawn may not be suitable on (some) Linux for implementing this API.

It's also broken on macos :)

rust-lang/rust#80537

@rgrinberg
Copy link
Collaborator

On mac, there seems to be a workaround for the bug around doing chdir with the posix_spawn api libuv/libuv#3257

We could use pthread_chdir_np for a thread local chdir before spawning

emillon pushed a commit to ocaml-dune/spawn that referenced this pull request Apr 16, 2024
(picked from janestreet#47)

Signed-off-by: Jesse Tov <[email protected]>
Signed-off-by: Rudi Grinberg <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants