
nf claims to have delivered SIGINT to all children on exit from one, but does not actually #176

Open
benweint opened this issue Dec 1, 2023 · 2 comments


benweint commented Dec 1, 2023

The README says:

If your processes exit, Node Foreman will assume an error has occurred and shut your application down.

nf does seem to detect the exit of a single child process, and claims to be sending a SIGINT to all children in response to it, but in fact will not deliver the SIGINT in all cases.

Here's a simple repro case:

❯ cat wait-for-sigint.sh 
#!/bin/bash

function handle_sigint {
	echo "got SIGINT, exiting ..."
	exit
}

trap handle_sigint SIGINT

echo "started, sleeping forever awaiting SIGINT"
sleep 1000000

❯ cat Procfile  
a: sleep 10 && exit 1
b: ./wait-for-sigint.sh

❯ nf start
12:38:47 PM b.1 |  started, sleeping forever awaiting SIGINT
[DONE] Killing all processes with signal  SIGINT
12:38:57 PM a.1 Exited with exit code null

< ... `nf` does not actually exit here, nor does the `b` child process running `wait-for-sigint.sh` ... >

Observations

If I modify wait-for-sigint.sh to emit a constant stream of output while it is waiting, then the test case works as expected:

❯ cat Procfile 
a: sleep 5 && exit 1
b: ./wait-for-sigint-with-output.sh

❯ cat wait-for-sigint-with-output.sh 
#!/bin/bash

function handle_sigint {
	echo "got SIGINT, exiting ..."
	exit
}

trap handle_sigint SIGINT

echo "started, sleeping forever awaiting SIGINT"

while true
do
  echo 'still here'
  sleep 1
done

❯ nf start                               
12:43:41 PM b.1 |  started, sleeping forever awaiting SIGINT
12:43:41 PM b.1 |  still here
12:43:42 PM b.1 |  still here
12:43:43 PM b.1 |  still here
12:43:44 PM b.1 |  still here
12:43:45 PM b.1 |  still here
[DONE] Killing all processes with signal  SIGINT
12:43:45 PM a.1 Exited with exit code null
12:43:46 PM b.1 |  got SIGINT, exiting ...
12:43:46 PM b.1 Exited Successfully

Comparison to other implementations

foreman (Ruby)

❯ foreman start
12:47:50 a.1    | started with pid 59856
12:47:50 b.1    | started with pid 59857
12:47:50 b.1    | started, sleeping forever awaiting SIGINT
12:47:55 a.1    | exited with code 1
12:47:55 system | sending SIGTERM to all processes
12:47:56 b.1    | terminated by SIGTERM

goreman (Go)

goreman has different default behavior wrt a single child process exiting:

❯ goreman start
12:45:24 a | Starting a on port 5000
12:45:24 b | Starting b on port 5100
12:45:24 b | started, sleeping forever awaiting SIGINT
12:45:29 a | Terminating a

... but with -exit-on-error ('Exit goreman if a subprocess quits with a nonzero return code'):

❯ goreman -exit-on-error start
12:46:10 a | Starting a on port 5000
12:46:10 b | Starting b on port 5100
12:46:10 b | started, sleeping forever awaiting SIGINT
12:46:15 a | Terminating a
12:46:15 b | got SIGINT, exiting ...
12:46:15 b | Terminating b
goreman: exit status 1
benweint added a commit to benweint/node-foreman that referenced this issue Dec 4, 2023
benweint commented Dec 4, 2023

Turns out I had misdiagnosed this!

nf really was delivering SIGINT to all direct children. But because it doesn't put each spawned child in its own process group, nf would just hang when one child exited if the other children had spawned children of their own and didn't respond to SIGINT by exiting or by forwarding it to those children.

In the repro case that I gave, the process tree looks like this after `a` exits:

❯ pstree -s nf.js
... snip ...
     \-+= 83622 ben node nf.js start
       \-+- 83628 ben /bin/bash ./wait-for-sigint.sh
         \--- 83632 ben sleep 1000000
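`pstree` shows who started whom, but not process groups. To check the point about process groups directly, one option is to print each process's group ID (PGID) next to its PID. The function below is a hypothetical helper, not part of nf, and assumes bash, `awk`, and a `ps` that accepts `-A` and `-o` (as the Linux procps and macOS versions do).

```shell
#!/bin/bash
# Hypothetical helper (not part of nf): list a process and its direct
# children with PID, parent PID, process group ID and command line.
# Assumes bash, awk, and a ps that supports -A and -o (Linux procps, macOS).
show_groups() {
  local root=$1
  # The trailing "=" on each column name suppresses the header row.
  ps -A -o pid=,ppid=,pgid=,command= |
    awk -v root="$root" '$1 == root || $2 == root'
}
```

Called with nf's PID from the tree above (`show_groups 83622`), it would be expected to print the same PGID on every row, since a child that is never moved into a new group inherits its parent's.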

The bash process (pid 83628) did receive the SIGINT, but per the bash manual:

When Bash receives a signal for which a trap has been set while waiting for a command to complete, the trap will not be executed until the command completes.

So in this example:

  1. `a` exited
  2. nf sent SIGINT to the direct child process for `b` (bash, pid=83628)
  3. bash got the SIGINT, but was waiting to invoke the trap handler until the sleep command (pid=83632) exited
  4. The sleep command itself never received the SIGINT
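The deferral in step 3 can be reproduced without nf. This sketch, assuming bash, sends SIGINT to a bash process only (as nf does) and measures how long its trap takes to run; the 3-second sleep stands in for `sleep 1000000`, and `bodies` and `elapsed` are illustrative names. Its second variant uses a script-level workaround that the bash manual also documents: when bash is blocked in the `wait` builtin, a trapped signal makes `wait` return immediately, after which the trap runs. Backgrounding the long command and waiting on it therefore lets the trap fire right away, although the trap then has to stop that command itself.

```shell
#!/bin/bash
# Sketch, assuming bash: send SIGINT to a bash process (not its group) and
# time how long its trap takes to run, for two versions of the script.
set -m  # job control on, so background jobs do not start with SIGINT ignored

# 1) Like wait-for-sigint.sh: the trap waits for the foreground sleep to end
#    (the trailing `exit 0` keeps bash -c from exec'ing sleep directly).
# 2) Workaround: background the sleep and block in `wait`, which a trapped
#    signal interrupts; the trap then sends SIGTERM to the sleep.
bodies=(
  'trap "exit" INT; sleep 3; exit 0'
  'trap "kill \$pid; exit" INT; sleep 3 & pid=$!; wait "$pid"'
)
elapsed=()

for body in "${bodies[@]}"; do
  start=$SECONDS
  bash -c "$body" &
  child=$!
  sleep 1
  kill -INT "$child"   # signal the bash process only, as nf does
  wait "$child"
  secs=$((SECONDS - start))
  elapsed+=("$secs")
  echo "exited after about ${secs}s: $body"
done
```

The first variant should take the full 3 seconds and the second only about 1. This only helps for scripts you control; signalling the whole process group works regardless of how the child is written.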

goreman solves this by creating a process group for each spawned child and delivering SIGINT to the group rather than only to the direct child.
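The difference can also be tried from a shell. This sketch, assuming bash, uses job control (`set -m`) as a stand-in for per-child process groups: it starts a bash that traps SIGINT around a foreground `sleep`, signals only that bash's PID (mimicking nf), and then signals the whole group with a negative PID (mimicking the goreman approach described above). It is not goreman's or the fork's actual code; the 10-second sleep and the `pid_signal_left_it_running` variable are illustrative.

```shell
#!/bin/bash
# Sketch, assuming bash: rebuild the bash -> sleep tree from the issue
# without nf, then compare signalling one PID with signalling its group.
# Job control on: each background job gets its own process group, and
# background jobs do not start with SIGINT ignored.
set -m

# Direct child: a bash that traps SIGINT while a foreground sleep runs,
# like wait-for-sigint.sh (shortened to 10 seconds so the demo ends).
bash -c 'trap "echo got SIGINT, exiting ...; exit" INT; sleep 10; exit 0' &
leader=$!   # under set -m this PID is also the job's process group ID
sleep 1     # give bash time to start its foreground sleep

kill -INT "$leader"        # nf-style: the direct child only
sleep 1
pid_signal_left_it_running=false
if kill -0 "$leader" 2>/dev/null; then
  pid_signal_left_it_running=true
  echo "PID signal: bash $leader is still running (trap deferred)"
fi

kill -INT -- "-$leader"    # group-style: every process in the group
wait "$leader"
echo "group signal: bash $leader has exited"
```

The first signal leaves the script hung just as in the issue, because `sleep` never sees it; the second also reaches `sleep`, so bash can run its trap and exit. Outside the shell, the equivalent building blocks are `setpgid(2)` when spawning and `kill(2)` with a negative PID when shutting down; in Node, `child_process.spawn` with `detached: true` makes the child the leader of a new process group on non-Windows platforms.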

benweint commented Dec 4, 2023

I've implemented support for using process groups in my fork (benweint@5cb9ee5) and can PR it if there's interest, but it looks like this project might be dead.
