CLI processes started in distrobox/toolbox are not terminated correctly #533

Open
1player opened this issue Feb 10, 2024 · 10 comments
Labels: bug (Something isn't working), f39 (Related to Fedora 39), f40 (Related to Fedora 40), kinoite (Also affect Fedora Kinoite), upstream (Issue reported, fixed or related to upstream projects)

1player commented Feb 10, 2024

I am aware that this is not a Silverblue-only issue, but it affects all users because of the heavy use of toolbox and similar tools. I'm at my wits' end trying to track it down, after opening bug reports on podman, distrobox and toolbox without getting a concrete answer.

Steps to reproduce:

  • Enter a distrobox or toolbox container in a terminal emulator
  • Run a command (e.g. top)
  • Close the terminal emulator

Expected result

The command to be terminated after the window is closed.

Actual result

The process stays running in the background.

Discussion

There seems to be an issue with how podman, distrobox or toolbox handle SIGHUP, which should cause the shell and its background processes to be terminated but in practice does not. As a result, any command left running in a toolbox keeps running until the system is rebooted.

On my Silverblue machine, in 48 hours of uptime I have accumulated 38 zsh processes that keep running in the background because they are not killed appropriately:

toolbox:~ % ps aux | grep zsh | wc -l 
38
toolbox:~ % ps aux | grep zsh | head  
sph         5447  0.0  0.0  11620  7836 pts/0    Ss+  Feb08   0:00 /usr/bin/zsh -l
sph        17014  0.0  0.0  11288  7464 pts/1    Ss+  Feb08   0:00 /usr/bin/zsh -l
sph        18317  0.0  0.0  11288  7640 pts/2    Ss+  Feb08   0:00 /usr/bin/zsh -l
sph        18711  0.0  0.0  11120  7316 pts/3    Ss+  Feb08   0:00 /usr/bin/zsh -l
sph        19027  0.0  0.0  11120  7396 pts/4    Ss+  Feb08   0:00 /usr/bin/zsh -l
sph        19668  0.0  0.0  11120  7316 pts/5    Ss   Feb08   0:00 /usr/bin/zsh -l
sph        19716  0.0  0.0 227848  8212 pts/6    Ss+  Feb08   0:00 /usr/bin/zsh
sph        25591  0.0  0.0  11468  7644 pts/7    Ss+  Feb08   0:00 /usr/bin/zsh -l
sph        27631  0.0  0.0  11428  7636 pts/8    Ss+  Feb08   0:00 /usr/bin/zsh -l
sph        28415  0.0  0.0  11292  7492 pts/9    Ss+  Feb08   0:00 /usr/bin/zsh -l

All these processes are connected to dead pseudo-terminals. This happens because I have configured my terminal emulator (Prompt) to automatically enter the container, which triggers this issue every time I open a window. I expect everything to terminate when I close the terminal, but that does not happen.
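
A side note on counting: ps aux | grep zsh | wc -l can also count the grep process itself, so the figure above may be off by one. A minimal sketch of an exact-name count, assuming a Linux /proc filesystem; the helper name count_procs is made up for illustration:

```shell
# count_procs NAME: print how many running processes have exactly this
# command name (hypothetical helper, Linux-only: reads /proc/PID/comm).
count_procs() {
    # A process may exit between the glob expansion and the read, so
    # errors from cat are discarded. grep -c prints 0 but exits non-zero
    # when nothing matches, hence the trailing "|| true".
    cat /proc/[0-9]*/comm 2>/dev/null | grep -c -x -- "$1" || true
}

count_procs zsh
```

Because the match is on the whole command name, the grep that does the counting is never included in the result.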

The only workaround is to reboot the machine often to clean up what is effectively a memory leak.

As I said, this is perhaps not the appropriate place for this issue, but I would like to understand where to even start tracking it down, and whether it happens only on my machine -- because I can reproduce it 100% of the time, whatever the shell, terminal emulator or DE. Given that Fedora Atomic distro workflows are centred around using containers for most of the work, having processes that are not terminated correctly is a major bug and an unnecessary leak of resources.

When I mentioned this behaviour on the podman tracker, I've been told it's not a podman issue, but a toolbox and distrobox one. toolbox and distrobox know it's an issue but there doesn't seem to be any movement on it whatsoever.

Related issues:

1player added the bug label Feb 10, 2024
1player (Author) commented Feb 10, 2024

After a little more sleuthing, this is what I learned: this happens because the subprocess does not receive a SIGHUP. It does not receive one because it is not associated with a TTY.

Here's top launched on the host:

~ % ps -def | grep top
sph        16526   16486  1 10:36 pts/6    00:00:00 top

Notice that it is attached to pts/6. Here's top launched in a container:

~ % ps -def | grep top
sph        17000   16958  3 10:37 ?        00:00:00 top

On the host, when the terminal is closed, the pty is closed too, which causes the kernel to send a SIGHUP signal to the processes attached to that tty. In the container, there is no tty associated, so no SIGHUP is sent and the process keeps running forever in the background. This seems to be because container processes are children of conmon, which is itself not associated with any tty.
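
The difference between the two ps listings can also be checked per process. A minimal sketch, assuming Linux: the tty_nr field of /proc/PID/stat is 0 when a process has no controlling terminal, which is what ps prints as ?. The helper name has_ctty is made up for illustration:

```shell
# has_ctty PID: succeed if the process has a controlling terminal.
# Hypothetical helper, Linux-only: reads /proc/PID/stat.
has_ctty() {
    # The command name in /proc/PID/stat is wrapped in parentheses and may
    # contain spaces, so strip everything up to the last ") " first. The
    # remaining fields are: state ppid pgrp session tty_nr ...
    tty_nr=$(sed 's/^.*) //' "/proc/$1/stat" 2>/dev/null | cut -d' ' -f5)
    [ -n "$tty_nr" ] && [ "$tty_nr" -ne 0 ]
}

# Report on the current shell.
if has_ctty "$$"; then
    echo "shell $$ has a controlling terminal"
else
    echo "shell $$ has no controlling terminal"
fi
```

Run against the PIDs from the two listings above, this should succeed for the top started on the host and fail for the one started in the container.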

This seems to point to a podman issue, but maintainers over there say it's not...

1player closed this as completed Feb 10, 2024
1player reopened this Feb 10, 2024
travier added the f39 label Feb 12, 2024
travier (Member) commented Feb 12, 2024

I think that I can reproduce this issue with both Prompt & Konsole if I close the whole app with multiple tabs open.

I can kill the processes by sending them SIGTERM.
Edit: Looks like it's not the case for all instances. 🤔
Edit (again): Terminating the container killed them.

I had never seen that before as I usually close each terminal individually (exit or ^D).

travier added the upstream and kinoite labels Feb 12, 2024
1player (Author) commented Feb 12, 2024

I can also reproduce with Wezterm and GNOME Terminal. I am pretty sure it's a podman issue, though not really a bug: the way podman uses conmon to run containers is a mismatch with GUI usage. podman and toolbox developers need to come together to find a solution, hence asking for your help @travier :)

If you exit normally, that's not a problem, because the shell process itself exits. The issue is when you close a terminal that has running processes, which in my case seems to happen a lot. If I have a terminal open with 3 tabs inside the container (thus 3 shells), I'm not going to Ctrl-D each of them, because I have always relied on the SIGHUP mechanism that's core to UNIX and Linux: close the terminal, everything spawned therein gets killed, unless it was run with nohup.

In fact this is a good way to summarize the issue: right now, every command run inside a podman container, starting from the shell process, behaves as if it was started with nohup. It is a massive gotcha one needs to be aware of.
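
The nohup comparison can be made concrete with a small experiment. This sketch, assuming a POSIX shell with sleep and nohup available, sends SIGHUP by hand to a plain process and to one started with nohup; the variable names are illustrative only:

```shell
# Start one ordinary process and one that ignores SIGHUP via nohup.
sleep 20 &
plain=$!
nohup sleep 20 >/dev/null 2>&1 &
immune=$!

# Give nohup time to set SIGHUP to "ignore" and exec sleep.
sleep 1

# Deliver by hand the signal that a closing terminal would normally cause.
kill -HUP "$plain" "$immune"

# The plain process dies; wait reports 128 plus the signal number.
plain_status=0
wait "$plain" || plain_status=$?

# The nohup'd process is still there, like the leaked container processes.
sleep 1
if kill -0 "$immune" 2>/dev/null; then
    immune_alive=yes
else
    immune_alive=no
fi

echo "plain exit status: $plain_status, nohup process alive: $immune_alive"

# Clean up the survivor.
kill -TERM "$immune"
```

The processes in the container are in an even more lenient position than the nohup'd one here: they do not ignore SIGHUP, they simply never get sent one.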

p1gp1g commented Mar 24, 2024

I am facing the same issue.

I have a keyboard shortcut that opens a terminal that (to put it simply) runs toolbox run -- zellij, and I end up with dozens of these processes that I have to kill.

Do you have any workaround?

1player (Author) commented Mar 24, 2024 via email

travier added the f40 label Mar 25, 2024
miabbott (Member) commented

@debarshiray FYI

debarshiray commented Mar 28, 2024

This sounds a lot like the discussion in this Toolbx pull request:
containers/toolbox#1207

It's been on my list to revisit, but keeps slipping. I will try to pick it up again next week. In the meantime, feel free to leave a comment if you agree, disagree or have some thoughts on the discussion.

One or two years ago, there was a problem with Podman container processes blocking shutdown:

... and that might have been fixed by:
containers/podman#17025

Note that while containers/podman#14531 points at containers/podman#16785 , I don't think it had anything to do with fixing blocked shutdowns caused by lingering podman exec sessions.

Just to be sure, this problem here isn't the blocked shutdown issue, right?

1player (Author) commented Mar 28, 2024 via email

travier (Member) commented Mar 29, 2024

This sounds a lot like the discussion in this Toolbx pull request: containers/toolbox#1207

I don't think it's the same issue, and I don't think we want to kill the entire container when one instance of toolbox terminates.

I usually have a single toolbox instance and multiple Konsole tabs entered into this same container so I don't want the container to be stopped when I close one tab.

owtaylor commented Mar 29, 2024

I think that this is pretty clearly covered by containers/podman#19486 - 'podman exec' doesn't proxy signals. That's also mentioned in containers/toolbox#1400, but it's hidden in the discussion.
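
For readers unfamiliar with the term, proxying signals means that a wrapper passes the signals it receives on to the process it started. The sketch below shows the generic idea in plain shell; it is not podman's code and not a fix for podman exec, and the function name run_with_signal_proxy is made up for illustration:

```shell
# run_with_signal_proxy CMD [ARGS...]: run CMD in the background and
# forward SIGHUP and SIGTERM from this shell to it (hypothetical helper).
run_with_signal_proxy() {
    "$@" &
    child=$!
    trap 'kill -HUP "$child" 2>/dev/null' HUP
    trap 'kill -TERM "$child" 2>/dev/null' TERM

    # wait returns early with a status above 128 when a trapped signal
    # arrives, so keep waiting until the child is really gone.
    while :; do
        status=0
        wait "$child" || status=$?
        kill -0 "$child" 2>/dev/null || break
    done

    # Restore default signal handling and report the child's status.
    trap - HUP TERM
    return "$status"
}
```

A script that calls run_with_signal_proxy sleep 30 and then receives SIGHUP would pass the signal on, so the sleep ends too instead of being left behind.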

There's a behavior difference between zsh and bash - a bash shell in a toolbox isn't leaked when the terminal window is closed, but if I 'exec zsh' in the bash shell, and then close the terminal, the zsh process is leaked. It would take some detective work to figure out what is the difference in what they are doing. [EDIT: unsure - I'm not reproducing the problem with bash. Not sure if my testing earlier was inaccurate, or there's another factor involved]

Christian Hergert has plans to make https://gitlab.gnome.org/chergert/ptyxis avoid allocating an extra tty in this situation. But that hasn't been implemented yet.
