
gzserver not dying after killing the simulation #751

Open
musamarcusso opened this issue Jun 23, 2018 · 17 comments · May be fixed by #1452

@musamarcusso

Hello everyone,

I have noticed that ever since updating to Gazebo 9.1, when I start Gazebo with roslaunch and then kill the simulation, gzserver does not die (and gzclient sometimes also lingers). I don't know if it has to do with the version of Gazebo, but I just noticed it has been happening since the update.
I have been starting the simulation many times with an optimizer, so I see this a lot. I haven't logged exactly how often, but I would estimate it happens in about 30% of the simulation starts.
Has anyone else noticed this? Any ideas on how to solve it? Thanks in advance.

@romainreignier

I have also noticed that behavior. If I wait a bit, gzserver does eventually disappear. But of course that does not work if I relaunch the simulation just after killing it, so I have to killall -9 gzserver

Not sure, but it may be related to c6d6c76?

@musamarcusso
Author

Yes, this didn't happen for me before with Gazebo 7.0. I noticed it also affects my ROS tests when several of them start the simulation. I had to set a different port for the Gazebo instance in each test to make sure they always run without hitting the error that an instance of gzserver is already running.
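
For reference, a minimal sketch of that per-test isolation, assuming each test exports its own Gazebo master URI before launching (the port number and launch file name below are only examples):

export GAZEBO_MASTER_URI=http://localhost:11346   # default port is 11345
roslaunch my_package sim_test.launch              # hypothetical test launch file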

@kev-the-dev
Collaborator

kev-the-dev commented Jul 6, 2018

+1, gzserver does seem to take an incredibly long time to shut down sometimes. Having more plugins/models or a gzclient running makes this take even longer.

However, I think there may be an actual bug here, perhaps a deadlock involving the ROS plugins, as I can produce a case where gzserver seems to hang forever (waited for 10+ minutes). I also noticed that SIGTERM (sent by kill <pid>) seems to work in these cases.

Here are some rambling notes for anyone trying to debug this deadlock:

  1. Gazebo has its own signal handlers for both SIGINT (sent by Ctrl+C) and SIGTERM.
  2. The gazebo_ros API plugin has its own SIGINT callback.
  3. It would be helpful to run the server under a debugger (rosrun gazebo_ros debug) and see what the threads are doing while in this deadlock (see the sketch after this list).
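
As a shortcut, attaching GDB to an already-hung gzserver and dumping every thread's backtrace also shows where it is stuck. This is only a sketch, not the exact rosrun gazebo_ros debug workflow, and the pid placeholder has to be filled in by hand:

ps aux | grep gzserver
sudo gdb -p <pid associated with gzserver> -batch -ex "thread apply all bt"
# prints a backtrace for every thread, so the blocking call is visible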

Thanks for filing the issue for this, I'm sure many people have had this problem too.

@kev-the-dev kev-the-dev added the bug label Jul 6, 2018
@kev-the-dev kev-the-dev self-assigned this Jul 6, 2018
@musamarcusso
Author

Hi @ironmig, any updates on this issue?

@kev-the-dev
Collaborator

I spent a little time on this a few weeks back but haven't found anything.

@musamarcusso
Author

I also haven't figured out exactly what happens there. Was there an issue with the old script?

@kev-the-dev
Collaborator

I don't think this is related to the script. To check, try manually sending SIGINT to gzserver:

ps aux | grep gzserver
kill -2 <pid associated with gzserver>

For me this still doesn't work.

Looking at it in GDB, mine seems to get stuck in Publisher::fini() within Gazebo. It appears to be destroying hundreds of publishers and waiting out the full one-second timeout for each one; with a few hundred publishers, that alone adds several minutes to shutdown. This is related to this Gazebo issue. Of course, it's hard to tell whether we're all having the same problem.

@tahsinkose

tahsinkose commented Aug 7, 2018

I have been dealing with this issue for roughly ten months as well. Until now I applied a manual kill command after the end of each simulation to clear residual Gazebo processes. I have now written a simple Bash script that checks for residual Gazebo processes at each simulation startup, so any leftover gzserver and gzclient processes are cleared automatically before a new simulation starts. If you are interested, here is a link to the gist.
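
A minimal sketch of the idea, assuming a simple pgrep/killall approach (see the gist for the full version):

#!/usr/bin/env bash
# Clear any residual Gazebo processes before a new simulation starts.
for proc in gzserver gzclient; do
    if pgrep -x "$proc" > /dev/null; then
        echo "Residual $proc found, killing it"
        killall -9 "$proc"
    fi
done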

This is not a direct fix for the bug mentioned above, only a workaround, but in my own projects it really did speed up feature development and debugging. Feel free to use it until the core issue is resolved!

EDIT: Link is corrected.

@josephcoombe

@tahsinkose I tried to follow the link in your comment, but it was broken.

@tahsinkose

@josephcoombe Uh, sorry for the broken link. Just a typo. Here is the correct link.

@mjcarroll

I just got this PR merged: https://bitbucket.org/osrf/gazebo/pull-requests/3014/wip-address-gzserver-shutdown-speed/diff

It should address some of the issues with long shutdown times with Gazebo.

@ahmetsaglam

As a newbie to both Ubuntu and Gazebo, I noticed that after killing the simulation, top still shows gzserver running. Even killall gzserver did not shut it down. Then I noticed that apport (Ubuntu's crash reporting tool) was consuming a lot of CPU collecting the crash report for the Gazebo shutdown, and it did not let me kill gzserver. Once the crash report was ready (apport's job was done), gzserver was killed. I know this doesn't fix why Gazebo crashes on shutdown, but it may at least save new users some time figuring out what is going on when "killall gzserver" doesn't seem to "work".
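
If apport really is what blocks the kill, one possible workaround (an assumption based on the behaviour above, not something verified in depth) is to stop it before killing gzserver:

sudo service apport stop   # temporarily stop Ubuntu's crash reporter
killall gzserver
sudo service apport start  # re-enable crash reporting afterwards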

@cosmicog

cosmicog commented May 29, 2020

Yo, ros devs;

Since I have no patience waiting for our precious simulator Gazebo to shut down so I can open it back up with all the other ROS nodes, I inspected it a bit to find a way to kill it properly. Since we most probably won't be running any other ROS nodes while the sim is closed, this is my way to shut it down. I'm assuming that 99% of the time Gazebo is launched with roslaunch (which starts roscore automatically).

If I only kill gzserver and gzclient, I still get these two:

 /gazebo
 /gazebo_gui

when I run rosnode list. While these are somehow still alive, I see weird behaviour and cannot run another roscore. Also, rosnode kill -a has no effect on these nodes. rosnode info /gazebo outputs the topic connections but says "Communication with node[...] failed!" at the end of the output.
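
One extra step that may help with those stale registrations (an assumption, I have not confirmed it fixes this particular case) is purging dead nodes from the ROS master:

rosnode cleanup   # interactively unregisters nodes that no longer respond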

Anyway, without wasting more words, I now use [Ctrl] + [C] + this alias to assassinate it properly without sending any extra signals or using sudo:

alias killg='killall gzclient && killall gzserver && killall rosmaster'

@Sihoj

Sihoj commented Dec 15, 2020

Having the same problem and also having no patience, I made a small Python launcher that intercepts [Ctrl] + [C] and issues the kill commands after a small timeout.

Save the Python code below as e.g. gzlauncher and make it executable with chmod +x gzlauncher. I also added it to my PATH, so I can run commands like this from anywhere:
gzlauncher roslaunch my_package my_launch
or
gzlauncher rosrun my_package my_node
and use Ctrl+C as usual to fully kill Gazebo so it's immediately ready to relaunch.

Here's the Python code (feel free to use and adapt as you like):

#!/usr/bin/env python

import sys, signal, subprocess, time


timeout_before_kill = 1.0  # [s] grace period for a clean shutdown after Ctrl+C
timeout_after_kill = 1.0  # [s] wait before escalating to SIGKILL


def signal_handler(sig, frame):
    # Give roslaunch a moment to shut everything down cleanly, then kill leftovers.
    time.sleep(timeout_before_kill)
    subprocess.call("killall -q gzclient & killall -q gzserver", shell=True)
    time.sleep(timeout_after_kill)
    # Force-kill anything that is still alive.
    subprocess.call("killall -9 -q gzclient & killall -9 -q gzserver", shell=True)
    sys.exit(0)


if __name__ == "__main__":
    # Intercept Ctrl+C ourselves; the wrapped command (e.g. roslaunch) runs below.
    signal.signal(signal.SIGINT, signal_handler)
    cmd = ' '.join(sys.argv[1:])
    subprocess.call(cmd, shell=True)

@adityapande-1995

#1376 should fix it

@prathameshsphulpagar

  1. killall gzserver
  2. sudo pkill gzserver
  3. If neither of these works, open a new terminal, run htop, find gzserver, and kill it manually.

@Mechazo11

(quoting @tahsinkose's workaround comment above)

The link is broken at the time of writing this comment
