More PID collision avoidance, please #241

philipdumont · 2018-11-13T21:57:53Z

For a related issue, see https://bugzilla.kernel.org/show_bug.cgi?id=201441

The way the init scripts check for whether a server is running -- by getting a PID out of the pidfile and checking to see if a process with that PID is running -- is not foolproof. There is some attempt to reduce PID collision (the "-b binary" option), but I recently developed some init scripts where that particular collision avoidance was not effective.

My scripts had two issues that made PID collisions more likely:

1. They started up several servers that were all multi-threaded. /etc/init.d/functions checks whether a process with a given PID is running by testing for the existence of a directory /proc/PID. But it would appear that /proc also has sub-directories for LWPIDs (thread IDs). The LWPID directories are hidden -- they are not included if you get a listing of /proc -- but nevertheless, if you access them, they are there. This only increases the likelihood of PID collision: if you are checking whether a PID from the prior system boot is still running, you could get a false positive not only from a process with the same PID this boot, but also from a thread with an LWPID that matches the old PID. That particular kind of collision could be avoided by also checking whether the LWPID matches the PID of the process it belongs to. If it does, it is the main thread, and might still be the PID of your server process. If it does not, it is not a main thread and is (almost?) certainly not the PID of your process.
2. All my servers were Java programs. So the '-b binary' option helped not-at-all -- they all had the same binary: java. In order to avoid PID collision, I needed a way to distinguish servers from each other by using other options/arguments on their command line.
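The second point can be illustrated with a minimal sketch that compares a pattern against the process's full command line instead of only its binary. This is not the code from the attached functions.zip; the function name `pid_cmdline_matches` and the `PROC_ROOT` override (which exists only so the helper can be exercised against a fake directory tree) are assumptions made for illustration.

```shell
# Hypothetical helper: succeed only if PID's command line contains PATTERN.
# PROC_ROOT is an assumption added for testability; it defaults to /proc.
pid_cmdline_matches() {
    pid=$1
    pattern=$2
    cmdline_file="${PROC_ROOT:-/proc}/$pid/cmdline"

    # No readable cmdline file means no such process (or no access to it).
    [ -r "$cmdline_file" ] || return 1

    # /proc/PID/cmdline separates arguments with NUL bytes; turn them into spaces.
    cmdline=$(tr '\0' ' ' < "$cmdline_file")

    # Plain substring match against the whole command line.
    case "$cmdline" in
        *"$pattern"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

An init script might call it as `pid_cmdline_matches "$pid" "-Dserver.name=alpha"`, where the `-Dserver.name=alpha` argument is a made-up example of something that differs between servers. As noted elsewhere in this thread, a check like this reduces accidental collisions but cannot stop a process that deliberately imitates the expected command line.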

I've attached the /etc/init.d/functions that came with my system, along with my modified version that attempts to address these problems. I'm not entirely convinced that what I did is the ideal implementation, but I present the changes for your consideration. Do as you like with them.

functions.zip

lnykryn · 2018-11-13T22:14:01Z

Identification of processes based on information from /proc can't be 100% reliable. Just a few weeks ago, in another component, we were dealing with a CVE where, in the reproducer, the attacker was able to start a process that had completely identical entries.

If this is a problem you hit often, I would strongly encourage you to upgrade to CentOS/RHEL 7 or Fedora 15+ and use systemd. It uses cgroups to track processes, which is much more reliable.

jccleaver · 2018-11-13T23:11:32Z

Factoring this into a __checkonepid() function, as done here, and checking that /proc/$pid/status matches seems like a quite reasonable approach for fixing a known-possible false-positive case.
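As a rough sketch of what such a per-PID check could look like (not the actual `__checkonepid()` from the attachment, which is not reproduced in this thread), the following compares the `Tgid:` field of /proc/PID/status against the PID being tested. The two are equal only for the main thread of a process, so a stale PID that now names a thread is rejected. The function name `checkonepid_sketch` and the `PROC_ROOT` override are assumptions made for illustration.

```shell
# Hypothetical sketch of a per-PID check; not the code from functions.zip.
# PROC_ROOT is an assumption added for testability; it defaults to /proc.
checkonepid_sketch() {
    pid=$1
    status_file="${PROC_ROOT:-/proc}/$pid/status"

    # No readable status file means nothing is running under this ID.
    [ -r "$status_file" ] || return 1

    # Tgid is the thread-group (process) ID of the task behind this directory.
    # It equals the ID we looked up only when that task is a main thread.
    tgid=$(awk '$1 == "Tgid:" { print $2; exit }' "$status_file")
    [ "$tgid" = "$pid" ]
}
```

Usage would be along the lines of `checkonepid_sketch "$pid" && echo "running"`. This only rules out the thread-ID false positive; it says nothing about whether the process is the expected server.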
