Comment by tolciho

1 day ago

A `kill -9` will cause many a process to die with no chance to clean up any child processes, since SIGKILL cannot be caught or handled. Some percentage of users continue to use `kill -9` by default, which may result in a mess of a process tree. Otherwise, if the crash is bad enough that cleanup code cannot run (maybe it's being run on OpenBSD, an incompetent programmer didn't check the return value of a malloc, and for some reason the kernel now nukes the process), then there may be orphan children. There may also be sporadic failures to clean up if the crash causes the whole process to exit before the cleanup code in some other thread can run. System load average may also influence whether things go sideways.

That depends on how the children were spawned, no? On Linux, calling `prctl(PR_SET_PDEATHSIG, SIGTERM);` (or similar) in the child will fix this.
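For what it's worth, here is a rough Python sketch of the `prctl` approach mentioned above. It is Linux-only and makes a few assumptions: the helper names `_die_with_parent` and `spawn_tied_to_parent` are made up for illustration, `prctl` is reached through `ctypes` because the Python standard library has no wrapper for it, and the constant `1` for `PR_SET_PDEATHSIG` is the value from `<linux/prctl.h>`.

```python
import ctypes
import signal
import subprocess

PR_SET_PDEATHSIG = 1  # value from <linux/prctl.h>; Linux only


def _die_with_parent():
    # Hypothetical helper: runs in the child after fork() and before exec().
    # Asks the kernel to send this process SIGTERM when its parent dies.
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.prctl(PR_SET_PDEATHSIG, int(signal.SIGTERM)) != 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_SET_PDEATHSIG) failed")


def spawn_tied_to_parent(argv):
    # Hypothetical helper: start argv so that it gets SIGTERM if we die,
    # including when we are killed with SIGKILL and cannot clean up.
    return subprocess.Popen(argv, preexec_fn=_die_with_parent)
```

Something like `spawn_tied_to_parent(["sleep", "600"])` would then not outlive the spawning process. A few caveats: the child can still catch or ignore SIGTERM, the setting is cleared when a set-user-ID or set-group-ID binary is executed, the "parent" is the thread that created the child rather than the whole process, and `preexec_fn` is not safe to use in a multithreaded Python program.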

TIL. I didn’t know it’s the responsibility of the parent; I thought the OS automatically handled child processes.

  • When a child process that is not actively being waited on finishes, it is left in a "defunct" or "zombie" state and will stick around in the process table until the parent process waits on it to fetch its exit code. When you kill a parent process with active children, these subprocesses become orphaned and are re-parented to PID 1 (or another "sub-reaper" process, depending on your setup).

    The OS will typically not kill orphaned/re-parented processes for you. Their new parent will simply wait on (reap) them so they are not left as zombies once they complete. If your parent process spawns something like a daemon server that needs an explicit signal to be stopped (e.g. SIGINT/SIGTERM), these processes will continue to run in the background until they are manually killed or they crash.

    • I see, so I might still need to hunt down non-daemon but hung processes even after I kill the tmux server in which I ran them. Might explain a couple of odd occurrences in the past…
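To make the zombie description above concrete, here is a small Python sketch that can be run on Linux. It assumes `/proc` is mounted, and the function names `child_state` and `demo_zombie` are made up for illustration. It forks a child that exits immediately, observes the `Z` state in `/proc/<pid>/stat`, and then reaps the child with `waitpid`.

```python
import os
import time


def child_state(pid):
    # The field after the parenthesised command name in /proc/<pid>/stat
    # is the state letter; 'Z' means zombie. Linux only.
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name may contain spaces, so split after the last ')'.
    return data.rsplit(")", 1)[1].split()[0]


def demo_zombie():
    pid = os.fork()
    if pid == 0:
        os._exit(7)  # child: exit right away with status 7

    # Parent: wait (up to 5 s) until the child has exited but is not yet reaped.
    deadline = time.monotonic() + 5
    while child_state(pid) != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
    state_before = child_state(pid)

    # Reaping: fetch the exit status, which removes the process table entry.
    _, status = os.waitpid(pid, 0)
    exit_code = os.WEXITSTATUS(status)
    still_listed = os.path.exists(f"/proc/{pid}")
    return state_before, exit_code, still_listed
```

The child stays a zombie only for as long as the parent has not called `waitpid`. If the parent exited instead, the child would be re-parented and reaped by PID 1 or the sub-reaper, as described above.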