Comment by bityard

12 hours ago

Yes, application containers should stick to the Unix philosophy of, "do one thing and do it well." But if the thing in your docker container forks for _any_ reason, you should have a real init on PID 1.

There's nothing inherently wrong with containers in the abstract: virtualization is a critical tool in computer science (some might say it's difficult to define computer science without a virtual machine). There's not even anything wrong with this "less than a new kernel, more than a new libc" neighborhood.

The broken, ugly, malignant thing is this one godawful implementation, Docker, and its attic-dwelling Quasimodo cousin, docker-compose.yml.

It's trivial to slot namespaces (or jails, if you also like the finer things in BSD) into a sane init system, process ID regime, and network interface regime: it's an exercise in choosing good defaults for all the unshare-adjacent parameters.

But a whole generation of SWEs memorized docker jank instead of Unix, and so now people are emotionally invested in it. You run compose to run docker to get Alpine and a node built on musl.

You can just link node to musl. And if you want a chroot or a new tuntap scope? man unshare.

> you should have a real init on PID 1

Got a handy list of those? My colleagues use supervisord and it kinda bugs me. Would love to know if it makes the list.

is there any issue besides the potential zombies? also, why can't the real pid1 do it? it sees all the processes after all.

  • Mostly just zombies and signal handlers.

    And your software can do it, if it's written with the assumption that it will be pid1, but most non-init software isn't. And rather than write your software to do so, it's easier to just reach for something like tini that does it already with very little overhead.

    I'd recommend reading the tini readme[0] and its linked discussion for full detail.

    [0]: https://github.com/krallin/tini

  • The main other problem is that the kernel doesn't apply the default signal actions for signals like SIGTERM if the process is PID 1. So if your process doesn't register its own signal handlers, those signals are silently dropped and it's hard to kill (you have to use SIGKILL from outside the container). I'm sure anyone who has used Docker a lot has run into containers that seem to just ignore signals -- this is the usual reason why.

    > also, why can't the real pid1 do it? it sees all the processes after all.

    How would the real PID 1 know if it _should_ reap the zombie? It's normal to have some zombie processes -- they're just processes whose exit statuses haven't been reaped yet. If you force-reaped a zombie you could break a program that just hasn't yet gotten around to checking the status of a subprocess it spawned.

    • Processes only reap their direct children. Init is special because orphaned processes are reparented to init, which then has to reap them.
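To make the two PID 1 duties from this thread concrete (forwarding signals and reaping reparented children), here is a minimal Python sketch. It is not how tini is implemented; `run_as_init` is a hypothetical name, it assumes Python 3.9+ on a Unix-like system, and it leaves out things a real init handles, such as process groups and a wider set of signals.

```python
import os
import signal
import subprocess


def run_as_init(argv):
    """Run argv as the main child, forward signals to it, reap every child.

    Hypothetical helper for illustration. Returns the main child's exit
    code, or the negative signal number if a signal killed it.
    """
    proc = subprocess.Popen(argv)

    def forward(signum, _frame):
        # PID 1 gets no default action for SIGTERM/SIGINT, so an init has
        # to install handlers and pass the signal on explicitly.
        try:
            os.kill(proc.pid, signum)
        except ProcessLookupError:
            pass  # main child is already gone

    forwarded = (signal.SIGTERM, signal.SIGINT)
    previous = {sig: signal.signal(sig, forward) for sig in forwarded}
    try:
        exit_code = None
        while exit_code is None:
            # waitpid(-1, 0) returns ANY exited child, including orphans
            # that were reparented to us, so zombies do not accumulate.
            pid, status = os.waitpid(-1, 0)
            if pid == proc.pid:
                exit_code = os.waitstatus_to_exitcode(status)
    finally:
        # Put the original handlers back once the main child is done.
        for sig, handler in previous.items():
            signal.signal(sig, handler)

    proc.returncode = exit_code  # tell Popen the child was already reaped
    return exit_code
```

As a container entrypoint this would be wrapped in something like `sys.exit(run_as_init(sys.argv[1:]))`, mapping a negative return value to `128 + signal` if shell-style exit codes are wanted. In practice a small static binary like tini is the lighter option; the sketch only shows why the job is simple but has to be done by someone.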