Comment by bityard

6 months ago

Yes, application containers should stick to the Unix philosophy of, "do one thing and do it well." But if the thing in your docker container forks for _any_ reason, you should have a real init on PID 1.

8 comments

benreesman 6 months ago

There's nothing inherently wrong with containers in the abstract: virtualization is a critical tool in computer science (some might say it's difficult to define computer science without a virtual machine). There's not even anything wrong with this "less than a new kernel, more than a new libc" neighborhood.

The broken, ugly, malignant thing is this one godawful implementation: Docker, and its attic-dwelling Quasimodo cousin, docker-compose.yml.

It's trivial to slot namespaces (or jails, if you also like the finer things in BSD) into a sane init system, process ID regime, and network interface regime: it's an exercise in choosing good defaults for all the unshare-adjacent parameters.

But a whole generation of SWEs memorized docker jank instead of Unix, and so now people are emotionally invested in it. You run compose to run docker to get Alpine and a node built on musl.

You can just link node to musl. And if you want a chroot or a new tuntap scope? man unshare.

RulerOf 6 months ago

> you should have a real init on PID 1

Got a handy list of those? My colleagues use supervisord and it kinda bugs me. Would love to know if it makes the list.

bityard 6 months ago
If all you need is init (and not a process supervisor), docker comes with one called 'tini' built in. All you have to do is supply `--init` to the `docker run` command. Or use `init: true` in your docker-compose.yaml.
As far as a different process supervisor, I'm not sure. I've used supervisord and agree it's kind of awkward. I have heard of these but don't know much about them:
https://smarden.org/runit/
https://github.com/nicolas-van/multirun
https://github.com/just-containers/s6-overlay
- RulerOf 6 months ago
  
  I'm a fan of s6 after getting exposed to it through the Linuxserver.io project... but I'm not certain it's appropriate when you're using an orchestrator like k8s.
  Take health checks for example. I can't decide—in principle—if they should live at the highest level, the lowest level, or every level of the stack. Or if they are everywhere, should there be automated remediation with exponential trigger times... stuff like that. S6 and supervisord would be good for that. But higher-level remediation steps make something simpler more appealing.

pas 6 months ago

is there any issue besides the potential zombies? also, why can't the real pid1 do it? it sees all the processes after all.

MyOutfitIsVague 6 months ago

Mostly just zombies and signal handlers.
And your software can do it, if it's written with the assumption that it will be pid1, but most non-init software isn't. And rather than write your software to do so, it's easier to just reach for something like tini that does it already with very little overhead.
I'd recommend reading the tini readme[0] and its linked discussion for full detail.
[0]: https://github.com/krallin/tini
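To make the "zombies and signal handlers" part concrete, here's a rough Python sketch of the two jobs a minimal init takes on: forward termination signals to the real program, and reap whatever children exit. This is not tini's code. `run_as_init` and `FORWARDED` are names made up for the example, and it skips things a real init deals with (process groups, a fuller signal list, the small window before the handlers are installed).

```python
import os
import signal

# Signals this sketch passes through to the main child (an assumed, partial list).
FORWARDED = (signal.SIGTERM, signal.SIGINT, signal.SIGHUP)


def run_as_init(argv):
    """Run argv as a child; forward signals to it and reap exited children.

    Hypothetical helper for illustration only.
    """
    child = os.fork()
    if child == 0:
        try:
            os.execvp(argv[0], argv)  # replaces the child; returns only on error
        except OSError:
            os._exit(127)

    def forward(signum, _frame):
        try:
            os.kill(child, signum)
        except ProcessLookupError:
            pass  # main child is already gone

    previous = {sig: signal.signal(sig, forward) for sig in FORWARDED}
    try:
        while True:
            # Blocks until any child exits. When running as PID 1 this also
            # collects orphans that the kernel reparented to this process.
            pid, status = os.wait()
            if pid != child:
                continue  # reaped some other child; keep waiting
            if os.WIFSIGNALED(status):
                return 128 + os.WTERMSIG(status)  # shell-style convention
            return os.WEXITSTATUS(status)
    except ChildProcessError:
        return 1  # no children left and the main child was never seen
    finally:
        # Put the original handlers back so the caller is left unchanged.
        for sig, handler in previous.items():
            signal.signal(sig, signal.SIG_DFL if handler is None else handler)
```

As an entrypoint you'd call something like `sys.exit(run_as_init(sys.argv[1:]))`. In practice `--init` already gives you this without writing any of it.
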
dathery 6 months ago
The main other problem is that the kernel doesn't apply the default action for signals like SIGTERM when the process is PID 1: a signal with no handler installed is simply ignored. So if your process doesn't register its own signal handlers, it's hard to kill (you have to use SIGKILL). I'm sure anyone who has used Docker a lot has run into containers that seem to just ignore signals -- this is the usual reason why.
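If the program does have to run as PID 1, registering its own handlers is the fix. A minimal Python sketch, assuming a simple polling main loop (`stop`, `install_signal_handlers` and `serve` are made-up names, not from any library):

```python
import signal
import threading

# Set once a termination signal arrives; the main loop polls it.
stop = threading.Event()


def _request_stop(signum, _frame):
    stop.set()


def install_signal_handlers():
    """Register explicit handlers so SIGTERM/SIGINT work even as PID 1."""
    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, _request_stop)


def serve(poll_interval=0.1):
    """Hypothetical main loop: run until a termination signal is received."""
    install_signal_handlers()
    while not stop.wait(poll_interval):
        pass  # a real program would do one unit of work here
    return 0
```

With the handler installed, a plain SIGTERM to the process makes `serve()` return instead of being dropped.
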
> also, why can't the real pid1 do it? it sees all the processes after all.
How would the real PID 1 know if it _should_ reap the zombie? It's normal to have some zombie processes -- they're just processes whose exit statuses haven't been reaped yet. If you force-reaped a zombie you could break a program that just hasn't yet gotten around to checking the status of a subprocess it spawned.
- immibis 6 months ago
  
  Processes only reap their direct children. Init is special because orphaned processes are reparented to init, which then has to reap them.
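A small Linux-only Python sketch of the zombie half of this: the child exits, shows up in state `Z` until its direct parent calls `waitpid`, and then disappears. It assumes `/proc` is mounted; `proc_state` and `demo_zombie` are names invented for the example.

```python
import os
import time


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The state field comes right after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def demo_zombie(timeout=2.0):
    """Fork a child that exits at once; report its state before and after reaping."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately without cleanup

    # Until the parent waits on it, the exited child lingers as a zombie.
    deadline = time.monotonic() + timeout
    state = proc_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = proc_state(pid)

    os.waitpid(pid, 0)  # only the direct parent can do this
    return state, proc_state(pid)
```

If the parent had exited without waiting, the child would have been reparented and the reaping would fall to whatever is running as PID 1.
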