Comment by andrewstuart2
16 hours ago
I'm always torn when I see anything mentioning running an init system in a container. On one hand, I guess it's good that it's designed with that use case in mind. Mainly, though, I've just seen too many overly complicated things attempted (on greenfield even) inside a single container when they should have instead been designed for kubernetes/cloud/whatever-they-run-on directly and more properly decoupled.
It's probably just one of those "people are going to do it anyway" things. But I'm not sure if it's better to "do it better" and risk spreading the problem, or leave people with older solutions that fail harder.
Yes, application containers should stick to the Unix philosophy of, "do one thing and do it well." But if the thing in your docker container forks for _any_ reason, you should have a real init on PID 1.
There's nothing inherently wrong with containers in the abstract: virtualization is a critical tool in computer science (some might it's difficult to define computer science without a virtual machine). There's not even anything wrong with this "less than a new kernel, more than a new libc" neighborhood.
The broken, ugly, malignant thing is this one godawful implementation Docker and its attic-dwelling Quasimodo cousin docker-compose.yml
It's trivial to slot namespaces (or jails if you also like the finer things BSD) into a sane init system, process id regime, network interface regime: its an exercise in choosing good defaults for all the unshare-adjacent parameters.
But a whole generation of SWEs memorized docker jank instead of Unix, and so now people are emotionally invested in it. You run compose to run docker to get Alpine and a node built on musl.
You can just link node to musl. And if you want a chroot or a new tuntap scope? man unshare.
> you should have a real init on PID 1
Got a handy list of those? My colleagues use supervisord and it kinda bugs me. Would love to know if it makes the list.
is there any issue besides the potential zombies? also, why can't the real pid1 do it? it sees all the processes after all.
Mostly just zombies and signal handlers.
And your software can do it, if it's written with the assumption that it will be pid1, but most non-init software isn't. And rather than write your software to do so, it's easier to just reach for something like tini that does it already with very little overhead.
I'd recommend reading the tini readme[0] and its linked discussion for full detail.
[0]: https://github.com/krallin/tini
The main other problem is that the kernel doesn't register default signal handlers for signals like SIGTERM if the process is PID 1. So if your process doesn't register its own signal handlers, it's hard to kill (you have to use SIGKILL). I'm sure anyone who has used Docker a lot has run into containers that seem to just ignore signals -- this is the usual reason why.
> also, why can't the real pid1 do it? it sees all the processes after all.
How would the real PID 1 know if it _should_ reap the zombie? It's normal to have some zombie processes -- they're just processes whose exit statuses haven't been reaped yet. If you force-reaped a zombie you could break a program that just hasn't yet gotten around to checking the status of a subprocess it spawned.
1 reply →
From my experience in the robotics space, a lot of containers start life as "this used to be a bare metal thing and then we moved it into a container", and with a lot of unstructured RPC going on between processes, there's little benefit in breaking up the processes into separate containers.
Supervisor, runit, systemd, even a tmux session are all popular options for how to run a bunch of stuff in a monolithic "app" container.
My experience in the robotics space is that containers are a way to not know how to put a system together properly. It's the quick equivalent of "I install it on my Ubuntu, then I clone my whole system into a .iso and I call that a distribution". Most of the time distributed without any consideration for the open source licences being part of it.
I've always advocated against containers as a means of deploying software to robots simply because to my mind it doesn't make sense— robots are full of bare-metal concerns, whether it's udev rules, device drivers, network config, special kernel or bootloader setup, never mind managing the container runtime itself including startup, updating, credentials, and all the rest of it. It's always felt to me like by the time you put in place mechanisms to handle all that crap outside the container, you might as well just be building a custom bare metal image and shipping that— have A/B partitions so you copy an update from the network to the other partition, use grub chainloading, wipe hands on pants.
The concern regarding license-adherence is orthogonal to all that but certainly valid. I think with the ROS ecosystem in particular there is a lot of "lol everything is BSD/Apache2 so we don't even have to think about it", without understanding that these licenses still have an attribution requirement.
1 reply →
> Supervisor, runit, systemd, even a tmux session are all popular options for how to run a bunch of stuff in a monolithic "app" container.
Did docker+systemd get fixed at some point? I would be surprised to hear that it was popular given the hoops you had to jump through last time I looked at it
It's only really fixed in podman, with the special `--systemd=always` flag. Docker afaik still requires manually disabling certain services that will conflict with the host and then running the whole thing as privileged— basically, a mess.
tmux?! Please share your war stories.
Not my favoured approach, but for early stage systems where proper off-board observability/alerting is not yet in place, tmux can function as a kind of ssh-accessible dashboard displaying the stdout of key running processes, and also allowing some measure of inline recovery— like if a process has crashed, you can up-arrow and relaunch it in the same environment it crashed out of.
Obviously not an approach that scales, but I think it can also work decently well as a dev environment, where you want to run "stock" for most of the components in the system, and just be syncing in an updated workspace and restarting the one bit being actively developed on. Being able to do this without having to reason about a whole tree of interlinked startup units or whatever does lower the barrier to entry somewhat.
1 reply →
I've used several hosting providers that charge by the container - Fly.io and Render and Google Cloud Run.
I often find myself wanting to run more than one process in s container for pricing reasons.