Comment by thyristan

9 days ago

I've used Solaris SMF back in the day, which is what systemd largely copied, but badly.

The major difference is, in SMF you specify services, their dependencies, and whatever else you need. Then SMF looks at the whole desired system, computes the dependency tree and boot order, validates everything, and writes a boot plan that can be executed in a static and deterministic fashion. You can inspect the plan and execute that exact same plan on each boot. You therefore know that if the system came up once using that plan, it will again.
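The core idea described above, resolving the dependency graph once, ahead of time, into a replayable plan, is just a validated topological sort. A minimal sketch (the service names and dependency map are invented for illustration; real SMF derives this from service manifests):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical services; each maps to the services it needs started first.
deps = {
    "network": [],
    "filesystem": [],
    "syslog": ["filesystem"],
    "sshd": ["network", "syslog"],
}

# Resolve once, ahead of time. Cycles raise CycleError here, i.e. the plan
# is validated before boot, and the resulting order can be replayed
# identically on every boot.
plan = list(TopologicalSorter(deps).static_order())
print(plan)  # e.g. ['network', 'filesystem', 'syslog', 'sshd']
```

The point of the static plan is not the sort itself (systemd also sorts), but that the output is computed and checked once, then executed verbatim, instead of being re-derived under time pressure on every boot.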

Systemd, by contrast, does everything on the fly: the boot order is computed anew on each boot, and all services on the same level of the tree are started at once and just have to fight it out. The actual boot order therefore varies greatly from boot to boot. It will fail randomly and wait randomly (hands up: who wasn't annoyed by the systemd remembrance minute-and-a-half popping up like an inopportune Windows update?), and it is hard to debug because you just cannot reproduce whatever led to the failure. Your boot will be littered with 'wait-for-something' units, 'sleep-for-10s-because-that-magically-makes-it-work' hacks, and extra dependencies bolted on to maybe clean up the parts of the tree the maintainers forgot about. Sometimes intentionally, because "systemd is about making it boot fast, and it works on my laptop".

It isn't all bad, but for systems that I need to boot and shut down reliably, systemd is a step back even compared to init scripts, because of the nondeterminism. You can "fix" things by watchdog-rebooting when the system doesn't come up, and the downtime will just be a little longer. But it is annoying. And it is a design bug that will never be fixed.

To be fair, systemd has more in common with launchd than with SMF. SMF is very much a product of another time: completely fine when we were dealing with workstations whose configuration was mostly static and not expected to change, but very much unfit for the environment systemd operates in nowadays.

Systemd uses a dependency graph, by the way. There is little magic. Provided you have properly set up what depends on what, everything will boot just fine, and it's somehow resilient enough that even if you messed up somewhere, there is a good chance it manages to bring everything up after some hiccups.
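For the record, "properly setting up what depends on what" means distinguishing ordering from requirement in the unit file. A hypothetical example (the service name and paths are invented; the directives are standard systemd ones):

```ini
# /etc/systemd/system/myapp.service  (hypothetical unit)
[Unit]
Description=Example app
# Ordering only: start this unit after these are up...
After=network-online.target postgresql.service
# ...and actually pull them in, as a hard and a soft dependency:
Requires=postgresql.service
Wants=network-online.target

[Service]
ExecStart=/usr/local/bin/myapp

[Install]
WantedBy=multi-user.target
```

A common source of the "fight it out" behavior described upthread is declaring `Requires=` without the matching `After=`: systemd will then start both units in parallel, since requirement alone implies nothing about order.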

  • OTOH figuring things out at runtime seems to be how it paints itself into corners and just... stops. I'm not going to pretend that sysv was good, but when a sysv box booted, it came up, and when I told it to shut down, it died on the spot; there was none of this sitting staring at it twiddling its thumbs. On the bright side, these days things do eventually time out; I don't miss CentOS 7 freezing forever on shutdown.

  • Regarding "there is little magic", Poettering repeatedly praises socket activation in his blog. It's not a new idea, but an old one that fell out of favor because it kinda sucked (remember xinetd?). The idea is to pretend that a service is running by providing its listening socket, then actually starting it when the first client connects.

    It has some problems: it's more complicated, it can actually slow things down compared to starting a service as soon as possible, and it has new error cases because of course "pretending that a service is running" doesn't have exactly the same effect as actually having the service running.

    • Socket activation is entirely optional and is not what I would personally call "magic".

      I don't think "pretending that a service is running" is a fair description of what it does, however. It's just buffering. It has advantages in some cases, which is why it exists. It doesn't "kinda suck".

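The "it's just buffering" point in the exchange above is easy to demonstrate: the kernel completes TCP handshakes and queues data for a listening socket regardless of whether any process has called `accept()` yet. Socket activation exploits exactly this (systemd creates the socket itself and hands it to the service as fd 3, per the `sd_listen_fds` convention; that part isn't shown here). A self-contained sketch with a plain socket:

```python
import socket

# The manager (think: systemd) creates the listening socket up front.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # ephemeral port for the demo
listener.listen(16)
port = listener.getsockname()[1]

# A client connects and writes *before* any server has called accept():
# the kernel queues the connection and buffers the bytes.
client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"hello")

# Only now does the "service" start and drain the queued connection.
conn, _ = listener.accept()
buf = b""
while len(buf) < 5:
    buf += conn.recv(5 - len(buf))
print(buf)  # b'hello'
```

Whether this counts as "pretending the service is running" or "just buffering" is the disagreement above; the mechanism itself is only the kernel's normal accept queue.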

You are not crazy. The fact that we managed to make a livable space in the shanty-town-systemd we all live in nowadays, and to pull electricity and utilities into it, is not proof of great design.