On Running systemd-nspawn Containers (2022)

1 day ago (benjamintoll.com)

I used nspawn to get a system running in the most ridiculous way.

A Debian aarch64 VM on KVM, starting a systemd-nspawn container from an unpacked Raspberry Pi 3 ISO.

It works way too well judging by how ridiculous it was.

Still saved me a few days instead of setting things up myself.

I actually liked how easy it is to spin up nspawn as a systemd service:

  [Unit]
  Description=Raspberry Image Machine
  After=multi-user.target

  [Service]
  Type=simple
  User=root

  ExecStart=/usr/bin/systemd-nspawn -D /mnt/ /sbin/init

  [Install]
  WantedBy=multi-user.target

  • You might want to look into .nspawn files instead. Then you can also manage your nspawn-containers with the machinectl command.

    See man 5 systemd.nspawn

    And many commands, like systemctl and journalctl, accept the -M parameter, which allows you to query systemd units inside your nspawn-containers from the host.

    edit: The article actually explains all of these things in more detail.
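
    As a rough sketch of what such a file could look like for the service above, not a tested setup: the machine name "raspberry" is hypothetical, and the sketch assumes the unpacked image has been moved to /var/lib/machines/raspberry rather than /mnt.

    ```ini
    # /etc/systemd/nspawn/raspberry.nspawn
    # ("raspberry" is a hypothetical machine name; it must match the
    #  directory name under /var/lib/machines/)

    [Exec]
    # Boot the container's own init system instead of running one command
    Boot=yes

    [Network]
    # Assumption: keep sharing the host's network stack, as the plain
    # "systemd-nspawn -D" invocation above does
    VirtualEthernet=no
    ```

    With that in place, `machinectl start raspberry` should start the container, and `systemctl -M raspberry status` or `journalctl -M raspberry` should query it from the host.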

  • I used to use qemu-user-static to run ARM Linux distros like Buildroot, Yocto, and Raspbian on x86_64. It worked surprisingly well! Outside of some minor bugs here and there, it was perfect for local development, emulating an embedded system I was working on.

  • Hmm, this is very interesting.

    I am wondering, though: is there something like systemd-nspawn that doesn't require root?

> Unfortunately, though, most developers don’t even know that there are options outside of Docker, or that they’re not as “convenient”.

> Hopefully, this article has disabused some of that notion.

If that was the goal, it seems terribly complicated when compared with podman.

  • I was thinking similarly. All of those steps to circumvent the OCI image infrastructure just to use systemd…

    • OCI is for running prepackaged software in black boxes from the internet, where you have no interest or ownership of the container internals.

      Most of my containers are not like that. Well, actually none are.

      systemd-nspawn is for running your own containers, with a VM-like usage pattern (ie not immutable), deployed as part of your overall systemd based infrastructure for when the thing you need to manage is "too big" to be deployed as its own systemd-service unit, but you still want to be able "to systemd" it.

      This fits my use-case perfectly.

  • Author should consider running it inside Docker for more convenient setup.

    • Never. If he wanted to go the containers route, Podman is there. There is no reason to use Docker anymore. (Only a satellite tool like docker-compose is not 1-1 compatible with podman-compose, but Podman has other ways to orchestrate with systemd, as part of Podman's vision for orchestration.)

I recently ran into them and honestly they seem unnecessarily complicated compared to using Podman and OCI images.

I use nspawn but many of the helpers featured here are new, so I appreciate this article. I've only ever booted from directories rather than images, and wasn't aware that an image could mount its own partitions, even swap!

Also I'm a little unclear on the security implications of "--private-users=id". Yes the user IDs are the same, but it is technically running in a separate user namespace. In terms of security is this mode equivalent to privileged containers, or is it safer?

Red Hat's Leapp, for upgrading between major releases of RHEL, uses systemd-nspawn to create a container where it can test installing the packages without interfering with the running OS.

There are a lot of ridiculous things in systemd (I'll avoid mentioning specific things to avoid a flame war), but auto containerization of services is by far the most useful thing they've ever come out with. It's a far easier workflow than Docker or anything else, and it is built in "for free".
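
If "auto containerization" here means the sandboxing directives that ordinary service units can opt into (an assumption; the comment does not say), a minimal sketch might look like the unit below. The unit name and binary path are hypothetical; the directives themselves are documented in systemd.exec(5).

```ini
# /etc/systemd/system/myapp.service  (hypothetical unit name and binary path)
[Unit]
Description=Example service with sandboxing enabled

[Service]
ExecStart=/usr/local/bin/myapp
# Run as a transient, unprivileged user allocated when the service starts
DynamicUser=yes
# Mount the file system hierarchy read-only for this service
# (API file systems such as /dev, /proc and /sys are excepted)
ProtectSystem=strict
# Make /home, /root and /run/user inaccessible to the service
ProtectHome=yes
# Give the service its own private /tmp and /var/tmp
PrivateTmp=yes
# Expose only pseudo-devices such as /dev/null, not physical devices
PrivateDevices=yes
# Prevent the process and its children from gaining new privileges
NoNewPrivileges=yes

[Install]
WantedBy=multi-user.target
```

Running `systemd-analyze security myapp.service` afterwards reports how exposed the unit still is.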

I've used lots of different container-types over the years to replace VMs with lightweight containers, but right now I'm running systemd-nspawn, and I really, really like it.

The way it integrates with systemd, both inside and outside the container makes it a no-brainer for app-isolation when the app in question is a bit too complex for just being a service-unit in itself, and you don't want to lose observability by hiding everything behind some obscure docker wall.

The way everything integrates into systemctl and you can get aggregated stats for your entire machine and all its sub-containers... Amazingly nice.

I just can't imagine any better way of managing containers on a Linux system than this.

The only thing I would complain about is the name. They really could have come up with something a bit more catchy or self-descriptive. This is probably the only systemd-type service which does not immediately shout out what it's about, so most people are probably not even aware that systemd can manage containers for you.

This is very interesting! I only heard about systemd-nspawn last night.

  • Most systemd-projects have a name which immediately shouts out what it does, so you can easily tell if it is relevant for your needs or not.

    systemd-nspawn is probably the only project without such a name, so most people don't know about it or what it does, and therefore never look into it any further.

    And that's a shame really, because it's fantastic technology.

    • How so? nspawn means spawn a process in a new namespace, which is... exactly what it does. The problem isn't with systemd-nspawn, the problem is with containers, because the vast majority of devs have no idea that containers are just scripts to set up Linux namespaces.

    • > systemd-nspawn is probably the only project without such a name

      Add sd-tmpfiles to the list, IMO. While it still creates and manages temporary files, it now manages almost any type of system file, from creating files to setting their permissions or making symlinks when needed.

      I am a strong advocate of renaming it systemd-sysfiles to match systemd-sysusers, which is somewhat related (e.g. tmpfiles using users created by sysusers). But it probably won't happen for a while, if at all, due to backwards compat.

It's really one of those little gems not very many people know about or use, but it seems from the responses that is changing.

As Brendan Gregg said: "Containers are just processes, cgroups, and namespaces."

  • Dockerfiles are just a really nice, standard way of specifying them, along with ports, networks and persistent storage.