
Comment by loloquwowndueo

1 day ago

Right, but incus is not k8s. You can stand up spares and switch traffic, but it’s not built-in functionality and requires extra orchestration.

It is built-in functionality [1] and requires no extra orchestration. In a cluster setup, you would be using virtualized storage (Ceph-based) and a virtualized network (OVN). You can replace a container/VM on one host with another on a different host with the same storage volumes, network, and address. This is what k8s does with pod migrations too (edit: except the address).
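
For anyone curious what that looks like in practice, here's a minimal sketch (Python wrapping the CLI; the `incus cluster evacuate`/`restore` commands and the `cluster.healing_threshold` server key are the mechanisms the linked docs describe, but treat the exact value semantics as version-dependent):

```python
import subprocess

def incus(*args: str) -> str:
    """Run an incus CLI command and return its stdout."""
    return subprocess.run(
        ["incus", *args], check=True, capture_output=True, text=True
    ).stdout

# Automatic healing: if a cluster member stays offline past this threshold,
# its instances are recreated on surviving members (0 disables the feature).
# Assumes shared Ceph storage and OVN networking so instances can move freely.
incus("config", "set", "cluster.healing_threshold", "30")

# Planned maintenance is the same machinery, triggered manually:
incus("cluster", "evacuate", "node1")   # move instances off node1
incus("cluster", "restore", "node1")    # move them back once node1 is healthy again
```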

There are a couple of differences, though. The first is the pet vs. cattle treatment of containers by Incus and k8s respectively. Incus tries to resurrect dead containers as faithfully as possible. This means that Incus treats container crashes like system crashes, and its recovery involves a systemd bootup inside the container (a kernel boot too, in the case of VMs). This is what accounts for the delay. K8s, on the other hand, doesn't care about dead containers/pods at all. It just creates another pod, likely with a different address, and expects it to handle the interruption.

Another difference is the orchestration mechanism behind this. K8s, as you may be aware, uses control loops on controller nodes to detect the crash and initiate the recovery. The recovery is mediated by the kubelets on the worker nodes. Incus seems to run the orchestrator on all nodes: they make decisions based on consensus and manage the recovery process themselves.
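
To make the k8s side concrete, the control loop boils down to "compare desired state with observed state and create whatever is missing". A deliberately simplified, self-contained sketch in the spirit of a ReplicaSet controller (none of these names are real Kubernetes code; real controllers watch the API server and kubelets do the node-local work):

```python
import time
import uuid

PODS: list[dict] = []   # stand-in for cluster state normally held by the API server

def create_pod(app: str) -> None:
    # A brand-new pod object: fresh name/UID, and (in a real cluster) a fresh IP from the CNI.
    PODS.append({"name": f"{app}-{uuid.uuid4().hex[:5]}", "app": app, "phase": "Running"})

def reconcile(app: str, desired_replicas: int) -> None:
    running = [p for p in PODS if p["app"] == app and p["phase"] == "Running"]
    for _ in range(desired_replicas - len(running)):
        create_pod(app)          # dead pods are never repaired, only replaced

if __name__ == "__main__":
    for _ in range(3):           # the "control loop": keep converging actual toward desired
        reconcile("web", desired_replicas=3)
        PODS[-1]["phase"] = "Failed"   # simulate a crash; the next pass replaces the pod
        time.sleep(1)
    print(PODS)
```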

[1] https://linuxcontainers.org/incus/docs/main/howto/cluster_ma...

  • > and address. This is what k8s does with pod migrations too.

    That's not true of Pods; each Pod has its own distinct network identity. You're correct about the network, though, since AFAIK the Service and Pod CIDRs are fixed for the lifespan of the k8s cluster.

    You spoke to it further down but guarded it with "likely", and I can say with certainty that it's not just likely: it unconditionally does. That's not to say address re-use isn't possible over a long enough time horizon, but that bookkeeping is delegated to the CNI.
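
    For illustration, that bookkeeping is roughly "hand out a free address from the node's slice of the Pod CIDR and recycle it once the pod is deleted". A toy allocator below; real IPAM plugins (host-local, Calico's, etc.) do this with persistent state so addresses aren't handed out twice:

    ```python
    import ipaddress

    class PodIPAM:
        """Toy host-local-style IPAM for one node's slice of the Pod CIDR."""

        def __init__(self, node_cidr: str):
            self.pool = list(ipaddress.ip_network(node_cidr).hosts())
            self.allocated: dict[str, ipaddress.IPv4Address] = {}

        def allocate(self, pod_uid: str) -> ipaddress.IPv4Address:
            ip = self.pool.pop(0)                 # new pod, new address
            self.allocated[pod_uid] = ip
            return ip

        def release(self, pod_uid: str) -> None:
            self.pool.append(self.allocated.pop(pod_uid))   # address may be reused much later

    ipam = PodIPAM("10.244.1.0/24")
    first = ipam.allocate("pod-a")
    ipam.release("pod-a")                         # pod dies and is replaced...
    second = ipam.allocate("pod-b")
    print(first, second)                          # 10.244.1.1 10.244.1.2 — replacement gets a new IP
    ```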

    ---

    Your "dead container" one also has some nuance, in that kubelet will for sure restart a failed container, in place, with the same network identity. Fresh identity comes into play if the Node fails, or if the control loop determines something in the Pod's configuration has changed (env vars, resources, scheduling constraints, etc.), in which case the Pod will be recreated, even if by coincidence on the same Node.
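
    If it helps to see that split spelled out, a purely illustrative sketch (neither function exists in kubelet; the names and fields are made up):

    ```python
    from dataclasses import dataclass

    @dataclass
    class Pod:
        uid: str
        ip: str
        spec_hash: str   # hash over env vars, resources, scheduling constraints, ...
        node: str

    def restart_container_in_place(pod: Pod) -> Pod:
        # kubelet path: same Pod, same UID, same IP; the container is simply started
        # again (subject to restartPolicy and crash-loop backoff).
        return pod

    def recreate_pod(pod: Pod, new_spec_hash: str, target_node: str) -> Pod:
        # control-plane path (Node failure or spec change): the old Pod is deleted and
        # a brand-new one created, with a new UID and whatever IP the CNI hands out,
        # even if it happens to land on the same Node.
        return Pod(uid="fresh-uid", ip="assigned-by-CNI",
                   spec_hash=new_spec_hash, node=target_node)
    ```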

    • I agree with everything you pointed out. They were in my mind too. However, I avoided those points on purpose for the sake of brevity; it was getting too long-winded and convoluted for my liking. Thanks for adding a separate clarification, though.