Comment by busterarm

4 years ago

> Kubernetes can stack many containers per physical host, with widely varied workloads cooperatively sharing.

Until you learn the downsides of this approach at hyper scale, namely that containers and a shared kernel mean that all of your workloads share the same kernel parameters, including things like network limits, timeouts, and file handle limits. Multitenancy and containers actually end up working against you and create new problems and knobs in your individual jobs that you have to configure -- to the point that it's almost worth just running different types of jobs on different isolated node pools and eliminating your multitenancy issue anyway (a sketch of that split follows below).
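
A minimal sketch of that node-pool split: a pod pinned to a dedicated pool via a nodeSelector plus a toleration for the pool's taint. The pool/label names and image are hypothetical:

```yaml
# Hypothetical: pin a network-heavy job to its own tainted node pool.
apiVersion: v1
kind: Pod
metadata:
  name: network-heavy-job
spec:
  nodeSelector:
    pool: network-heavy          # label applied to the pool's nodes
  tolerations:
    - key: pool                  # matches the taint keeping other jobs out
      operator: Equal
      value: network-heavy
      effect: NoSchedule
  containers:
    - name: app
      image: registry.example/app:latest   # placeholder image
```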

Companies that scaled on KVM never had to learn about these limitations and just focused on what their hardware was capable of in aggregate.

At hyper scale and with multitenancy, microVMs are always going to be the end state -- and while there's k8s support for this, it's far from the default or even the most convenient option.
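
For reference, the microVM path in Kubernetes goes through RuntimeClass. A minimal sketch, assuming a Kata Containers (or Firecracker-backed) handler is already configured in the container runtime -- the handler and image names below are assumptions:

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata-fc        # hypothetical name
handler: kata          # must match a handler configured in containerd/CRI-O
---
# Pods opt in per workload; each such pod gets its own guest kernel.
apiVersion: v1
kind: Pod
metadata:
  name: isolated-workload
spec:
  runtimeClassName: kata-fc
  containers:
    - name: app
      image: registry.example/app:latest   # placeholder image
```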

Network limits and timeouts aren't different between Kubernetes hosts and non-Kubernetes hosts. Network resources are a real resource, and you may need to implement quality of service or custom resources (a new feature [1], and one that is late to the party).
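
As a sketch of the custom-resources approach from [1]: once a node advertises an extended resource, pods can request it like CPU or memory and the scheduler enforces the budget. The resource name and image here are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: bandwidth-consumer
spec:
  containers:
    - name: app
      image: registry.example/app:latest   # placeholder image
      resources:
        # Extended resources take integer quantities, and requests
        # must equal limits; the node must advertise the resource first.
        requests:
          example.com/network-bandwidth-mbps: "100"
        limits:
          example.com/network-bandwidth-mbps: "100"
```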

File handle limits are something no sane workload ever encounters. They are technically a shared resource, but in a sensible Kubernetes configuration they are impossible to hit, because the ulimits on each process are low enough. A very small number of teams may need an exception, with good reason, and will typically be cordoned onto their own node classes that are specially tainted.
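
The node-class side of that exception looks roughly like this -- a specially tainted node that only pods tolerating the taint (as in the sketch above) can land on. Names are hypothetical, and in practice you'd add the taint with kubectl rather than author the Node object, but the shape is the same:

```yaml
apiVersion: v1
kind: Node
metadata:
  name: high-fd-node-1       # hypothetical node name
  labels:
    node-class: high-fd      # lets the exceptional team target these nodes
spec:
  taints:
    - key: node-class        # keeps every other workload off this node
      value: high-fd
      effect: NoSchedule
```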

Yes, fleet management via taints offers nothing over the fleet management you've already got. This is a good thing. Fleet management tools are damaging to your reliability: they make your machines non-fungible. Kubernetes' great innovation is making machines, units of compute, fungible.

There are workloads and architectures that will never be suitable for Kubernetes. There are HPC clusters that heavily rely on things like rack-locality, which Kubernetes views as damage. Getting rid of them is a net win for humanity.
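
For what it's worth, the closest native approximation of rack-locality is pod affinity on a rack-level topology key -- the node label below is hypothetical, and it still treats racks as scheduling hints rather than first-class topology:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mpi-rank
  labels:
    app: mpi-job
spec:
  affinity:
    podAffinity:
      # Co-schedule all pods of the job onto nodes sharing a rack label.
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: mpi-job
          topologyKey: example.com/rack   # hypothetical node label
  containers:
    - name: worker
      image: registry.example/mpi:latest   # placeholder image
```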

[1] https://kubernetes.io/docs/concepts/configuration/manage-res...

  • > File handle limits are something no sane workload ever encounters.

    I guess databases are not a sane workload? I've seen file handle limits hit with databases more than once in my life, and that isn't specific to k8s.

At hyper scale you don't need to worry about sharing as much, because the important services are far bigger than one machine. That sidesteps the problem: you can apply whatever sysctls or configs you need before starting the container.
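
In Kubernetes terms, that per-container tuning looks like this -- a minimal sketch; note that only namespaced sysctls (net.*, some kernel.*) can be set per pod, and "unsafe" ones must be allowlisted on the kubelet. The pod and image names are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tuned-service
spec:
  securityContext:
    sysctls:
      # Namespaced sysctl, settable per pod; node-wide knobs like
      # fs.file-max cannot be set this way. This one is "unsafe" by
      # default and needs kubelet --allowed-unsafe-sysctls.
      - name: net.ipv4.tcp_fin_timeout
        value: "30"
  containers:
    - name: app
      image: registry.example/app:latest   # placeholder image
```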

"Multitenancy" here means "I have a giant pool of machines and I run a bunch of jobs across them" not "I have a pool of giant machines and I stack jobs on them".