Comment by GauntletWizard

4 years ago

Network limits and timeouts aren't different between Kubernetes hosts and non-Kubernetes hosts. Network resources are a real resource, and you may need to implement quality of service or custom resources (a new feature [1], and one that is late to the party).

File handle limits are something no sane workload ever encounters. They are technically a shared resource, but in a sensible Kubernetes configuration they are impossible to hit, because the ulimits on each process are low enough. A very small number of teams may need an exception, with good reason, and will typically be cordoned onto their own node classes that are specially tainted.
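As a rough illustration of what "specially tainted" node classes mean for scheduling, here is a deliberately simplified Python model of taint and toleration matching. It is a sketch, not the scheduler's actual code: the class and function names, and the `dedicated=high-fd` taint, are hypothetical, and only the `NoSchedule` effect with the `Equal` and `Exists` operators is modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # only "NoSchedule" is considered in this sketch


@dataclass(frozen=True)
class Toleration:
    key: str
    value: str = ""
    effect: str = ""          # empty effect is treated as "any effect"
    operator: str = "Equal"   # "Equal" or "Exists"


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Return True if this toleration covers this taint (simplified rules)."""
    if toleration.effect and toleration.effect != taint.effect:
        return False
    if toleration.key != taint.key:
        return False
    if toleration.operator == "Exists":
        return True
    return toleration.value == taint.value


def can_schedule(pod_tolerations, node_taints) -> bool:
    """A pod fits only if every NoSchedule taint on the node is tolerated."""
    return all(
        any(tolerates(tol, taint) for tol in pod_tolerations)
        for taint in node_taints
        if taint.effect == "NoSchedule"
    )


if __name__ == "__main__":
    # Hypothetical dedicated node class for the teams with an exception.
    dedicated_node = [Taint("dedicated", "high-fd", "NoSchedule")]
    ordinary_pod = []
    exempt_pod = [Toleration("dedicated", "high-fd", "NoSchedule")]
    print(can_schedule(ordinary_pod, dedicated_node))
    print(can_schedule(exempt_pod, dedicated_node))
```

Note that a toleration only permits a pod to land on the tainted nodes. Keeping the exempt workload off the shared nodes as well takes a node selector or node affinity on top of it.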

Yes, fleet management via taints offers nothing over the fleet management that you've already got. This is a good thing. Fleet management tools damage your reliability. They mean that your machines are non-fungible. Kubernetes' great innovation is making machines, units of compute, fungible.

There are workloads and architectures that will never be suitable for Kubernetes. There are HPC clusters that heavily rely on things like rack-locality that Kubernetes views as damage. Getting rid of them is a net win for humanity.

[1] https://kubernetes.io/docs/concepts/configuration/manage-res...

> File handle limits are something no sane workload ever encounters.

I guess databases are not a sane resource? I've seen file handle limits hit with databases more than once in my life, and that isn't specific to k8s.

  • databases, anyone dealing with high-performance template rendering, web-crawling, etc.

    • If your web crawler is using a hundred thousand filehandles, you've got a problem. You shouldn't need that many. You can support ten thousand open web requests, for sure, but you don't need ten filehandles for each, just a few hundred extra connections to intermediate processors and the databases where you store the scraped data.

      High-performance template rendering has as many filehandles as open requests, maybe 10,000. If it's actually high performance, the templates underneath aren't files anymore by the time you're processing; they're stored in memory.

      Databases are almost an exception, but you shouldn't be running "Large DB" on a shared host on K8s. You should taint and dedicate those machines. K8s is still useful as a common management plane, but I'm roughly on the fence between "Just run those machines as a special tier" and "Run them on k8s with dedicated taints", because both have advantages. Smaller databases run just fine. Postgres is using ~10k filehandles.

      There was a time for scheduling specifically for "filehandleful" jobs. It's long gone. Modern Linux systems set the filehandle limit to something obscene, because it's no longer a limiting factor, and it hasn't been on these workloads for 5 years.
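
For anyone who wants to check the limits discussed in this thread on their own machine, a minimal Python sketch follows. It assumes a Unix-like host for the `resource` module and Linux for the `/proc` paths; the function names are made up for illustration, and the `/proc` readers return `None` where those files are unavailable.

```python
import os
import resource


def fd_limits():
    """Per-process soft and hard limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count():
    """Descriptors currently open in this process (Linux only), else None."""
    try:
        return len(os.listdir("/proc/self/fd"))
    except OSError:
        return None


def system_file_max():
    """System-wide file handle ceiling (Linux only), else None."""
    try:
        with open("/proc/sys/fs/file-max") as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    soft, hard = fd_limits()
    print(f"per-process soft limit: {soft}, hard limit: {hard}")
    print(f"open in this process:   {open_fd_count()}")
    print(f"system-wide file-max:   {system_file_max()}")
```

Comparing the per-process soft limit against the system-wide value shows how much headroom a single workload has before it becomes a shared-resource problem.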