
Comment by likpok

4 years ago

You wouldn't want to run one giant cluster, but at hyperscale you're talking about running thousands or tens of thousands of Kubernetes clusters. That's the part that doesn't scale well, for a couple of reasons.

The biggest one is just mechanical: with that many clusters it's hard to move capacity between them, and locality gets baked into everything you do (people do try to build around this, but it's awkward).

If each service or team runs its own Kubernetes cluster, that's a lot of overhead: Kubernetes will need something like 6-7 machines just to run the cluster (I don't have production experience with kube, spitballing here), so small teams or jobs will have terrible efficiency. Big teams will have to spend a lot of operational effort managing their fleets.
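To put rough numbers on that, here's a back-of-the-envelope sketch. It takes the 6-machine figure from my guess above as a fixed per-cluster cost (an assumption, not a measured number), and the function name and the worker counts are made up for illustration:

```python
def cluster_overhead(worker_machines: int, cluster_machines: int = 6) -> float:
    """Fraction of a cluster's machines that only run the cluster itself.

    cluster_machines defaults to 6, the low end of the rough "6-7 machines"
    guess in the comment; it is an assumption, not a measurement.
    """
    total = worker_machines + cluster_machines
    return cluster_machines / total


# Hypothetical team sizes, from a tiny job up to a large fleet.
for workers in (4, 20, 200, 2000):
    print(f"{workers:>5} workers: {cluster_overhead(workers):.1%} overhead")
```

With a fixed per-cluster cost like this, a team with a handful of machines loses more than half of its hardware to overhead, while a big cluster amortizes it down to a fraction of a percent. That's the efficiency gap between small and large tenants.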

It's worth noting that at hyperscale there will be individual jobs in a datacenter that are bigger than Kubernetes handles comfortably. Handling this efficiently becomes very important: it's literally billions of dollars' worth of hardware.