Comment by Aurornis

1 day ago

They were running a big kubernetes infrastructure to handle all of these RPC calls.

That takes a lot of engineer hours to set up and maintain. This architecture didn't just happen; it took a lot of FTE hours to get it working and to keep it that way.

But that k8s engineer's cost is spread over all the functions the cluster is serving, not just the RPC setup.

Yeah, the situation from TFA doesn't make a lot of sense; I was just highlighting that it's not as clear-cut as "costs > 1 FTE => fix it."

  • Yep. Opportunity cost is the important thing. Though a well-managed org will scale capacity against some ROI threshold.

    If you’re skipping 8 $300k projects a year that could be done by one fully-burdened $400k developer, something is wrong.

Kube is trivial to run. You flip a few switches on GKE/EKS, write a few simple configs, and it doesn't take many engineer hours after that. Infrastructure these days is trivial to operate. As an example, I run a datacenter cluster myself for a micro-SaaS that's in the process of SOC 2 Type 2 compliance. The infra itself is pretty reliable: I ran some power-kill sims before I traveled and it came back A+. With GKE/EKS this is even easier.

Over the years of running these, I think the key is to keep the cluster config manual, and then just deploy your YAMLs from a repo, hydrating secrets at deploy time or whatever.
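The "deploy YAMLs from a repo with hydrated secrets" step can be sketched roughly like this. This is a minimal illustration, not the commenter's actual pipeline: it uses Python's `string.Template` to substitute `$VAR` placeholders in a manifest before it would be handed to `kubectl apply -f -`. The manifest contents and the `DB_PASSWORD` variable name are made-up examples.

```python
from string import Template

# Hypothetical manifest template as it might live in the repo
# (e.g. checked in with placeholders instead of real secret values).
MANIFEST_TEMPLATE = """\
apiVersion: v1
kind: Secret
metadata:
  name: app-credentials
stringData:
  db-password: $DB_PASSWORD
"""

def hydrate(template: str, values: dict) -> str:
    """Fill $VAR placeholders from a mapping (e.g. CI env vars or a
    secret store). Raises KeyError if a placeholder has no value, so a
    missing secret fails the deploy instead of shipping a literal '$VAR'."""
    return Template(template).substitute(values)

if __name__ == "__main__":
    # In a real pipeline the value would come from the CI environment
    # or a secret manager, and the output would be piped to kubectl.
    print(hydrate(MANIFEST_TEMPLATE, {"DB_PASSWORD": "example-password"}))
```

The point of keeping hydration as a separate, dumb step is that the repo stays the source of truth for everything except the secret values themselves.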