
Comment by vadepaysa

5 hours ago

I was an on-prem maxi (if that's a thing) for a long time. I've run clusters that cost more than $5M, but these days I am a changed man. I start with a PaaS like Vercel and work my way down to on-prem depending on how important and cost-sensitive the workload is.

Pains I faced running BIG clusters on-prem:

1. Supply Chain Management -- everything from power supplies all the way to GPUs and storage has to be procured, shipped, disassembled and installed. You need a labor pool and dedicated management.

2. Inventory Management -- You also need to keep inventory on hand for parts that WILL fail. You can expect 20% of your cluster to have some degree of issues on an ongoing basis.

3. Networking and Security -- You are on your own defending your network, or you have to pay a ton of money to vendors to come in and help you. Even with the simplest of storage clusters, we've had to deal with pretty sophisticated attacks.

When I ran massive clusters, I had a large team dealing with these. Obviously, with PaaS, you don't need anyone.

> I was an on-prem maxi (if that's a thing) for a long time. I've run clusters that cost more than $5M, but these days I am a changed man.

I have had a similar transformation. I still host non-critical services on-prem, since they are exceptionally cheap to run there. Everything else I host on Hetzner.

In addition to those sorts of costs beyond the initial hardware purchase, the person writing the check needs to think long and hard about how bad an outage would be, and how much money it makes sense to budget simply for "avoiding outages." And the more important it is not to have any downtime, the more it's gonna cost to build up some sort of substitute for cross-datacenter cloud functionality. (You are also unlikely to be as good as AWS at managing and configuring those networks, or at hiring people to do so.)
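
To put rough numbers on the "how much to budget for avoiding outages" question, here is a back-of-the-envelope sketch in Python. Every figure in it (the availability levels and the cost per hour of downtime) is a made-up placeholder, not data from anyone's cluster, and the function names are just mine for illustration.

```python
HOURS_PER_YEAR = 24 * 365


def expected_outage_cost(availability: float, cost_per_hour: float) -> float:
    """Expected yearly loss from downtime at a given availability (0..1)."""
    downtime_hours = (1.0 - availability) * HOURS_PER_YEAR
    return downtime_hours * cost_per_hour


def max_redundancy_budget(current: float, target: float, cost_per_hour: float) -> float:
    """Yearly downtime loss avoided by moving from `current` to `target` availability.

    Spending more than this per year on redundancy costs more than the
    outages it prevents (under this very simple model).
    """
    return expected_outage_cost(current, cost_per_hour) - expected_outage_cost(
        target, cost_per_hour
    )


# Hypothetical inputs: going from 99% to 99.9% uptime,
# with an hour of downtime assumed to cost $2,000.
budget = max_redundancy_budget(current=0.99, target=0.999, cost_per_hour=2_000.0)
print(f"Worth spending up to ${budget:,.0f}/year on redundancy")
```

It ignores reputational damage, correlated failures, and the staff time needed to run the redundant setup, so treat the result as a ceiling to argue about rather than an answer.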