Comment by nostrebored

3 years ago

As someone who's worked in cloud sales and no longer has any skin in the game, I've seen firsthand how cloud native architectures improve developer velocity, offer enhanced reliability and availability, and actually decrease lock-in over time.

Every customer I worked with who had one of these huge servers introduced coupling and state in some unpleasant way. They were locked in to persisted state, and couldn't scale out to handle variable load even if they wanted to. Beyond that, hardware utilization became contentious at any mid-enterprise scale. Everyone views the resource pool as theirs, and organizational initiatives often push people towards consuming the same types of resources.

When it came time to scale out or do international expansion, every single one of my customers who had adopted this strategy had assumptions baked into their access patterns that made sense given their single server. When it came time to store some part of the state in a way that made sense for geographically distributed consumers, it took months, not sprints, to figure out how to hammer this into a model that's fundamentally at odds with it.

From a reliability and availability standpoint, I'd often see customers tell me that 'we're highly available within a single data center' or 'we're split across X data centers' without considering the shared failure modes that each of these data centers had. Would a fiber outage knock out both of your DCs? Would a natural disaster likely knock something over? How about _power grids_? People often don't realize the failure modes they've already accepted.

This is obviously not true for every workload. It's tech, there are tradeoffs you're making. But I would strongly caution any company that expects large growth against sitting on a single-server model for very long.

Could confirmation bias affect your analysis at all?

How many companies went cloud-first and then ran out of money? You wouldn't necessarily know anything about them.

Were the scaling problems your single-server customers called you to solve unpleasant enough to put their core business in danger? Or was the expense just a rounding error for them?

  • From this and the other comment, it looks like I wasn't clear about talking about SMB/ME rather than a seed/pre-seed startup, which I understand can be confusing given that we're on HN.

    I can tell you that I've never seen a company run out of money from going cloud-first (sample size of over 200 that I worked with directly). I did see multiple businesses scale down their consumption to near-zero and ride out the pandemic.

    The answer to scaling problems being unpleasant enough to put the business in danger is yes, but that was also during the pandemic when companies needed to make pivots to slightly different markets. Doing this was often unaffordable from an implementation cost perspective at the time when it had to happen. I've seen acquisitions fall through due to an inability to meet technical requirements because of stateful monstrosities. I've also seen top-line revenue get severely impacted when resource contention causes outages.

    The only times I've seen 'cloud-native' truly backfire were when companies didn't have the technical experience to move forward with these initiatives in-house. There are a lot of partners in the cloud implementation ecosystem who will fleece you for everything you have. One such example was a k8s microservices shop with a single contract developer managing the infra and a partner doing the heavy lifting. The partner gave them the spiel on how cloud-native provides flexibility and allows for reduced opex, and the customer was very into it. They stored images in an RDBMS. Their database costs were almost 10% of the company's operating expenses by the time the customer noticed that something was wrong.

The common element in the above is scaling and reliability. While lots of startups and companies are focused on the 1% chance that they are the next Google or Shopify, the reality is that nearly all aren't, and the overengineering and redundancy-first model that cloud pushes does cost them a lot of runway.

It's even less useful for large companies; there is no world in which Kellogg is going to increase sales by 100x, or even 10x.

  • But most companies aren't startups. Many companies are established, growing businesses with a need to be able to easily implement new initiatives and products.

    The benefits of cloud for LE are completely different. I'm happy to break down why, but I addressed the smb and mid-enterprise space here because most large enterprises already know they shouldn't run on a single rack.

    • > I addressed the smb and mid-enterprise space here because most large enterprises already know they shouldn't run on a single rack.

      This is a straw man. No one, anywhere in this thread or in the OP's original article, proposed a single-rack solution.

      From the OP: > Running a primary and a backup server is usually enough, keeping them in different datacenters.
