Comment by llm_nerd

14 hours ago

This piece is written in a pretty clichéd dismissive tone, one that assumes everything everyone else does is driven by cargo-culting if not outright ignorance: that people make these choices only because they're rushing to chase the latest trend.

They're just trying to be cool, you see.

Here's the thing, though: Almost every choice that leads to scalability also leads to reliability. These two patterns are effectively interchangeable. Having your infra costs be "$100 per month" (a claim that usually comes with a massive disclaimer, as an aside) but then falling over for a day because your DB server crashed is a really, really bad place to be.

> Almost every choice that leads to scalability also leads to reliability.

Empirically, that does not seem to be the case. Large scalable systems also go offline for hours at a time. There are so many more potential points of failure due to the complexity.

And even with a single regular server, it's very easy to keep a live replica backup of the database and point to that if the main one goes down. Which is a common practice. That's not scaling, just redundancy.
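The failover being described can be sketched in a few lines. This is a minimal illustration, not a real driver: the hostnames and the `connect` function are hypothetical placeholders standing in for an actual database client.

```python
# Minimal failover sketch: try the primary, fall back to the live replica.
# Hostnames and connect() are hypothetical stand-ins for a real DB driver.

PRIMARY = "db-primary.internal:5432"
REPLICA = "db-replica.internal:5432"

def connect(host, healthy_hosts):
    """Stand-in for a real driver connect; raises if the host is down."""
    if host not in healthy_hosts:
        raise ConnectionError(f"cannot reach {host}")
    return f"connection to {host}"

def get_connection(healthy_hosts):
    # Prefer the primary; silently fall back to the replica.
    for host in (PRIMARY, REPLICA):
        try:
            return connect(host, healthy_hosts)
        except ConnectionError:
            continue
    raise RuntimeError("both primary and replica are down")

# Normal operation hits the primary; if it's down, requests land on the
# replica instead. That's redundancy, not scaling.
print(get_connection({PRIMARY, REPLICA}))
print(get_connection({REPLICA}))
```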

  • >Empirically, that does not seem to be the case.

    Failures are astonishingly, vanishingly rare. Like it's amazing at this point how reliable almost every system is. There are a tiny number of failures at enormous scale operations (almost always due to network misconfigurations, FWIW), but in the grand scheme of things we've architected an outrageously reliable set of platforms.

    >That's not scaling, just redundancy.

    In practice it almost always is scaling. No one wants to pay for an entire extra server just to apply shipped logs to. The whole premise of this article is that you should get the most out of your spend, so two hot servers are much better. And once you have two hot... why not four, distributed across data centers? And so on.

A distributed monolith - which is what nearly all places claiming to run microservices actually have - has uptime that degrades multiplicatively: with N serially-dependent services each at uptime p, the whole path is up roughly p^N of the time.
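To put rough numbers on that compounding-uptime claim (the 99.9% per-service figure is just an assumed example):

```python
# If a request path touches n services in series, each independently up
# with probability p, the whole path is up with probability p**n.
p = 0.999  # assumed per-service uptime (three nines)
for n in (1, 10, 100):
    print(n, round(p ** n, 3))
# Three-nines components yield roughly 90.5% uptime across 100 of them.
```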

Even if you do truly have a microservices architecture, you’ve also now introduced a great deal of complexity, and unless you have some extremely competent infra / SRE folk on staff, that’s going to bite you. I have seen this over and over and over again.

People make these choices because they don’t understand computing fundamentals, let alone distributed systems, but the Medium blogs and ChatGPT have assured them that they do.

  • This is the truth. I work with an application that has nearly 100 microservices and it seems like at any given point in time at least one is busted. Is it going to impact what you’re doing? Maybe. Maybe not.

    But if it were just a monolith with proper startup checks, then when a new version rolls out and fails, you just kill it right there and leave the old working version up.

    Monoliths have their issues too. But doing microservices correctly is quite the job.
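The "kill the broken rollout, keep the old version serving" idea sketches out simply. `run_startup_checks` here is a hypothetical stand-in for real probes (DB connectivity, config sanity, migrations applied, and so on), and the version dicts are illustrative.

```python
# Sketch of a rollout guard: promote the candidate only if its startup
# checks pass; otherwise keep the old working version in service.

def run_startup_checks(version):
    """Hypothetical stand-in for real startup/health probes."""
    return version.get("healthy", False)

def rollout(current, candidate):
    if run_startup_checks(candidate):
        return candidate
    # Checks failed: kill the candidate right there and leave the old
    # version up.
    return current

old = {"tag": "v1.41", "healthy": True}
bad = {"tag": "v1.42", "healthy": False}  # broken build
print(rollout(old, bad)["tag"])  # the old version stays in service
```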

Yes, reliability comes from the same ground that scalability does, and yes, people are mostly chasing the latest trend. One does not contradict the other.

  • >yes people are mostly chasing the latest trend

    https://www.youtube.com/watch?v=b2F-DItXtZs

    15 years ago people were making the same "chasing trends" complaints. Back then there absolutely were people cargo culting, but to still be whining about this a decade and a half later, when these are quite literally just basic best practices, is something else.

> Here's the thing, though: Almost every choice that leads to scalability also leads to reliability.

How is that supposed to happen without k8s being involved somehow?

  • There are a lot of tools that don't need k8s to be scalable and reliable, starting with stateless services and simple load balancers and ending with actor systems like those in Erlang or Akka.
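The stateless-plus-load-balancer combination is easy to sketch: because any worker can serve any request, a dumb round-robin gives both scale-out and redundancy without any orchestrator. Backend names here are illustrative.

```python
import itertools

# Round-robin balancer over stateless workers. Any instance can serve
# any request, so skipping a dead backend is always safe.

def make_balancer(backends):
    ring = itertools.cycle(backends)
    def pick(alive):
        for _ in range(len(backends)):
            backend = next(ring)
            if backend in alive:
                return backend
        raise RuntimeError("no backends alive")
    return pick

pick = make_balancer(["app-1", "app-2", "app-3"])
# Requests spread across all three instances in turn...
print([pick({"app-1", "app-2", "app-3"}) for _ in range(4)])
# ...and when instances die, traffic just flows to whoever is left.
print(pick({"app-2"}))
```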