← Back to context

Comment by littlestymaar

5 hours ago

In many companies I worked for, there were a bunch of infrastructure astronauts who made everything very complicated in the name of zero downtime and sold them to management as “downtime would kill pur credibility and our businesses ”, and then you have billion dollar companies everyone relies on (GitHub, Cloudflare) who have repeated downtime yet it doesn't seem to affect their business in any way.

It's a multitude of factors but basically they can act like that because they are dominant on the market.

The classic "nobody ever gets fired for buying IBM".

If you pick something else, and there's issue, people will complain about your choice being wrong, should have gone with the biggest player.

Even if you provide metrics showing your solution's downtime being 1% of the big player.

Something like Cloudflare is so big and ubiquitous, that, when there's a downtime, even your grandma is aware of it because they talk about it in the news. So nobody will put the blame on the person choosing Cloudflare.

Even if people decides to go back (I had a few customers asking us to migrate to other solutions or to build some kind of failover after the last Cloudflare incidents), it costs so much to find the solutions that can replace it with the same service level and to do the migration, that, in the end, they prefer to eat the cost of the downtimes.

Meanwhile, if you're a regular player in a very competitive market, yes, every downtime will result in lost income, customers leaving... which can hurt quite a lot when you don't have hundreds of thousands of customers.

Businesses are incommensurate.

GitHub is a distributed version control storage hub with additional add-on features. If peeps can’t work around a git server/hub being down and don’t know to have independent reproducible builds or integrations and aren’t using project software wildly better that GitHubs’, there are issues. And for how much money? A few hundred per dev per year? Forget total revenue, the billions, the entire thing is a pile of ‘suck it up, buttercup’ with ToS to match.

In contrast, I’ve been working for a private company selling patient-touching healthcare solutions and we all would have committed seppuku with outages like this. Yeah, zero downtime or as close to it as possible even if it means fixing MS bugs before they do. Fines, deaths, and public embarrassment were potential results of downtime.

All investments become smart or dumb depending on context. If management agrees that downtime would be lethal my prejudice would be to believe them since they know the contracts and sales perspective. If ‘they crashed that one time’ stops all sales, the 0% revenue makes being 30% faster than those astronauts irrelevant.

To be fair - it SUPER does. Being down frequently makes your competition look better.

Of course, once you have the momentum it doesn't matter nearly as much, at least for a while. If it happens too much though, people will start looking for alternatives.

The key to remember is Momentum is hard to redirect, but with enough force (reasons), it will.