Comment by mmh0000
3 days ago
> I chose managed services specifically to avoid ops emergencies
You may not be spending enough time on HN reading all the horror stories =P
The benefit of a managed service isn't that it doesn't go down, though it probably goes down less than something you self-manage, unless you're a full-time SRE with the experience to back it.
The benefit of a managed service is you say: "It's not my problem, I opened a ticket, now I'm going to get lunch, hope it's back up soon."
> though it probably goes down less than something you self-manage, unless you're a full-time SRE with the experience to back it.
I wonder how true that is. This went down because of a bad update, which is probably like 99.99% of outages. The other 0.01% is cosmic rays causing hardware failures.
My server was up for 3.5 years with no outages because I just didn't touch it. I had to take it offline a couple of days ago to move it, which made me a little sad. Took a snapshot, moved it to a new droplet, brought it back up as-is, and it's running great again.
Anyway, emergencies are less emergy if things go down while you're upgrading and shuffling things around yourself. You expect hiccups if you're the one causing the hiccups. It's when someone else is tinkering on the other side of the country/planet and blows something up that suddenly you have an emergency.
I concur. I've seen a lot of companies outside the techbro world where the entire thing runs on a single VPS/dedicated server with a setup that would make any sysadmin squirm. And yet, it just works and makes them money?
Which isn't too surprising - hardware is extremely reliable nowadays. When's the last time your laptop broke? And that laptop lives a much harsher life than server HW in a datacenter. Obviously everyone is going to have their own anecdotes about this, but I think it's fair to say that overall the failure rates are quite low.
You know why their (often awful) setups work and consistently beat the major clouds in terms of uptime? No moving parts like K8s and all the "best practices", and most importantly, there is nobody "fixing" the working setup until it doesn't work. Ironically, they are getting better uptime by avoiding all the things that are marketed as improving uptime.
I've read a few horror stories, but I always thought it wouldn't happen to me :)
> It's not my problem, I opened a ticket, now I'm going to get lunch, hope it's back up soon.
That's a good way of thinking about it.