Comment by mmh0000

1 month ago

  > I chose managed services specifically to avoid ops emergencies

You may not be spending enough time on HN reading all the horror stories =P

The benefit of a managed service isn't that it doesn't go down, though it probably goes down less than something you self-manage, unless you're a full-time SRE with the experience to back it.

The benefit of a managed service is that you get to say: "It's not my problem, I opened a ticket, now I'm going to get lunch, hope it's back up soon."



hdjrudni 1 month ago

> though it probably goes down less than something you self-manage, unless you're a full-time SRE with the experience to back it.

I wonder how true that is. This went down because of a bad update, which is probably like 99.99% of outages. The other 0.01% is cosmic rays causing hardware failures.

My server was up for 3.5 years with no outages because I just didn't touch it. I had to take it offline a couple days ago to move it which made me a little sad. Took a snapshot and moved it to a new droplet, brought it back up as-is and it's running great again.

Anyway, emergencies are less emergy if things go down while you're upgrading and shuffling things around yourself. You expect hiccups if you're the one causing the hiccups. It's when someone else is tinkering on the other side of the country/planet and blows something up that suddenly you have an emergency.

kikimora 24 days ago

> My server was up for 3.5 years with no outages because I just didn't touch it.

Problem #1: keeping the OS current. Chances are you're running an outdated OS with some RCE vulnerabilities.

Problem #2: the setup is hard to scale organizationally. How do you give other people access to the server? How do you monitor what they do? How do you replicate the server setup across teams and keep it in sync? And so on and so forth.

In an org, something always changes, and you have to touch servers as a result.

Nextgrid 1 month ago

I concur. I've seen a lot of companies outside the techbro world where the entire thing runs on a single VPS/dedicated server with a setup that would make any sysadmin squirm. And yet, it just works and makes them money?

Which isn't too surprising: hardware is extremely reliable nowadays. When's the last time your laptop broke? And that laptop lives a much harsher life than server HW in a datacenter. Obviously everyone is going to have their own anecdotes about this, but I think it's fair to say that overall the failure rates are quite low.

You know why their (often awful) setups work and consistently beat the major clouds in terms of uptime? No moving parts like K8s and all the "best practices", and most importantly, there is nobody "fixing" the working setup until it doesn't work. Ironically, they are getting better uptime by avoiding all the things that are marketed as improving uptime.

neilfrndes 1 month ago

I've read a few horror stories, but I always thought it wouldn't happen to me :)

> It's not my problem, I opened a ticket, now I'm going to get lunch, hope it's back up soon.

That's a good way of thinking about it.