Comment by motorest

1 month ago

> My experience after 20 years in the hosting industry is that customers in general have more downtime due to self-inflicted over-engineered replication, or split brain errors than actual hardware failures.

I think you misread OP. "Single point of failure" doesn't mean the only failure modes are hardware failures. It means that if something happens to your nodes whether it's hardware failure or power outage or someone stumbling on your power/network cable, or even having a single service crashing, this means you have a major outage on your hands.

These types of outages are trivially avoided with a basic understanding of well-architected frameworks, which explicitly address the risk represented by single points of failure.

12 comments

motorest

fogx 1 month ago

don't you think it's highly unlikely that someone will stumble over the power cable in a hosted datacenter like hetzner? and even if, you could just run a provisioned secondary server that jumps in if the first becomes unavailable and still be much cheaper.

motorest 1 month ago
> don't you think it's highly unlikely that someone will stumble over the power cable in a hosted datacenter like hetzner?
You're not getting the point. The point is that if you use a single node to host your whole web app, you are creating a system where many failure modes, which otherwise could not even be an issue, can easily trigger high-severity outages.
> and even if, you could just run a provisioned secondary server (...)
Congratulations, you are no longer using "one big server", thus defeating the whole purpose behind this approach and learning the lesson that everyone doing cloud engineering work is already well aware.
- juped 1 month ago
  
  Do you actually think dead simple failover is comparable to elastic kubernetes whatever?
  
  3 replies →
toast0 1 month ago
I don't know about Hetzner, but the failure case isn't usually tripping over power plugs. It's putting a longer server in the rack above/below yours and pushing the power plug out of the back of your server.
Either way, stuff happens, figuring out what your actual requirements around uptime, time to response, and time to resolution is important before you build a nine nines solution when eight eights is sufficient. :p
- kapone 1 month ago
  
  > It's putting a longer server in the rack above/below yours and pushing the power plug out of the back of your server
  Are you serious? Have you ever built/operated/wired rack scale equipment? You think the power cables for your "short" server (vs the longer one being put in) are just hanging out in the back of the rack?
  Rack wiring has been done and done correctly for ages. Power cables on one side (if possible), data and other cables on the other side. These are all routed vertically and horizontally, so they land only on YOUR server.
  You could put a Mercedes Maybach above/below your server and nothing would happen.
  
  3 replies →
icedchai 1 month ago

It's unlikely, but it happens. In the mid 2000's I had some servers at a colo. They were doing electrical work and took out power to a bunch of racks, including ours. Those environments are not static.