Comment by oblio

2 years ago

The problem is that you do want some horizontal scaling regardless, just to avoid SPOFs as much as you can.

If you can't handle 99.97% uptime for data processing then probably there's a larger issue at play.

  • That's about 13.4min/mo of downtime, every month. That seems likely to cause all kinds of havoc at scale.

    • Maybe we're talking about different things then.

      My laptop is a SPoF in exactly the same way.

      If my laptop is closed then data collection will still happen, as collection and processing are different systems; but my ability to mutate the data hands-on is affected.

      Thus any downtime of my laptop is not really a problem.

      See also: Jupyter notebooks, Excel, etc;

      I will also point out that robustness in distributed systems is not as cut and dry for two reasons:

      1: These are not considered hot-path systems that are mission critical so will be neglected by SRE.

      2: Complexity is increased in distributed systems, thus you have more likelihood of failure until you have a lot of effort put into it.

      10 replies →

    • We are talking about data processing, not a publicly available service. When is 13 min/month of downtime for processing of data a problem?

If you just need high availability, then you don't need to scale horizontally, you just need redundant systems.