Comment by oblio
2 years ago
The problem is that you do want some horizontal scaling regardless, just to avoid SPOFs as much as you can.
2 years ago
The problem is that you do want some horizontal scaling regardless, just to avoid SPOFs as much as you can.
If you can't handle 99.97% uptime for data processing then probably there's a larger issue at play.
That's about 13.4min/mo of downtime, every month. That seems likely to cause all kinds of havoc at scale.
Maybe we're talking about different things then.
My laptop is a SPoF in exactly the same way.
If my laptop is closed then data collection will still happen, as collection and processing are different systems; but my ability to mutate the data hands-on is affected.
Thus any downtime of my laptop is not really a problem.
See also: Jupyter notebooks, Excel, etc;
I will also point out that robustness in distributed systems is not as cut and dry for two reasons:
1: These are not considered hot-path systems that are mission critical so will be neglected by SRE.
2: Complexity is increased in distributed systems, thus you have more likelihood of failure until you have a lot of effort put into it.
10 replies →
We are talking about data processing, not a publicly available service. When is 13 min/month of downtime for processing of data a problem?
If you just need high availability, then you don't need to scale horizontally, you just need redundant systems.