Comment by bombcar
3 years ago
> Any technical decision about how many instances to have and how they should be spread out needs to start as a business decision and end in crisp numbers about recovery point/time objectives, and yet somehow that nearly never happens.
Nobody wants to admit that their business or their department actually has an SLA of "as soon as you can, maybe tomorrow, as long as it usually works". So everything is pretend-engineered to be fifteen nines of reliability (when in reality it sometimes explodes because of the "attempts" to make it robust).
Being honest about the actual requirements can be extremely helpful.
> Nobody wants to admit that their business or their department actually has an SLA of "as soon as you can, maybe tomorrow, as long as it usually works". So everything is pretend-engineered to be fifteen nines of reliability (when in reality it sometimes explodes because of the "attempts" to make it robust).
I have yet to see my principal technical frustrations summarized so concisely. This is at the heart of everything.
If the business and the engineers could get over their ridiculous obsession with statistical outcomes and strict determinism, they would be able to arrive at a much more cost-effective, simple and human-friendly solution.
The businesses that are actually sensitive to >1 minute of annual downtime are already running on top of IBM mainframes and have been for decades. No one's business is as important as the Federal Reserve or the Pentagon, but they don't want to admit it to themselves or others.
> The businesses that are actually sensitive to >1 minute of annual downtime are already running on top of IBM mainframes and have been for decades.
Are there any?
My bank certainly has way less than 5 9s of availability, and it's not a problem at all. Credit/debit card processors seem to stay around 5 nines, and nobody is losing sleep over it. As long as your unavailability doesn't all land on the Christmas promotion day, I have never seen anybody lose sleep over web-store unavailability. The Fed probably doesn't have 5 9s of availability either. It would be way overkill for a central bank, even one that processes online interbank transfers (which the Fed doesn't).
The organizations that need more than 5 9s are probably all in the military and science sectors. And those aren't using mainframes; they certainly use good old redundancy of equipment with simple failure modes.
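For anyone who wants to check the numbers being thrown around in this thread, here is a rough sketch of the arithmetic behind "nines". It assumes a 365-day year and treats N nines as an availability of 1 - 10^-N; the function names are made up for illustration. Under those assumptions, 5 nines allows about 5.26 minutes of downtime per year, and 1 minute of annual downtime works out to roughly 5.7 nines.

```python
import math

# Assumption: a plain 365-day year (no leap days).
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_seconds_per_year(nines: float) -> float:
    """Allowed downtime per year, in seconds, for an availability of 1 - 10**-nines."""
    return SECONDS_PER_YEAR * 10 ** (-nines)


def nines_for_downtime(seconds_per_year: float) -> float:
    """Number of nines implied by a given amount of downtime per year."""
    return -math.log10(seconds_per_year / SECONDS_PER_YEAR)


if __name__ == "__main__":
    # Compare a few targets, including the "fifteen nines" exaggeration.
    for n in (3, 4, 5, 15):
        print(f"{n} nines: {downtime_seconds_per_year(n):.6g} s of downtime per year")
    print(f"1 minute per year is about {nines_for_downtime(60):.2f} nines")
```

Plugging in an honest "maybe tomorrow" SLA instead (say, a day or two of downtime a year) gives a target between 2 and 3 nines, which is a very different engineering problem.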