Comment by hedora
1 year ago
Durability levels that poor aren’t state of the art any more.
The rule of thumb I’ve seen at most places (running at similar scale) is to target one data loss, fleet wide, per century.
That usually increases costs by << 10%, but you need someone who understands combinatorics to design your data placement algorithms.
The copyset paper is a good place to start if you need to understand that stuff.
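For a rough sense of the idea, here's a minimal sketch of copyset-style placement, assuming triple replication and a simplified permutation-based construction; the node names, scatter width, and helpers are hypothetical and not taken from the paper:

```python
import random

def build_copysets(nodes, r=3, scatter_width=2, seed=0):
    """Build copysets from random permutations (simplified sketch,
    not the paper's exact algorithm)."""
    rng = random.Random(seed)
    # Each permutation gives every node a scatter width of r - 1,
    # so ceil(scatter_width / (r - 1)) permutations are needed.
    n_perms = -(-scatter_width // (r - 1))
    copysets = []
    for _ in range(n_perms):
        perm = nodes[:]
        rng.shuffle(perm)
        # Chop the permutation into disjoint groups of r nodes.
        copysets.extend(tuple(perm[i:i + r])
                        for i in range(0, len(perm) - r + 1, r))
    return copysets

def place_chunk(primary, copysets, rng):
    """Replicate a chunk onto one of the few copysets containing its
    primary node, instead of onto r - 1 nodes chosen at random."""
    candidates = [cs for cs in copysets if primary in cs]
    return rng.choice(candidates)

nodes = [f"node{i}" for i in range(9)]
copysets = build_copysets(nodes, r=3, scatter_width=2)
print(place_chunk("node0", copysets, random.Random(1)))
```

The point of restricting placement this way is that far fewer combinations of r simultaneous node failures can wipe out all replicas of some chunk, which is what drives the fleet-wide loss rate down.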
That sounds a lot better than some number of nines standing by itself.
99.(9)x percent durability is almost meaningless without a description of what the unit of data is and what a loss looks like. There are too many orders of magnitude between a chunky file having an error, a transaction having an error, a block having an error, a bit having an error...
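To make the orders-of-magnitude point concrete, here's a back-of-the-envelope sketch; the nines, fleet sizes, and independence assumption are all made up for illustration:

```python
def expected_losses_per_year(nines, units):
    # Treats durability as an independent annual survival probability
    # per unit -- a big simplification, but enough to show the scale.
    p_loss = 10.0 ** (-nines)
    return units * p_loss

# The same "11 nines" per unit implies very different fleet-wide outcomes
# depending on what the unit is:
print(expected_losses_per_year(11, 10**9))   # 1e9 objects       -> ~0.01 losses/year
print(expected_losses_per_year(11, 10**13))  # 1e13 small blocks -> ~100 losses/year
```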