Comment by vbezhenar

9 days ago

S3 does not spend 3x the drives to provide redundancy; it's probably more like 20% extra drives. They split data into chunks and use erasure coding to store those chunks across multiple drives with little overhead.

AFAIK geo-replication between regions _does_ replicate the entire dataset. It sounds like you're describing RAID configurations, which are common ways to provide redundancy and increased performance within a given disk array. They definitely do that too, but within a zone.

Wait, can you elaborate on how this works?

  • You have a 100-byte file. You split it into 10 chunks (data shards) and add an 11th chunk (parity shard) that is the XOR of the 10 data shards. Now you store every chunk on a separate drive. So you have 100 bytes of data and you spent 110 bytes to store them all. Now you can survive one drive death, because you can recompute any missing chunk as the XOR of all the remaining chunks.

    That's a very primitive explanation, but it should be easy to understand (there's a short code sketch of it at the end of this comment).

    In reality S3 uses a different algorithm (probably Reed-Solomon codes) and some undisclosed number of shards (probably different for different storage classes). Some say they use a 5-of-9 scheme (5 data shards + 4 parity shards, which makes for 80% overhead), but I don't think that's official information.
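
    To make the XOR version concrete, here's a toy Python sketch of the 10-data-shards-plus-1-parity scheme from the example above. Function names and shard counts are just illustrative; this is not S3's actual implementation, which would use a real Reed-Solomon code and survive more than one lost shard.

    ```python
    # Toy single-parity erasure coding: 10 data shards + 1 XOR parity shard.
    # Any one lost shard can be rebuilt by XOR-ing the survivors.
    import os
    from functools import reduce

    def split_into_shards(data: bytes, n: int) -> list[bytes]:
        """Split data into n equal-length data shards (zero-padded at the end)."""
        shard_len = -(-len(data) // n)  # ceiling division
        padded = data.ljust(shard_len * n, b"\0")
        return [padded[i * shard_len:(i + 1) * shard_len] for i in range(n)]

    def xor_bytes(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    def add_parity(shards: list[bytes]) -> list[bytes]:
        """Append one parity shard: the XOR of all data shards."""
        return shards + [reduce(xor_bytes, shards)]

    def recover(shards: list) -> list[bytes]:
        """Rebuild at most one missing shard (None) as the XOR of the survivors."""
        missing = [i for i, s in enumerate(shards) if s is None]
        assert len(missing) <= 1, "single-parity XOR only survives one lost shard"
        if missing:
            shards[missing[0]] = reduce(xor_bytes, [s for s in shards if s is not None])
        return shards

    data = os.urandom(100)                            # the "100 bytes file"
    shards = add_parity(split_into_shards(data, 10))  # 11 shards stored -> ~10% overhead
    shards[3] = None                                  # simulate one dead drive
    rebuilt = recover(shards)
    assert b"".join(rebuilt[:10])[:len(data)] == data  # data survives intact
    # For comparison, a 5-of-9 Reed-Solomon layout stores 9/5 = 1.8x the raw
    # bytes, i.e. the 80% overhead mentioned above.
    ```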