
Comment by vbezhenar

9 days ago

You have a 100-byte file. You split it into 10 chunks (data shards) and add an 11th chunk (parity shard) that is the XOR of the 10 data chunks. Now you store every chunk on a separate drive. So you have 100 bytes of data and you spent 110 bytes to store them. Now you can survive one drive death, because you can recompute any missing chunk as the XOR of all the remaining chunks. A minimal sketch of that idea is below.
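A minimal sketch of that scheme in Python (the 10-byte chunk size and the `make_parity` helper are just illustration, not anything S3 actually does):

```python
def make_parity(chunks):
    """XOR a list of equal-length byte chunks together."""
    parity = bytes(len(chunks[0]))
    for c in chunks:
        parity = bytes(a ^ b for a, b in zip(parity, c))
    return parity

data = bytes(range(100))                               # the 100-byte "file"
chunks = [data[i * 10:(i + 1) * 10] for i in range(10)]  # 10 data shards
parity = make_parity(chunks)                           # 11th shard, 10 more bytes

# Pretend the drive holding chunk 3 died: rebuild it from everything else.
lost = 3
survivors = [c for i, c in enumerate(chunks) if i != lost] + [parity]
assert make_parity(survivors) == chunks[lost]
```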

That's a very primitive explanation, but it should be easy to understand.

In reality S3 uses a different algorithm (probably Reed-Solomon codes) and an undisclosed number of shards (probably different for different storage classes). Some say they use 5 of 9 (so 5 data shards + 4 parity shards, which makes for 80% overhead), but I don't think that's official information.
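To see where that 80% figure comes from, here is the overhead arithmetic for a generic k-data / m-parity layout (the 5+4 numbers are the rumored ones, not official):

```python
def overhead(k, m):
    """Extra storage as a fraction of the data: m parity shards per k data shards."""
    return m / k

print(overhead(10, 1))  # the XOR example above: 0.1, i.e. 10% extra
print(overhead(5, 4))   # the rumored 5-of-9 layout: 0.8, i.e. 80% extra
```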