Comment by valenterry

18 days ago

For small amounts of data yeah. With growing data, the chance of a collision grows more than proportional. So in the context of working on storage systems (like s3 or so) that won't work unless customers actually accept the risk of a collission as okay. So for example, when storing media data (movies, photos), I could imagine that, but not for data in general.

Cryptographic hashing collisions are very very small, like end of universe in numerous times small. They're smaller than AWS being burnt down and all backups were lost leading to data loss.

  • You have a point.

    When using MD5 (128bit) then when AWS S3 would apply this technique, it would only get a handful of collisions. Using 256bit would drive that down to a level where any collision is very unlikely.

    This would be worth it if a 4kb block is, on average, duplicated with a chance of at least 6.25%. (not considering overhead of data-structures etc.)