How does that get around the pigeonhole principle?
I think you'd have to compare the data values before purging, and you can only deduplicate (purge) if the blocks are actually identical; otherwise you have to keep the block (you can't replace it with the hash, because the hash link in the pool would point to different data).
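For illustration, a minimal sketch of that verify-before-purge idea (the pool layout and names here are hypothetical, not any particular system's API):

    import hashlib

    # Hypothetical in-memory block pool: hash -> raw block bytes.
    pool: dict[str, bytes] = {}

    def store_block(block: bytes) -> str:
        """Store a block, dropping the new copy only if an identical block exists."""
        key = hashlib.sha256(block).hexdigest()
        existing = pool.get(key)
        if existing is None:
            pool[key] = block  # first occurrence: keep the data
        elif existing != block:
            # Same hash but different bytes: we must NOT deduplicate,
            # because the hash link would then point at different data.
            raise ValueError("hash collision: refusing to purge block")
        # else: a byte-identical block is already stored, the new copy can be dropped
        return key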
The hash collision chance is extremely low.
For small amounts of data, yeah. With growing data, the chance of a collision grows more than proportionally. So in the context of working on storage systems (like S3), that won't work unless customers actually accept the risk of a collision. For example, when storing media data (movies, photos) I could imagine that, but not for data in general.
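As a rough sketch of that scaling, assuming a 256-bit hash: the birthday bound puts the collision probability at roughly n^2 / 2^257 for n stored blocks, so doubling the data roughly quadruples the risk.

    from math import expm1

    def collision_probability(num_blocks: int, hash_bits: int = 256) -> float:
        """Birthday-bound approximation: p ~= 1 - exp(-n^2 / 2^(b+1))."""
        return -expm1(-(num_blocks ** 2) / 2 ** (hash_bits + 1))

    # The risk grows quadratically with the number of stored blocks:
    for n in (10**9, 10**12, 10**15):
        print(f"{n:>16} blocks -> p ~= {collision_probability(n):.3e}")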