Comment by ww520

7 months ago

The idea is not too far off. You could compute a hash of an existing data block and store the mapping from hash to data block. Now you can use the hash anywhere that data block resides, i.e. any duplicate data blocks can share the same hash. That's how storage deduplication works in a nutshell.
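
A minimal sketch of the idea described above, assuming fixed-size 4096-byte blocks and SHA-256 as the hash. The names `BlockStore`, `dedup_write`, and `dedup_read` are hypothetical and only for illustration. This version trusts the hash alone and does not handle collisions.

```python
import hashlib


class BlockStore:
    """Toy content-addressed store: each unique block is kept once, keyed by its digest."""

    def __init__(self):
        self.blocks = {}  # digest -> block bytes

    def put(self, block):
        digest = hashlib.sha256(block).hexdigest()
        # Only the first block seen for a digest is stored; later duplicates reuse it.
        self.blocks.setdefault(digest, block)
        return digest

    def get(self, digest):
        return self.blocks[digest]


def dedup_write(store, data, block_size=4096):
    """Split data into blocks and return the list of digests that reference them."""
    return [store.put(data[i:i + block_size]) for i in range(0, len(data), block_size)]


def dedup_read(store, refs):
    """Rebuild the original data from its list of digests."""
    return b"".join(store.get(ref) for ref in refs)
```

Writing three identical blocks followed by one different block yields four references but only two stored blocks.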

15 comments


valenterry 7 months ago

Except that there are collisions...

datameta 7 months ago
This might be completely naive but can a reversible time component be incorporated into distinguishing two hash calculations? Meaning when unpacked/extrapolated it is a unique signifier but when decomposed it folds back into the standard calculation - is this feasible?
- shakna 7 months ago
  
  Some hashes do have verification bits, which are used not just to verify that the hash is intact, but also to tell one "identical" hash from another. However, they do tend to be slower hashes.
  
- ruined 7 months ago
  
  hashes by definition are not reversible. you could store a timestamp together with a hash, and/or you could include a timestamp in the digested content, but the timestamp can’t be part of the hash.
  
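
The two options ruined mentions (a timestamp stored next to the hash, or a timestamp included in the digested content) could be sketched roughly like this. SHA-256 and both function names are assumptions made for illustration, not something from the thread.

```python
import hashlib


def digest_with_timestamp_inside(content, ts):
    """Timestamp is part of the hashed input.

    Different timestamps give different digests, but the timestamp
    cannot be recovered from the digest afterwards.
    """
    return hashlib.sha256(ts.to_bytes(8, "big") + content).hexdigest()


def digest_with_timestamp_alongside(content, ts):
    """Timestamp is stored next to the digest; the digest depends only on the content."""
    return (hashlib.sha256(content).hexdigest(), ts)
```

Note that hashing the timestamp together with the content makes identical blocks written at different times hash differently, so they would no longer deduplicate. Storing it alongside keeps the digest the same for identical content.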
ww520 7 months ago
Can use cryptographic hashing.
- anonymars 7 months ago
  
  How does that get around the pigeonhole principle?
  I think you'd have to compare the data value before purging: you can only do the deduplication (purge) if the block is actually the same. Otherwise you have to keep the block (you can't replace it with the hash, because the hash link in the pool points to different data).
  
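
The compare-before-purge approach anonymars describes could look roughly like the sketch below. `VerifyingStore` and `weak_hash` are hypothetical names. The 8-bit `weak_hash` exists only to force collisions in a demo, and a real system would use a much stronger hash by default (SHA-256 is assumed here).

```python
import hashlib


def weak_hash(block):
    """Deliberately tiny 8-bit hash so that collisions are easy to trigger."""
    return format(sum(block) % 256, "02x")


def sha256_hex(block):
    return hashlib.sha256(block).hexdigest()


class VerifyingStore:
    """Deduplicates only after a byte-for-byte comparison confirms the blocks match."""

    def __init__(self, hash_fn=sha256_hex):
        self.hash_fn = hash_fn
        self.buckets = {}  # digest -> list of distinct blocks sharing that digest

    def put(self, block):
        digest = self.hash_fn(block)
        bucket = self.buckets.setdefault(digest, [])
        for index, existing in enumerate(bucket):
            if existing == block:
                # True duplicate: safe to reuse the existing copy.
                return (digest, index)
        # Either a new digest or a collision with different data: keep the block.
        bucket.append(block)
        return (digest, len(bucket) - 1)

    def get(self, ref):
        digest, index = ref
        return self.buckets[digest][index]
```

With `weak_hash`, the blocks `b"ab"` and `b"ba"` collide on the digest but are stored separately, while a second `b"ab"` reuses the first copy. The reference has to carry more than the bare hash (here a digest plus an index), which is the cost of not trusting the hash alone.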