Comment by ww520
18 days ago
The idea is not too far off. You could compute a hash on an existing data block. Store the hash and data block mapping. Now you can use the hash in anywhere that data block resides, i.e. any duplicate data blocks can use the same hash. That's how storage deduplication works in the nutshell.
Except that there are collisions...
This might be completely naive but can a reversible time component be incorporated into distinguishing two hash calculations? Meaning when unpacked/extrapolated it is a unique signifier but when decomposed it folds back into the standard calculation - is this feasible?
Some hashes do have verification bits, that are used not just to verify intact hash, but one "identical" hash from another. However, they do tend to be slower hashes.
2 replies →
hashes by definition are not reversible. you could store a timestamp together with a hash, and/or you could include a timestamp in the digested content, but the timestamp can’t be part of the hash.
3 replies →
Can use cryptographic hashing.
How does that get around the pigeonhole principle?
I think you'd have to compare the data value before purging, and you can only do the deduplication (purge) if the block is actually the same, otherwise you have to keep the block (you can't replace it with the hash because the hash link in the pool points to different data)
4 replies →