
Comment by ks2048

1 day ago

I'm not sure binary diffs are the problem; for example, when storing images or MP3s, binary diffs are usually worse than nothing.

I would think that git would need a parallel storage scheme for binaries: something that does binary chunking and deduplication between revisions, but keeps the same Merkle referencing scheme as everything else.
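A minimal sketch of what such a parallel scheme might look like. It assumes content-defined chunking with a Gear-style rolling hash (similar in spirit to FastCDC), SHA-256 chunk IDs, and a file ID computed over the ordered chunk IDs so the file still hashes like a Merkle node. The names `chunk` and `ChunkStore`, the chunk-size parameters, and the manifest format are all hypothetical; this is not how git stores objects today.

```python
import hashlib
import random

# Hypothetical tuning values for this sketch.
MIN_CHUNK = 2 * 1024           # never cut a chunk shorter than this
MAX_CHUNK = 64 * 1024          # always cut a chunk at this size
BOUNDARY_MASK = (1 << 13) - 1  # cut when the low 13 bits are zero (~8 KiB past MIN_CHUNK)
MASK64 = (1 << 64) - 1

# Gear table: one pseudo-random 64-bit value per byte value (fixed seed so runs are reproducible).
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def chunk(data: bytes) -> list[bytes]:
    """Split data at content-defined boundaries using a Gear-style rolling hash."""
    chunks = []
    start = 0
    h = 0
    for i, b in enumerate(data):
        # Shifting left ages out old bytes, so h depends only on the last 64 bytes.
        h = ((h << 1) + GEAR[b]) & MASK64
        size = i - start + 1
        if (size >= MIN_CHUNK and (h & BOUNDARY_MASK) == 0) or size >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


class ChunkStore:
    """Content-addressed chunk store: each distinct chunk is kept once, keyed by its SHA-256."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}
        self.manifests: dict[str, list[str]] = {}

    def put_file(self, data: bytes) -> str:
        """Store a file and return an ID that hashes its ordered chunk IDs (Merkle-style)."""
        ids = []
        for c in chunk(data):
            cid = hashlib.sha256(c).hexdigest()
            self.chunks.setdefault(cid, c)  # deduplication: identical chunks are stored once
            ids.append(cid)
        file_id = hashlib.sha256("\n".join(ids).encode()).hexdigest()
        self.manifests[file_id] = ids
        return file_id

    def get_file(self, file_id: str) -> bytes:
        """Reassemble a file from its manifest."""
        return b"".join(self.chunks[cid] for cid in self.manifests[file_id])


if __name__ == "__main__":
    v1 = random.Random(42).randbytes(300_000)
    v2 = v1[:1000] + b"a few inserted bytes" + v1[1000:]  # small edit near the start
    store = ChunkStore()
    id1, id2 = store.put_file(v1), store.put_file(v2)
    shared = set(store.manifests[id1]) & set(store.manifests[id2])
    print(f"{len(store.manifests[id1])} chunks in v1, {len(shared)} reused by v2")
```

Because cut points depend only on a short window of recent bytes, an insertion shifts data without moving the boundaries that follow it, so most chunks of the second revision are already in the store. Whether this pays off for real binaries depends on how much of the file survives byte-for-byte between revisions.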

  • > binary chunking and deduplication

    Are there many binaries that people would store in git where this would actually help? I assume most files end up with compression or some other form of randomization between revisions, making deduplication futile.

> for storing images or MP3s, binary diffs are usually worse than nothing

Editing the ID3 tag of an MP3 file or changing the rating metadata of an image gives block-level deduplication a big advantage. Only a few such cases are needed to more than compensate for the worse-than-nothing inefficiency of binary diffs when there's nothing to deduplicate.
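To make the ID3 case concrete, here is a minimal sketch of fixed-size block deduplication. It assumes 4 KiB blocks and a simplified ID3v1-style trailer (ID3v1 tags occupy the last 128 bytes of the file), with random bytes standing in for compressed audio. `BLOCK_SIZE`, `block_ids`, `new_bytes_to_store`, and `id3v1_tag` are hypothetical helpers, not part of any real tool.

```python
import hashlib
import random

BLOCK_SIZE = 4096  # hypothetical block size for this sketch


def block_ids(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """SHA-256 of each fixed-size block; the final block may be shorter."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def new_bytes_to_store(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> int:
    """Bytes a block-deduplicating store must add for `new` when `old` is already stored."""
    known = set(block_ids(old, block_size))
    added = 0
    for i in range(0, len(new), block_size):
        block = new[i:i + block_size]
        bid = hashlib.sha256(block).hexdigest()
        if bid not in known:
            added += len(block)
            known.add(bid)  # a block repeated within `new` is only stored once
    return added


def id3v1_tag(title: str) -> bytes:
    """Simplified 128-byte ID3v1-style trailer: 'TAG' + 30-byte title + zeroed remaining fields."""
    return b"TAG" + title.encode("latin-1")[:30].ljust(30, b"\0") + bytes(95)


if __name__ == "__main__":
    audio = random.Random(1).randbytes(500_000)  # stand-in for already-compressed audio frames
    before = audio + id3v1_tag("Old title")
    after = audio + id3v1_tag("New title")       # same length, only the trailer differs
    print(new_bytes_to_store(before, after), "of", len(after), "bytes are new")
```

Fixed-size blocks only line up when the edit leaves the file length and offsets unchanged, as rewriting an ID3v1 trailer does. A metadata edit that grows a header at the start of the file (for example an ID3v2 tag without spare padding) shifts every later block, which is the case content-defined chunking is designed to handle.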