
Comment by Dylan16807

20 hours ago

Can you list some realistic workflows where people would be touching the same huge file but only changing much smaller parts of it?

And yes, you can represent a whole repo as a giant tar file, but because the boundaries between hash segments won't line up with your file boundaries, you get an efficiency hit with very little benefit. Unless you make it file-aware, in which case it ends up even closer to what git already does.
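For context, the "hash segments" here are content-defined chunks: a rolling hash over the bytes decides where to cut, so boundaries fall wherever the content dictates, not at the file boundaries inside the tar. A toy sketch of the idea (the window, mask, and hash are arbitrary choices for illustration, not from any real implementation):

```python
# Toy content-defined chunking: cut wherever a crude rolling-style hash
# of recent bytes hits a boundary condition. Boundaries depend only on
# the content, so they ignore file boundaries inside a tar stream.
def chunk(data: bytes, window: int = 16, mask: int = 0x3F) -> list[bytes]:
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) ^ b) & 0xFFFFFFFF  # recent bytes dominate the low bits
        if i - start >= window and (h & mask) == mask:
            chunks.append(data[start:i + 1])  # boundary found: emit a chunk
            start, h = i + 1, 0
    chunks.append(data[start:])  # trailing bytes form the final chunk
    return chunks
```

Because the boundary test only looks at recent bytes, an edit early in the stream resynchronizes after a short distance and later chunks come out identical, which is what makes dedup across versions work at all.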

Git knows how to store deltas between files. Making that mechanism more reliable would probably achieve more with less.
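To make that concrete: git's packfiles store many objects as deltas against a base object, essentially a list of copy-from-base ranges plus literal inserts. A rough toy version of that encoding, using difflib rather than git's actual delta format:

```python
import difflib

# Toy delta in the spirit of git's packfile deltas: encode the new
# version as "copy" instructions (reuse a byte range from the base)
# and "insert" instructions (literal new bytes).
def make_delta(base: bytes, new: bytes) -> list:
    ops = []
    sm = difflib.SequenceMatcher(None, base, new, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))   # reuse bytes from base
        else:
            ops.append(("insert", new[j1:j2]))  # literal replacement bytes
    return ops

def apply_delta(base: bytes, ops: list) -> bytes:
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, off, length = op
            out += base[off:off + length]
        else:
            out += op[1]
    return bytes(out)
```

When most of the new version can be expressed as "copy" ops, the delta is tiny regardless of how large the file is, which is why improving delta selection can pay off without changing the storage model.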

Most Microsoft Office documents.

One of our projects has a UI editor whose single 60 MB file holds nearly everything except images, and people work on different UI flows at the same time.

  • So for Office, you're looking at files that are already archive formats. Maybe you could improve on that a bit, but because of the compression you wouldn't be able to diff text edits any better; you'd just save some storage. So it would perform about the same as git already does. And you could make it smarter so the prolly tree works better, but you could also make git smarter in the same way; it's not a prolly-tree-specific optimization.

    For your UI editor I'd need to understand the format more.
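A quick way to see why compression defeats byte-level diffing, using two hypothetical documents that differ by a single byte near the start:

```python
import zlib

# Hypothetical "documents": identical except for one byte near the start.
doc_a = b"hello world, " * 200 + b"ending"
doc_b = b"hellp world, " + b"hello world, " * 199 + b"ending"
assert doc_a[13:] == doc_b[13:]  # uncompressed: enormous shared suffix

comp_a = zlib.compress(doc_a)
comp_b = zlib.compress(doc_b)

# Count how far the two compressed streams stay byte-identical.
shared = 0
for x, y in zip(comp_a, comp_b):
    if x != y:
        break
    shared += 1
# The one-byte edit perturbs the Huffman tables and the encoded stream
# almost from the start, so the compressed files share only a handful of
# leading bytes: a chunker or delta coder sees two unrelated blobs.
```

The same effect applies to the zip containers Office formats use, which is why chunking or delta-encoding the compressed bytes buys so little unless the tool decompresses first.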

Binary database files containing “master data”.

Merging would require support from the DB engine, however.