Comment by pizza234

2 days ago

> Would it be incorrect to say that most of the bloat relates to historical revisions?

Based on my experience (YMMV), I think it would be incorrect, yes: any time I've performed a shallow clone of a repository, the saving wasn't as large as one would intuitively imagine. In other words, history is stored very efficiently.
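A quick way to sanity-check this yourself (the URL is a placeholder; any repository with a long history works, and the ratio varies per repo):

    # compare on-disk size of a full clone vs. a history-free shallow clone
    git clone https://example.com/some/repo.git full
    git clone --depth 1 https://example.com/some/repo.git shallow
    du -sh full/.git shallow/.git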

A bit of digging seems to confirm that: git removes a lot of redundant data during garbage collection. It does, however, store complete files (unlike a VCS like Mercurial, which stores deltas), so it might still benefit from a download-the-current-snapshot-first approach.
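You can watch the garbage collection at work in a repository with some local commits (fresh clones arrive already packed, so there's little for it to do there):

    # count loose vs. packed objects, repack, then count again
    git count-objects -v
    git gc
    git count-objects -v   # loose objects are now (mostly) folded into packs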

  • > It does, however, store complete files (unlike a VCS like Mercurial, which stores deltas)

    The logical model of git is that it stores complete files: every object is a full snapshot of a blob or tree. The physical model is that those objects are stored as deltas within pack files (except for new objects that haven't been packed yet; by default git repacks automatically once too many loose objects accumulate, and objects are always packed on the wire when sending or receiving).
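    Both models are easy to observe in any repo (HEAD:README.md is just an example path):

        # logical model: any object can be read back as a complete file
        git cat-file -p HEAD:README.md

        # physical model: inside a pack, many objects are stored as deltas;
        # deltified entries show a delta depth and a base object at the end
        git verify-pack -v .git/objects/pack/pack-*.idx | head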

    • Yes, and the remaining problem is that pack deltas are byte-based and format-agnostic: they work well for text and for uncompressed binaries, but a small edit to a compressed format like a JPEG rewrites most of the file's bytes, so the deltas buy almost nothing and git effectively stores a whole new (zlib-compressed) copy. (There is a partial display-side workaround, sketched below.)

      It would be nice to have a VCS that could manage these formats more effectively, but most binary formats don't lend themselves to it: even a conceptually small change, like adding a layer to an image, can rewrite most of the serialized bytes.

      I reckon there's still room for image and video formats designed to work well with version control.
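      In the meantime, git can at least be taught to *show* meaningful diffs for some binaries via a textconv driver; this only affects display, not storage (exiftool here is one example tool and has to be installed separately):

          # .gitattributes
          *.jpg diff=jpg

          # tell git how to textualize JPEGs for diffing
          git config diff.jpg.textconv exiftool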