← Back to context

Comment by spyrja

1 day ago

Would it be incorrect to say that most of the bloat relates to historical revisions? If so, maybe an rsync-like behavior starting with the most current version of the files would be the best starting point. (Which is all most people will need anyhow.)

> Would it be incorrect to say that most of the bloat relates to historical revisions?

Based on my experience (YMMV), I think it is incorrect, yes, because any time I've performed a shallow clone of a repository, the saving wasn't as much as one would intuitively imagine (in other words: history is stored very efficiently).

  • Doing a bit of digging seems to confirm that, considering that git actually does remove a lot of redundant files during the garbage collection phase. It does however store complete files (unlike a VCS like mercurial which stores deltas) so nonetheless it still might benefit from a download-the-current-snapshot-first approach.

    • > It does however store complete files (unlike a VCS like mercurial which stores deltas)

      The logical model of git is that it stores complete files. The physical model of git is that these complete files are stored as deltas within pack files (except for new files which haven't been packed yet; by default git automatically packs once there are too many of these loose files, and they're always packed in its network protocol when sending or receiving).

      1 reply →