
Comment by Ericson2314

1 day ago

The Nixpkgs example is not like the others, because it is source code.

I don't get what is so bad about shallow clones either. Why should they be so performance-sensitive?

It also seems like it's not git that's emitting scary creaks and groans, but rather GitHub. As much as it would be a bummer to forgo some of GitHub's nice-to-have features, I expect we could survive without them.

  • Exactly. Gentoo's main package repo is hosted in Git (but not on GitHub, except as a mirror). Most users fetch it via rsync, but IME actually using the Git repo makes syncing faster, not slower, though it does make the initial fetch slower.

  • Furthermore, the issues given for nixpkgs actually demonstrate the success of using git as the database! Those 20k forks are all people maintaining their own version of nixpkgs on GitHub, right? Each is its own independent tree that users can modify for their own whims and purposes, without having to overcome the activation energy of creating their own package repository.

    If 83GB (about 4MB per fork) is "too big", then responsibility for that rests solely on the elective centralization encouraged by GitHub. I suspect that if you totaled up the cumulative storage used by copies of the nixpkgs source tree on computers throughout the world, it would be many orders of magnitude larger.

    • Agreed, nix really makes it easy to go from solving the problem for yourself to solving it for everybody. Not much else is easy, but when it comes to building an open source community, that criterion is a pretty powerful one.

Shallow clones themselves aren’t the issue. It’s that updating shallow clones requires the server to spend a bunch of CPU time and GitHub simply isn’t willing to provide that for free.

The solution is simple: using a shallow clone means that the use case doesn’t care about the history at all, so download a tarball of the repo for the initial download and then later rsync the repo. Git can remain the source of truth for all history, but that history doesn’t have to be exposed.
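A minimal Python sketch of that flow, assuming the repository tree is modeled as an in-memory dict of path to bytes. The initial download is a single history-free tarball (built with the standard `tarfile` module), and later updates transfer only files whose content hash changed. All function names are hypothetical, and this is a file-level simplification: real rsync also sends deltas within changed files.

```python
import hashlib
import io
import tarfile

# Hypothetical model: a repository snapshot is a dict mapping path -> file bytes.
Tree = dict[str, bytes]


def snapshot_tarball(tree: Tree) -> bytes:
    """Pack the current tree (no history at all) into a gzipped tarball."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path in sorted(tree):
            data = tree[path]
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def extract_tarball(blob: bytes) -> Tree:
    """Unpack a tarball produced by snapshot_tarball back into a Tree."""
    tree: Tree = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        for member in tar.getmembers():
            f = tar.extractfile(member)
            if f is not None:  # only regular files carry contents
                tree[member.name] = f.read()
    return tree


def manifest(tree: Tree) -> dict[str, str]:
    """Map each path to the SHA-256 of its contents."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in tree.items()}


def plan_sync(local: Tree, remote: Tree) -> tuple[Tree, list[str]]:
    """Files to download and paths to delete so that local matches remote."""
    local_m, remote_m = manifest(local), manifest(remote)
    to_fetch = {p: remote[p] for p, h in remote_m.items() if local_m.get(p) != h}
    to_delete = sorted(p for p in local_m if p not in remote_m)
    return to_fetch, to_delete


def apply_sync(local: Tree, to_fetch: Tree, to_delete: list[str]) -> Tree:
    """Return a new tree with fetched files written and stale files removed."""
    updated = dict(local)
    updated.update(to_fetch)
    for p in to_delete:
        del updated[p]
    return updated
```

In this model only the first download pays for the whole tree; each later sync moves just the files that differ, and nothing in it requires the server to walk commit history.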

In a compressed format, later commits would be added as deltas of some kind, so the size doesn't grow by the whole tree each time. To make shallow clones efficient, you'd need to rewrite the compressed form so that earlier commits are instead deltas on later ones, or something equivalent.
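A toy Python sketch of the reverse-delta idea, assuming whole-file deltas over a dict of path to bytes (real packfiles use byte-level deltas, so the delta format and the `ReverseDeltaStore` class here are hypothetical). The newest commit is always stored in full, each older commit is stored as a delta from its successor, and a depth-1 shallow clone is then just the stored head with no reconstruction work.

```python
from dataclasses import dataclass, field
from typing import Optional

Tree = dict[str, bytes]
# Hypothetical delta format: path -> contents in the target tree, or None if absent there.
Delta = dict[str, Optional[bytes]]


def make_delta(src: Tree, dst: Tree) -> Delta:
    """Whole-file delta that turns src into dst."""
    return {p: dst.get(p) for p in src.keys() | dst.keys() if src.get(p) != dst.get(p)}


def apply_delta(src: Tree, delta: Delta) -> Tree:
    out = dict(src)
    for path, data in delta.items():
        if data is None:
            out.pop(path, None)  # file does not exist in the target tree
        else:
            out[path] = data
    return out


@dataclass
class ReverseDeltaStore:
    head: Tree  # newest commit, stored in full
    back: list[Delta] = field(default_factory=list)  # back[-1] turns head into head~1

    def commit(self, new_tree: Tree) -> None:
        # Record how to get from the new head back to the old one, then advance head.
        self.back.append(make_delta(new_tree, self.head))
        self.head = dict(new_tree)

    def shallow(self) -> Tree:
        # A depth-1 clone is the full head snapshot: no deltas to replay.
        return dict(self.head)

    def checkout(self, n: int) -> Tree:
        """Reconstruct the tree n commits before head by replaying reverse deltas."""
        if not 0 <= n <= len(self.back):
            raise ValueError("history does not go back that far")
        tree = dict(self.head)
        for delta in reversed(self.back[len(self.back) - n:]):
            tree = apply_delta(tree, delta)
        return tree
```

The trade-off is that each new commit rewrites which snapshot is stored in full, whereas forward deltas leave old data untouched; that rewrite is the cost the comment alludes to.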