Comment by kibwen

1 day ago

I think there's a form of survivorship bias at work here. To use the example of Cargo, if Rust had never caught on, and thereby gotten popular enough to inflate the git-based index beyond reason, then it would never have been a problem to use git as the backing protocol for the index. Likewise, we can imagine innumerable smaller projects that successfully use git as a distributed delta-updating data distribution protocol, and never happen to outgrow it.

The point being, if you're not sure whether your project will ever need to scale, then it may not make sense to reinvent the wheel when git is right there (and then invent a solution for hosting that git repo, when GitHub is right there). Using both lets you spend your time on other, more immediate problems.

Right, this post may encourage premature optimization. Cargo, Homebrew, et al. chose an easy, good-enough solution that allowed them to grow until they hit scaling limits. This is a good problem to have.

I am sure there's value in having a vision for what your scaling path might be in the future, so this discussion is a good one. But it doesn't automatically mean that git is a bad place to start.

I'm surprised nobody has made a common db for package managers, so Cargo could use it without having to think about it.

  • I mean, it's sort of the other way around. Cargo was built to be able to natively understand git-based dependencies, in the sense that you can bypass a crate registry and instead just point it directly at a git repo. That means that Cargo already had to have the ability to clone git repos, and so when it came time to decide how to implement the index (which looks pretty similar to a git repo if you squint), choosing to use git required them to add literally no new dependencies and almost no new code.

    Let's also keep in mind that the use case mentioned in the OP is specifically about the index, which is just the data structure that informs the version resolver how to resolve versions. When it came time to replace the git-based index, Cargo didn't replace it with a specialized database; it replaced it with HTTP endpoints (which are probably just backed by an off-the-shelf database). It's not clear what sort of specialized database would be useful to abstract this for other package managers.
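
To make the "HTTP endpoints" point concrete, here's a rough sketch of how a client might map a crate name to its index file, following the directory layout described in Cargo's registry index documentation. The function names are made up for illustration, and the default base URL is my assumption about where the sparse index lives; this is not Cargo's actual code.

```python
def index_path(crate_name: str) -> str:
    """Relative path of a crate's metadata file in a Cargo-style index.

    Hypothetical helper, not part of Cargo. Names are lowercased and
    bucketed by length so that no single directory grows too large.
    """
    name = crate_name.lower()
    if not name:
        raise ValueError("crate name must not be empty")
    if len(name) == 1:
        return f"1/{name}"
    if len(name) == 2:
        return f"2/{name}"
    if len(name) == 3:
        return f"3/{name[0]}/{name}"
    # Four or more characters: first two letters, then the next two.
    return f"{name[:2]}/{name[2:4]}/{name}"


def sparse_index_url(crate_name: str, base: str = "https://index.crates.io") -> str:
    """URL a client would fetch for one crate (base URL is an assumption)."""
    return f"{base.rstrip('/')}/{index_path(crate_name)}"


if __name__ == "__main__":
    for crate in ("a", "io", "syn", "serde"):
        print(crate, "->", sparse_index_url(crate))
```

The relevant difference from the git-backed index is that the resolver only fetches the files for crates it actually needs, one plain HTTP request each, instead of cloning or pulling the whole index.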

Keep in mind that crates.io, the main crate registry, uses GitHub as its only authentication method. They may have moved away from git, but they're still locked into a rather piss-poor vendor.

  • No, crates.io isn't locked to GitHub. crates.io uses GitHub as an identity provider, but there's nothing stopping them from adding more. Furthermore, they've avoided tying themselves to GitHub in other ways, for example by resisting all the people telling them to just allow GitHub usernames as package namespaces, specifically to avoid being locked to GitHub.