
Comment by AdamN

2 days ago

I feel like we're well into the longtail now. Are there other SCM systems, or is it the end of history for source control and git is the one-and-done solution?

Mercurial still has some life to it (excluding Meta’s fork of it), jj is slowly gaining, fossil exists.

And afaik P4 still does good business, because DVCS in general and git in particular remain pretty poor at dealing with large binary assets, so it's really not great for e.g. large gamedev. Unity actually purchased PlasticSCM a few years back and has it as part of their cloud offering.

Google uses its own VCS called Piper which they developed when they outgrew P4.

git by itself is often unsuitable for XL codebases. Facebook, Google, and many other companies / projects had to augment git to make it suitable or go with a custom solution.

AOSP, with 50M LoC, uses a manifest-based, depth=1 tool called repo to glue together a repository of repositories. If you’re thinking “why not just use git submodules?”, it’s because git submodules have a rough UX and would require so much wrangling that a custom tool was the more favorable option.
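
A rough sketch of that workflow, with a placeholder manifest URL and branch rather than the real AOSP ones:

    # fetch the manifest that lists every sub-repository and its pin
    $ repo init -u https://example.com/platform/manifest -b main --depth=1

    # clone/update every project in the manifest, shallow and in parallel
    $ repo sync -c -j8

    # run a git command across all checked-out projects
    $ repo forall -c 'git status --short'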

Meta uses a custom VCS. They recently released sapling: https://sapling-scm.com/docs/introduction/
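
For the curious, Sapling's CLI (sl) can clone plain git repos too. A minimal sketch with a placeholder URL, using only the basic hg-style commands (exact behavior may differ by version):

    $ sl clone https://example.com/some/repo.git
    $ cd repo
    $ sl status
    $ sl commit -m "small tweak"
    $ sl log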

In general, the philosophy that distributed VCS is better than centralized is actually quite questionable. I want to know what my coworkers are up to and what they’re working on, to avoid merge conflicts; DVCS without constant out-of-VCS synchronization causes more merge hell. Git’s default clone and packfile behavior is nightmarish here: most checkouts should be depth==1, with older history and file contents fetched dynamically only when they’re actually accessed locally. Deeper integration of the VCS with build systems and file systems could make things even better. I think there’s still tons of room for innovation in the VCS space; the domain naturally resists change because people don’t want to break their core workflows.
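
Git's own shallow and partial clone modes are the closest thing it ships today to that fetch-on-access model; a minimal sketch with a placeholder URL:

    # history-shallow clone: truncate history to the most recent commit
    $ git clone --depth=1 https://example.com/big/repo.git

    # "blobless" partial clone: full history, but file contents are
    # downloaded lazily when a checkout or diff actually needs them
    $ git clone --filter=blob:none https://example.com/big/repo.git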

  • It's interesting to point out that almost all of Microsoft's "augmentations" to git have been open source, and many of them have already made it into git upstream and come "ready to configure" in git today ("cone mode" sparse checkouts, a lot of steady improvements to sparse checkout in general, git commit-graph, subtle and not-so-subtle packfile improvements, reflog improvements, more). A lot of it is opt-in because of backwards compatibility or extra overhead that small/medium-sized repos won't need, but so much of it is there to be used by anyone, not just the big corporations.

    I think it is neat that at least one company with mega-repos is trying to lift all boats, not just their own.
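
    A couple of those opt-ins, roughly as you'd enable them today (the paths are just examples):

        # cone-mode sparse checkout: only materialize the listed directories
        $ git sparse-checkout init --cone
        $ git sparse-checkout set src/service-a docs

        # precompute the commit-graph so history walks stay fast
        $ git commit-graph write --reachable
        $ git config fetch.writeCommitGraph true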

    • Meta and Google have both been using Mercurial, and they have also been contributing back to upstream Mercurial.

  • git submodules have a bad UX, but it's certainly not worse than Android's custom tooling. I understand why they did it, but in retrospect that seems like an obvious mistake to me.
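
    For context, the submodule dance that people find rough, sketched with placeholder URLs:

        # add a pinned dependency; the superproject records a commit SHA
        $ git submodule add https://example.com/lib/foo.git third_party/foo

        # fresh clones don't fetch submodules unless asked
        $ git clone --recurse-submodules https://example.com/app.git

        # ...and after pulls you still have to resync the pins yourself
        $ git submodule update --init --recursive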

Perforce is used in game dev, animation, etc. git is pretty poor at dealing with lots of really large assets.

  • I've heard this about game dev before. My (probably only somewhat correct) understanding is that it's more than just source code: are they checking in assets/textures etc.? Is perforce more appropriate for this than, say, git lfs?

    • I'm not sure about the current state of affairs, but I've been told that git-lfs performance was still not on par with Perforce on those kinds of repos a few years ago. Microsoft was investing a lot of effort in making it work for their large repos though so maybe it's different now.

      But yeah, it's basically all about having binaries in source control. It's not just game dev, either - hardware folk also like this for their artifacts.
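
      For reference, the basic LFS setup is only a few commands (the tracked patterns below are illustrative):

          # one-time setup per machine
          $ git lfs install

          # store matching files as small pointer files, with content on an LFS server
          $ git lfs track "*.psd" "*.fbx"
          $ git add .gitattributes
          $ git commit -m "Track large binary assets with LFS"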

    • Assets, textures, design documents, tools, binary dependencies, etc.

      And yes, p4 just rolls with it; git lfs is a creaky hack.

  • why is this still the case?

    • For a few years now I've been checking in large (10s to 100s of MBs) tarballs into one git repo that I use for managing a website archive, and it can be made to work, but it's very painful.

      I think there are three main issues:

      1. Since it's a distributed VCS, everyone must have a whole copy of the entire repo. But that means anyone cloning the repo or pulling significant commits is going to end up downloading vast amounts of binaries. If you can directly copy the .git dir to the other machine first instead of using git's normal cloning mechanism then it's not as bad, but you're still fundamentally copying everything:

        $ du -sh .git
        55G .git
      

      2. git doesn't "know" that something is a binary (although it seems to in some circumstances), so some common operations try to search or otherwise operate on binaries as if they were text. (I just ran git log -S on that repo and git ran out of memory and crashed, on a machine with 64GB of RAM.) A .gitattributes sketch for this follows the list.

      3. The cure for this (git lfs) is worse than the disease. LFS is so bad/strange that I stopped using it and went back to putting the tarballs in git.
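
      On point 2, one partial mitigation is to mark those paths as binary in .gitattributes so the textual diff/merge machinery leaves them alone (the patterns below are examples, and it won't save every operation):

          # .gitattributes at the repo root
          # "binary" is a built-in macro for -diff -merge -text
          *.tar.gz binary
          *.tar.xz binary
          # skip delta compression for these blobs in packfiles
          archives/** -delta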


There are some other solutions (like jujutsu, which, while using git as its storage medium, handles commits somewhat differently). But I do believe we've reached a critical point where git is the one-stop shop for all source control needs despite its flaws/complexity.
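
A minimal jj sketch for the curious, run on top of an existing git repo (the repo path is a placeholder, and exact flags vary a bit across jj versions):

    # colocate jj with an existing git checkout; git stays the storage backend
    $ cd existing-git-repo
    $ jj git init --colocate

    # the working copy is itself a commit: describe it, then start a new one on top
    $ jj describe -m "fix the parser"
    $ jj new

    # the log shows jj change IDs alongside the underlying git commits
    $ jj log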