
Comment by dgellow

2 days ago

Perforce is used in game dev, animation, etc. git is pretty poor at dealing with lots of really large assets

I've heard this about game dev before. My (probably only somewhat correct) understanding is it's more than just source code--are they checking in assets/textures etc? Is perforce more appropriate for this than, say, git lfs?

  • I'm not sure about the current state of affairs, but I've been told that git-lfs performance was still not on par with Perforce on those kinds of repos a few years ago. Microsoft was investing a lot of effort in making it work for their large repos though so maybe it's different now.

    But yeah, it's basically all about having binaries in source control. It's not just game dev, either - hardware folk also like this for their artifacts.

  • Assets, textures, design documents, tools, binary dependencies, etc…

    And yes, p4 just rolls with it; git lfs is a creaky hack.
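    (For context on why it feels bolted-on: LFS replaces each tracked file in the repo with a small text pointer and stores the real bytes out of band on a separate server. A rough sketch; the path, hash, and size below are made-up placeholders:)

      $ git lfs track "*.psd"                # appends this line to .gitattributes:
      *.psd filter=lfs diff=lfs merge=lfs -text

      $ git cat-file -p HEAD:art/hero.psd    # the blob git actually stores is just a pointer:
      version https://git-lfs.github.com/spec/v1
      oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
      size 104857600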

  • And often binaries: .exe, .dll, even .pdb files.

    • Interesting. Seems antithetical to the 'git-centered' view of being (mostly) for source code only.

      I think I read somewhere that game dev teams would also check the actual compiler binary, and things of that nature, into version control.

      Usually it's considered "bad practice" when you see, like, an entire sysroot of shared libs in a git repository.

      I don't really have strong feelings one way or the other. Even today, "vendoring" C++ libraries (typically as source) isn't exactly rare, and I'm not sure it's always a "bad" thing in other languages either. Everyone just seems to have decided that relying on a package manager and some sort of external store is the Right Way. In some sense, it's harder to make the case for that.

      5 replies →

Why is this still the case?

  • I've been checking large (tens to hundreds of MB) tarballs into a git repo that I use for managing a website archive for a few years now, and it can be made to work, but it's very painful.

    I think there are three main issues:

    1. Since it's a distributed VCS, everyone has to have a whole copy of the entire repo. That means anyone cloning it or pulling significant commits ends up downloading vast amounts of binaries. If you can copy the .git dir directly to the other machine instead of using git's normal cloning mechanism it's not as bad, but you're still fundamentally copying everything (a couple of partial mitigations are sketched after this list):

      $ du -sh .git
      55G .git
    

    2. git doesn't "know" that something is a binary (although it sometimes seems to), so some common operations try to search them or otherwise operate on them as if they were text. (I just ran git log -S on that repo and git ran out of memory and crashed, on a machine with 64GB of RAM.)

    3. The cure for this (git lfs) is worse than the disease. LFS is so bad/strange that I stopped using it and went back to putting the tarballs in git.
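    (For reference, a couple of standard git features that can partially mitigate 1 and 2, sketched below. The URL is a placeholder, partial clone needs a server that supports it, and how much either helps will depend on the repo:)

      $ # 2: mark tarballs as binary - no textual diffs, no text/eol conversion
      $ echo '*.tar.gz -diff -text' >> .gitattributes
      $ git add .gitattributes

      $ # 1: blobless partial clone - blobs are fetched on demand instead of all up front
      $ git clone --filter=blob:none https://example.com/website-archive.git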

    • This is a problem that comes up everywhere from game development to ML datasets.

      We built Oxen to solve this problem: https://github.com/Oxen-AI/Oxen (I work at Oxen.ai).

      Source control for large data. Currently our biggest repository is 17 TB. Would love for you to try it out. It's open source, so you can self-host as well.

    • Why would someone check binaries into a repo? The only time I came across checked-in binaries was because that particular dev couldn't be bothered to learn NuGet/Maven. (The dev who approved that PR didn't understand it either.)

      7 replies →