Comment by penciltwirler
8 hours ago
The premise is that Git-LFS sucks, so we need to build a new data versioning system (in Rust, from scratch). While I mostly agree with this premise, there are already lots of existing (mature) data versioning systems that use the same tricks under the hood:
- Pachyderm (Go): https://github.com/pachyderm/pachyderm
- XetHub (acquired by HuggingFace): https://huggingface.co/blog/xethub-joins-hf
- LakeFS (Go): https://github.com/treeverse/lakeFS
- Oxen (Rust): https://github.com/Oxen-AI/Oxen
I guess with AI, anyone can vibe-code a content-addressed, chunk-level-deduped versioning system in Rust these days...
But jokes aside, Lore seems really cool! What's interesting is the realization that different domains/industries have similar problems, but they don't seem to be cross-pollinating. In this case, AI and gaming both need a storage system that can version-control large binary files at scale. I think there are lots of opportunities to share ideas here, but perhaps the current lack of idea sharing creates an opportunity!
I don't think the needs are exactly the same. I believe that in AI the big binary files are normally written once, while in gamedev they are constantly updated.
That already warrants different storage architectures.
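To make the "content-addressed, chunk-level deduped" trick a bit more concrete, and to show why it matters most for the constantly-updated case, here is a toy sketch in Python. It is not how Lore, XetHub, Oxen, or any other project mentioned here necessarily works. It uses content-defined chunking with a gear rolling hash (loosely in the spirit of FastCDC, without its normalized chunking) plus a dict of chunks keyed by SHA-256. The gear table, `mask_bits`, `min_size`/`max_size`, and the names `chunk_spans` and `ChunkStore` are all made-up assumptions for illustration. Because cut points depend on the bytes rather than on fixed offsets, inserting a few bytes into a large asset should only produce a handful of new chunks, whereas a fixed-size chunker would have to re-store everything after the edit.

```python
import hashlib
import random

MASK64 = (1 << 64) - 1

# Hypothetical gear table: 256 pseudo-random 64-bit values, seeded so chunk
# boundaries are reproducible across runs (real systems ship a fixed table).
_gear_rng = random.Random(42)
GEAR = [_gear_rng.getrandbits(64) for _ in range(256)]


def chunk_spans(data: bytes, mask_bits: int = 12,
                min_size: int = 1024, max_size: int = 16384):
    """Yield (start, end) spans chosen by a gear rolling hash.

    Because of the shift, h depends only on the last 64 bytes once a chunk is
    longer than that, so cut points follow the content rather than offsets.
    A cut is made when the top `mask_bits` bits of h are zero (about one in
    2**mask_bits positions), subject to the min/max chunk sizes.
    """
    start, h = 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + GEAR[b]) & MASK64
        size = i - start + 1
        if (size >= min_size and (h >> (64 - mask_bits)) == 0) or size >= max_size:
            yield start, i + 1
            start, h = i + 1, 0
    if start < len(data):
        yield start, len(data)


class ChunkStore:
    """Toy content-addressed store: each unique chunk is kept once, keyed by SHA-256."""

    def __init__(self):
        self.chunks: dict[str, bytes] = {}

    def put_file(self, data: bytes) -> list[str]:
        """Store any new chunks of `data` and return its manifest (ordered chunk ids)."""
        manifest = []
        for start, end in chunk_spans(data):
            piece = data[start:end]
            cid = hashlib.sha256(piece).hexdigest()
            self.chunks.setdefault(cid, piece)  # dedup: existing chunks are reused
            manifest.append(cid)
        return manifest

    def get_file(self, manifest: list[str]) -> bytes:
        """Reassemble a file version from its manifest."""
        return b"".join(self.chunks[cid] for cid in manifest)


if __name__ == "__main__":
    rng = random.Random(0)
    v1 = bytes(rng.getrandbits(8) for _ in range(200_000))  # stand-in binary asset
    v2 = v1[:100_000] + b"patch" * 20 + v1[100_000:]        # insert 100 bytes mid-file

    store = ChunkStore()
    m1 = store.put_file(v1)
    before = len(store.chunks)
    m2 = store.put_file(v2)
    assert store.get_file(m2) == v2
    print(f"v1: {len(m1)} chunks; v2 added {len(store.chunks) - before} new chunk(s)")
```

The min/max size clamp is the main design knob here: without it, unlucky data could yield tiny chunks (too much per-chunk overhead) or unbounded ones (poor dedup).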
There are also git-annex and Iterative's DVC. I used XetHub a fair bit (I was the earliest user, in fact), and I thought it was better than git-annex, Git-LFS, and DVC, but it still started to struggle past a certain size. I think part of the problem was just Git itself, and the compromises required to have a hybrid repo, so I'm happy to see that this VCS doesn't use it. XetHub did start shipping a version of their product that didn't use Git, but I didn't get the chance to try it. I've also tried Oxen, and it wasn't bad at first, but I soon ran into some weird issues with the repo state that I didn't really try to debug. Given my experience with all these systems, none of which I've been 100% happy with, it's clear to me at this point that "git for data" is a nontrivial problem.