Comment by dimatura
9 hours ago
There's also git-annex and Iterative's DVC. I used xethub a fair bit (was the earliest user, in fact) and I thought it was better than git-annex, git-lfs and DVC, but it still started to struggle past a certain size. I think part of the problem was just git itself, and the compromises required to have a hybrid repo. So I'm happy to see this VCS doesn't use it. xethub did start shipping a version of their product that did not use git, but I didn't get the chance to try it. I've also tried oxen and it wasn't bad at first, but I soon ran into some weird issues with the repo state which I didn't really try to debug. It is clear to me at this point, given my experience with all these systems (none of which I've been 100% happy with), that "git for data" is a nontrivial problem.
Curious if you’ve had a chance to try lakeFS?
It was designed with large-scale environments in mind. I’m aware of several deployments managing hundreds of petabytes of data and billions of objects, which is why lakeFS does not use Git’s Merkle tree / directory tree approach.
Disclaimer: I’m one of the project’s co-creators.
One of the oxen engineers here. Would love to hear about anything you ran into on the OS product or platform! We've grown the team a bunch and are eager to learn what your perfect VCS looks like.