
Comment by Eikon

7 hours ago

Most of the pain here is the typical set of issues people run into trying to make S3 a filesystem as-is, common with S3FS-family approaches.

ZeroFS (https://github.com/Barre/zerofs) is 9P/NFS/NBD over S3 on an LSM. Point stock go-git, or just /usr/bin/git, at a mount and skip the gymnastics. Rename is a metadata op in the keyspace, so you get it atomic on any S3, no Tigris-specific X-Tigris-Rename needed.
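To make the "rename is a metadata op" point concrete, here is a toy model. It is not ZeroFS's actual schema; `ToyNamespace`, `dirents`, and `chunks` are names invented for illustration. The idea it sketches is that when file data is keyed by an inode id rather than by path, a rename only has to touch the small path-to-inode entry and never copies data.

```python
class ToyNamespace:
    """Toy path -> inode -> data mapping (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.dirents: dict[str, int] = {}   # path -> inode id (metadata)
        self.chunks: dict[int, bytes] = {}  # inode id -> file contents (data)
        self._next_inode = 1

    def write(self, path: str, data: bytes) -> None:
        # Store the data under a fresh inode id, then point the path at it.
        inode = self._next_inode
        self._next_inode += 1
        self.chunks[inode] = data
        self.dirents[path] = inode

    def rename(self, src: str, dst: str) -> None:
        # Only the directory entry moves; the data keyed by inode is untouched.
        # A real store would apply this as one atomic write batch and would
        # also garbage-collect any inode previously referenced by dst.
        self.dirents[dst] = self.dirents.pop(src)

    def read(self, path: str) -> bytes:
        return self.chunks[self.dirents[path]]


ns = ToyNamespace()
ns.write("refs/heads/main.lock", b"abc123\n")
ns.rename("refs/heads/main.lock", "refs/heads/main")
print(ns.read("refs/heads/main"))
```

Contrast this with a plain path-keyed bucket, where a "rename" generally means copying the object to the new key and deleting the old one.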

It's a different point on the spectrum, but less square-peg, and most likely much, much faster (it works great on Linux-sized repos) :)

Author of the article here. I'm aware of ZeroFS and other similar approaches (such as something internal at Tigris that will become public at a later date). This was more of an experiment to see how far you can get with stuff I already had "on the shelf". I am going to be improving this a fair bit; I just need to plan out what I'm gonna work on and figure out the best times to stream it, etc.

I wouldn’t call it gymnastics. The surprising part of the article was that Git itself is an object store that happens to use a filesystem for persistence, and that an S3 bucket might actually be more suitable than a .git directory on POSIX.
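For anyone who hasn't looked at how this works: in a SHA-1 repository, a blob's id is the hash of a small header plus the file contents, and a loose object lives at a path derived from that id. The sketch below computes the id and turns it into a key; the `objects/` prefix used as a bucket key layout is an assumption for illustration, not something taken from the article.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object id Git assigns to a blob (SHA-1 repositories)."""
    # Git hashes "blob <size>\0" followed by the raw content.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def loose_object_key(oid: str, prefix: str = "objects") -> str:
    """Mirror the .git/objects/ab/cdef... fan-out as a flat key (assumed layout)."""
    return f"{prefix}/{oid[:2]}/{oid[2:]}"


oid = git_blob_id(b"hello\n")
print(oid)                    # ce013625030ba8dba906f756967f9e9ca394464a
print(loose_object_key(oid))  # objects/ce/013625030ba8dba906f756967f9e9ca394464a
```

The id matches what `git hash-object` reports for the same bytes. Since the key is derived purely from the content, objects are immutable and writes are idempotent, which is roughly what a bucket is good at.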

  • It makes sense, similar to how blob storage is a natural fit for a nix cache, where you have a giant flat space of many hash-addressed immutable directories/archives.

    I think most of the pain documented in the article is just that git-the-implementation bakes in a lot of assumptions that it is operating on a (local) filesystem, hence stuff like calling stat a ton of times, or doing the rename trick to get atomic behavior from a not-normally-atomic operation (updating a file in place).

    If it were possible to define a "backing storage" API layer within git, it might be possible to move all the filesystem/posix-centric stuff to the other side of it and leave behind an interface that maps quite nicely to blob storage.
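A rough sketch of what such a layer could look like, with the caveat that nothing here is a real git API: `ObjectStore`, `PosixStore`, `BlobStore`, and `write_ref` are all hypothetical names. The POSIX backend keeps the rename trick (write a temp file, then `os.replace` it into place) on its side of the interface, while the blob backend is just a dict standing in for a bucket, on the assumption that a single-key PUT is already all-or-nothing.

```python
import contextlib
import os
import tempfile
from typing import Protocol


class ObjectStore(Protocol):
    """Hypothetical backing-storage interface: whole-value put/get by key."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class PosixStore:
    """Filesystem backend: atomic replace via temp file + rename."""

    def __init__(self, root: str) -> None:
        self.root = root

    def put(self, key: str, data: bytes) -> None:
        path = os.path.join(self.root, key)
        directory = os.path.dirname(path)
        os.makedirs(directory, exist_ok=True)
        # The temp file must be in the same directory so the rename stays
        # on one filesystem.
        fd, tmp = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp, path)  # readers see the old or new file, never a partial one
        except BaseException:
            with contextlib.suppress(FileNotFoundError):
                os.unlink(tmp)
            raise

    def get(self, key: str) -> bytes:
        with open(os.path.join(self.root, key), "rb") as f:
            return f.read()


class BlobStore:
    """Stand-in for a bucket: one assignment models one atomic PUT."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = bytes(data)

    def get(self, key: str) -> bytes:
        return self.objects[key]


def write_ref(store: ObjectStore, name: str, oid: str) -> None:
    # The caller no longer needs to know about lock files or renames.
    store.put(name, oid.encode() + b"\n")


with tempfile.TemporaryDirectory() as root:
    for store in (PosixStore(root), BlobStore()):
        write_ref(store, "refs/heads/main", "ce013625030ba8dba906f756967f9e9ca394464a")
        print(type(store).__name__, store.get("refs/heads/main"))
```

This leaves out the harder parts (compare-and-swap on refs, listing, packfiles), but it shows where the filesystem-specific tricks would end up living.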