Comment by scaredginger

3 years ago

This is a great point. Why is Git LFS uploading a large file in 50-byte chunks?

Ideally a large file would upload in MTU-sized packets, which Nagle's algorithm will often give you. Otherwise you pay a small amount of extra overhead at the boundary, where the larger chunk may not divide evenly into MTU-sized packets.

Edit: I mostly work in embedded (systems that don't run git-lfs), so perhaps my view isn't sensible here.

  • Dividing packets into MTUs is the job of the TCP stack - or even the driver or NIC in the case of offloads. Userspace software shouldn't deal with MTUs; it should use buffer sizes that make sense for the application, e.g. 64 kB or even more. Otherwise the stack wouldn't be very efficient, with every tiny piece of data causing a syscall and independent processing by the networking stack.

    • Right; it sounds to me like the real bug is that git-lfs isn't buffering its writes to the socket. Correct me if I'm wrong, but if git-lfs were buffering its writes (or using sendfile), then Nagle's algorithm wouldn't matter.
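A minimal sketch of what that buffering would look like in Go (assumed code, not git-lfs's actual upload path): a bufio.Writer coalesces tiny application-level writes in memory, so the kernel only ever sees large writes and Nagle's algorithm has nothing left to coalesce.

```go
package upload

import (
	"bufio"
	"io"
)

// send copies chunks to dst through a 64 kB buffer. Each small Write
// lands in memory; the socket only sees full buffers, plus one final
// partial buffer on Flush.
func send(dst io.Writer, chunks [][]byte) error {
	w := bufio.NewWriterSize(dst, 64*1024)
	for _, c := range chunks {
		if _, err := w.Write(c); err != nil { // c may be only ~50 bytes
			return err
		}
	}
	return w.Flush()
}
```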

I do not know Go. But what if the language has so many high-level abstractions that code ends up operating on streams directly?

  • The standard convention is to slap bufio.Reader/bufio.Writer on streams to make them more performant.

    Though how LFS ends up with ~50-byte chunks is probably something very, very dumb in the LFS code itself. Better to fix that mistake than to paper over it.
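As an illustration of that convention (a hypothetical sketch, not git-lfs code): wrapping both directions of a connection in bufio turns fine-grained reads and writes into memory operations, with syscalls only on Flush or when the read buffer runs dry.

```go
package example

import (
	"bufio"
	"net"
)

// greet sends a small request and reads one line back. Thanks to the
// bufio wrappers, the tiny writes cost memory copies, not syscalls;
// nothing reaches the wire until Flush.
func greet(conn net.Conn) (string, error) {
	r := bufio.NewReader(conn)
	w := bufio.NewWriter(conn)
	if _, err := w.WriteString("HELLO\n"); err != nil {
		return "", err
	}
	if err := w.Flush(); err != nil {
		return "", err
	}
	return r.ReadString('\n') // one syscall can serve many buffered reads
}
```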

    • bufio is for adding buffering regardless of source/dest. Better in this case is ReaderFrom (which will also be used transparently by io.Copy) to let the socket control the buffering and apply even more optimizations. For something like git-lfs I'd expect sendfile to provide a huge improvement, depending on the underlying storage.
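A sketch of that path (assumed, simplified code): io.Copy checks whether the destination implements io.ReaderFrom, which *net.TCPConn does, so copying straight from an *os.File can become sendfile(2) on Linux, moving data kernel-to-kernel with no userspace buffer at all.

```go
package upload

import (
	"io"
	"net"
	"os"
)

// sendFile streams a whole file to a TCP connection. io.Copy hands the
// file to the connection's ReadFrom, which lets the runtime use
// sendfile(2) on platforms that support it instead of read/write loops.
func sendFile(conn *net.TCPConn, path string) (int64, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()
	return io.Copy(conn, f)
}
```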