Ideally, large files would upload in MTU-sized packets, which Nagle's algorithm will often give you. Otherwise you may have a small amount of additional overhead at the boundary, where the larger chunk may not divide evenly into MTU-sized packets.
Edit: I mostly work in embedded (systems that don't run git-lfs), so perhaps my view isn't sensible here.
Dividing data into MTU-sized packets is the job of the TCP stack, or even the driver or NIC in the case of offloads. Userspace software shouldn’t deal with MTUs and should always use buffer sizes that make sense for the application, e.g. 64 kB or even more. Otherwise the stack wouldn’t be very efficient, with every tiny piece of data causing a syscall and independent processing by the networking stack.
Right; it sounds to me like the real bug is that git-lfs isn't buffering its writes to the socket. Correct me if I'm wrong, but if git-lfs were buffering its writes (or using sendfile), then Nagle's algorithm wouldn't matter.
I don't know Go. But what if the language has so many high-level abstractions that it ends up operating on streams directly, with no buffering in between?
The standard convention is to slap bufio.Reader/bufio.Writer on streams to make them more performant.
Though how LFS ends up with ~50-byte chunks is probably something very, very dumb in the LFS code itself. Better to fix that mistake than to paper over it.
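To make the bufio point concrete, here is a rough Go sketch. It is not git-lfs code: `countingWriter`, `writeChunks`, `unbufferedCalls` and `bufferedCalls` are names made up for illustration, and the 1 MiB payload, 50-byte chunks and 64 KiB buffer are assumed numbers. The counting writer stands in for a socket, so each `Write` that reaches it is roughly one syscall.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// countingWriter is a hypothetical stand-in for a socket: it only records
// how many Write calls reach it.
type countingWriter struct {
	calls int
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.calls++
	return len(p), nil
}

// writeChunks writes total bytes to w in pieces of at most chunk bytes.
func writeChunks(w io.Writer, total, chunk int) error {
	buf := make([]byte, chunk)
	for sent := 0; sent < total; {
		n := chunk
		if total-sent < n {
			n = total - sent
		}
		if _, err := w.Write(buf[:n]); err != nil {
			return err
		}
		sent += n
	}
	return nil
}

// unbufferedCalls reports how many writes reach the sink when every chunk
// is written to it directly.
func unbufferedCalls(total, chunk int) int {
	sink := &countingWriter{}
	_ = writeChunks(sink, total, chunk)
	return sink.calls
}

// bufferedCalls reports how many writes reach the sink when the same chunks
// go through a bufio.Writer with a buffer of bufSize bytes.
func bufferedCalls(total, chunk, bufSize int) int {
	sink := &countingWriter{}
	bw := bufio.NewWriterSize(sink, bufSize)
	_ = writeChunks(bw, total, chunk)
	_ = bw.Flush() // push out whatever is still sitting in the buffer
	return sink.calls
}

func main() {
	const total = 1 << 20 // 1 MiB payload (assumed size)
	fmt.Println("unbuffered writes:", unbufferedCalls(total, 50))
	fmt.Println("buffered writes:  ", bufferedCalls(total, 50, 64<<10))
}
```

With those assumed sizes, the unbuffered path hands the sink 20972 separate writes, while the buffered path hands it 16, one per filled 64 KiB buffer. The tiny chunks are still being produced upstream, which is why fixing their source is the better fix.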
bufio is for adding buffering regardless of source/dest. Better in this case is io.ReaderFrom (which io.Copy will also use transparently), which lets the socket control the buffering and apply even more optimizations. For something like git-lfs I would expect sendfile to provide a huge improvement, depending on the underlying storage.
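A small sketch of that io.Copy behaviour, for anyone who hasn't seen it. `trackingDst` and `copyPayload` are hypothetical names for illustration only. The destination implements `io.ReaderFrom` (as `*net.TCPConn` does), and the fields show which path `io.Copy` actually took. This only demonstrates the dispatch, not sendfile itself, which depends on the platform and on the source being a real file.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// trackingDst is a hypothetical destination that implements io.ReaderFrom
// and records whether io.Copy used ReadFrom or plain Write calls.
type trackingDst struct {
	buf          bytes.Buffer
	usedReadFrom bool
	writeCalls   int
}

func (d *trackingDst) Write(p []byte) (int, error) {
	d.writeCalls++
	return d.buf.Write(p)
}

// ReadFrom lets the destination pull from the source itself, so it can pick
// its own buffering strategy. Here it simply reads everything in one go.
func (d *trackingDst) ReadFrom(r io.Reader) (int64, error) {
	d.usedReadFrom = true
	data, err := io.ReadAll(r)
	d.buf.Write(data)
	return int64(len(data)), err
}

// copyPayload copies payload into a fresh trackingDst with io.Copy and
// returns the destination so the caller can inspect which path was used.
func copyPayload(payload string) (*trackingDst, int64, error) {
	// io.Copy checks the source for io.WriterTo first, and strings.Reader
	// implements it, so wrap the source to expose only Read.
	src := struct{ io.Reader }{strings.NewReader(payload)}
	dst := &trackingDst{}
	n, err := io.Copy(dst, src)
	return dst, n, err
}

func main() {
	dst, n, err := copyPayload("pretend this is a large LFS object")
	fmt.Println("bytes copied:", n, "err:", err)
	fmt.Println("used ReadFrom:", dst.usedReadFrom, "Write calls:", dst.writeCalls)
}
```

The caller never mentions `ReadFrom`. It just calls `io.Copy`, and the destination decides how to move the bytes, which is what makes this nicer than wrapping everything in bufio by hand.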