Comment by throwdbaaway
3 years ago
> not to mention nearly 50% of every packet was literally packet headers
I was just looking at a similar issue with grpc-go, where it would somehow send a HEADERS frame, a DATA frame, and a terminal HEADERS frame in 3 separate packets. The gRPC server is a Go binary (the lightstep collector), which definitely disables Nagle's algorithm as shown by strace output, and the flag can't be flipped back via the LD_PRELOAD trick (e.g. with a flipped version of https://github.com/sschroe/libnodelay) because the binary is statically linked.
I can't reproduce this with a dummy grpc-go server, where all 3 frames are sent in the same packet. So I can't blame the Nagle setting alone, but I am still not sure why the lightstep collector behaves differently.
Found the root cause from https://github.com/grpc/grpc-go/commit/383b1143 (original issue: https://github.com/grpc/grpc-go/issues/75):
The lightstep collector serves both gRPC and HTTP traffic on the same port, using the ServeHTTP method added in that commit. That path goes through Go's own HTTP/2 server rather than grpc-go's transport, and unfortunately Go's HTTP/2 server doesn't have the improvements mentioned in https://grpc.io/blog/grpc-go-perf-improvements/#reducing-flu.... The frequent flushes mean it can suffer from high latency with Nagle enabled, or from high packet overhead with Nagle disabled.
tl;dr: blame bradfitz instead :)