Comment by throwdbaaway

3 years ago

> not to mention nearly 50% of every packet was literally packet headers

I was just looking at a similar issue with grpc-go, where it would somehow send a HEADERS frame, a DATA frame, and a terminal HEADERS frame in 3 different packets. The grpc server is a golang binary (lightstep collector) that definitely disables Nagle's algorithm, as shown by strace output. The flag also can't be flipped back via the LD_PRELOAD trick (e.g. with a flipped version of https://github.com/sschroe/libnodelay) because the binary is statically linked.

I can't reproduce this with a dummy grpc-go server, where all 3 frames would be sent in the same packet. So I can't blame Nagle's algorithm, but I am still not sure why the lightstep collector behaves differently.

Found the root cause from https://github.com/grpc/grpc-go/commit/383b1143 (original issue: https://github.com/grpc/grpc-go/issues/75):

    // Note that ServeHTTP uses Go's HTTP/2 server implementation which is
    // totally separate from grpc-go's HTTP/2 server. Performance and
    // features may vary between the two paths.

The lightstep collector serves both gRPC and HTTP traffic on the same port, using the ServeHTTP method from the comment above. Unfortunately, Go's HTTP/2 server doesn't have the improvements mentioned in https://grpc.io/blog/grpc-go-perf-improvements/#reducing-flu.... Because of the frequent flushes, that path suffers either from high latency with Nagle enabled, or from high packet overhead with Nagle disabled.

tl;dr: blame bradfitz instead :)