Comment by zamalek
3 years ago
From the bottom of the article:
> Most people turn to TCP_NODELAY because of the “200ms” latency you might incur on a connection. Fun fact, this doesn’t come from Nagle’s algorithm, but from Delayed ACKs or Corking. Yet people turn off Nagle’s algorithm … :sigh:
Yeah, but the interaction between Nagle's algorithm and Delayed ACKs is what causes the 200ms.
Servers tend to enable Nagle's algorithm by default. Clients tend to enable Delayed ACK by default, and then you get this horrible interaction, all because they're both trying to be more efficient but end up stalling each other.
I think Go's behavior is the right default because you can't control every server. But if Nagle's were off by default on servers, then we wouldn't need to disable Delayed ACKs on clients.
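For context, here's a minimal Go sketch of what that default looks like in practice: the standard library sets TCP_NODELAY on new TCP connections, so you have to opt back into Nagle's algorithm explicitly with SetNoDelay(false). The address is just a placeholder.

```go
package main

import (
	"log"
	"net"
)

func main() {
	// Go disables Nagle's algorithm (i.e. sets TCP_NODELAY) on new
	// TCP connections by default.
	conn, err := net.Dial("tcp", "example.com:80") // placeholder address
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Re-enabling Nagle's algorithm has to be done explicitly.
	tcpConn := conn.(*net.TCPConn)
	if err := tcpConn.SetNoDelay(false); err != nil {
		log.Fatal(err)
	}
}
```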
Part of OP's point is that 'most clients' do not have an ideal congestionless/lossless network between them and, well, anything.
Why does a congestionless network matter here? Nagle's algorithm aggregates writes together in order to fill up a packet. But you can just do that yourself, and then you're not surprised. I find it very rare that anyone is accidentally sending partially-filled packets; they have some data and they want it to be sent now, and are instead surprised that it doesn't get sent now because it happens to be small enough to fit in a single packet. Nobody is reading a file a byte at a time and then passing that 1-byte buffer to Write on a socket. (Except... git-lfs, I guess?)
Nagle's algorithm is super weird as it's saying "I'm sure the programmer did this wrong, here, let me fix it." Then the 99.99% of the time when you're not doing it wrong, the latency it introduces is too high for anything realtime. Kind of a weird tradeoff, but I'm sure it made sense to quickly fix broken telnet clients at the time.
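Doing the aggregation yourself, as described above, is roughly this in Go. This is just a sketch; writeFramed, conn, and the payloads are placeholders for whatever the application actually sends.

```go
import (
	"bufio"
	"net"
)

// writeFramed batches several small writes in user space instead of relying
// on Nagle's algorithm to coalesce them, then pushes them out in one go.
func writeFramed(conn net.Conn, payloads [][]byte) error {
	w := bufio.NewWriter(conn) // user-space buffer (default 4 KiB)
	for _, p := range payloads {
		if _, err := w.Write(p); err != nil {
			return err
		}
	}
	// Flush when *you* decide the message is complete, so the data goes
	// out now rather than waiting on the stack's heuristics.
	return w.Flush()
}
```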
8 replies →
Most clients that OP deals with, anyway. If your code runs exclusively in a data center, like the kind I suspect Google has, then the situation is probably reversed.
7 replies →
There's a very good reason clients delay ACKs: those ACKs cost data, and clients tend to have much higher download bandwidth than upload bandwidth. Really, clients should probably be delaying ACKs and nagling packets, while servers should probably be doing neither.
Clients should not be nagling unless the connection is emitting tiny writes at high frequency. But that's a very odd thing to do, and in most/all cases there's some reasonable buffering occurring higher up in the stack that Nagle's algorithm will only add overhead to. Making things worse are TCP-within-TCP things like HTTP/2.
Nagle's algorithm works great for things like telnet but should not be applied as a default to general purpose networking.
1 reply →
The size of an ACK is minuscule (40 bytes) compared to any reasonable packet size (usually around 1400 bytes).
In most client situations where you have high download bandwidth but limited upload, the vast majority of data is heading towards the client, and the client isn't sending much outbound. In that case your client may end up delaying every ACK to the maximum timeout, simply because it doesn't often send reply data in response to a server response.
HTTP is a clear example of this. The client issues a request to the server, the server replies. The client accepts the reply but never sends any further data to the server. In this case, delaying the client's ACK is just a waste of time.
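For what it's worth, "turning off delayed ACKs" on the client isn't even portable. On Linux the closest knob is TCP_QUICKACK, which looks roughly like this from Go; this sketch assumes golang.org/x/sys/unix, and note the flag isn't sticky, so the kernel can fall back to delaying ACKs again later.

```go
//go:build linux

package main

import (
	"net"

	"golang.org/x/sys/unix"
)

// setQuickAck asks the kernel to ACK immediately instead of delaying.
// TCP_QUICKACK is Linux-only and not permanent; the kernel may clear it
// again, so applications typically re-set it after reads.
func setQuickAck(conn *net.TCPConn) error {
	raw, err := conn.SyscallConn()
	if err != nil {
		return err
	}
	var sockErr error
	ctlErr := raw.Control(func(fd uintptr) {
		sockErr = unix.SetsockoptInt(int(fd), unix.IPPROTO_TCP, unix.TCP_QUICKACK, 1)
	})
	if ctlErr != nil {
		return ctlErr
	}
	return sockErr
}
```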
"Be conservative in what you send and liberal in what you accept"
I would cite Postel's Law: Nagle's is the "conservative send" side. An ACK is a signal of acceptance, and should be issued more liberally (even though it's also sent, I guess).