Comment by Philip-J-Fry
3 years ago
Yeah but Nagle's Algorithm and Delayed ACKs interaction is what causes the 200ms.
Servers tend to enable Nagle's algorithm by default. Clients tend to enable Delayed ACK by default, and then you get this horrible interaction, all because both sides are trying to be more efficient but end up stalling each other.
I think Go's behavior is the right default because you can't control every server. But if Nagle's algorithm were off by default on servers, then we wouldn't need to disable Delayed ACKs on clients.
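For concreteness, Go's net package sets TCP_NODELAY on every TCP connection, so Nagle is off unless you explicitly turn it back on. A minimal sketch (the endpoint is hypothetical):

```go
package main

import (
	"log"
	"net"
)

func main() {
	// Go's net package sets TCP_NODELAY on every *net.TCPConn,
	// i.e. Nagle's algorithm is disabled by default.
	conn, err := net.Dial("tcp", "example.com:80") // hypothetical endpoint
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Opting back in to Nagle requires an explicit call:
	// SetNoDelay(false) re-enables coalescing of small writes.
	if tcpConn, ok := conn.(*net.TCPConn); ok {
		if err := tcpConn.SetNoDelay(false); err != nil {
			log.Fatal(err)
		}
	}
}
```

Passing false re-enables Nagle; the default (true) keeps small writes going out immediately. Delayed ACKs on the peer remain outside the client's control, which is the interaction described above.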
Part of the OP's point is that 'most clients' do not have an ideal congestionless/lossless network between them and, well, anything.
Why does a congestionless network matter here? Nagle's algorithm aggregates writes together in order to fill up a packet. But you can just do that yourself, and then you're not surprised. I find it very rare that anyone is accidentally sending partially-filled packets; they have some data and they want it to be sent now, and are instead surprised by the fact that it doesn't get sent now because their data happens to be small enough to fit in a single packet. Nobody is reading a file a byte at a time and then passing that 1-byte buffer to Write on a socket. (Except... git-lfs I guess?)
Nagle's algorithm is super weird as it's saying "I'm sure the programmer did this wrong, here, let me fix it." Then the 99.99% of the time when you're not doing it wrong, the latency it introduces is too high for anything realtime. Kind of a weird tradeoff, but I'm sure it made sense to quickly fix broken telnet clients at the time.
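A minimal sketch of the "aggregate the writes yourself" approach from the comment above, assuming a Go client and a made-up endpoint: coalesce small writes in user space with bufio and flush when you actually want the data on the wire.

```go
package main

import (
	"bufio"
	"log"
	"net"
)

func main() {
	conn, err := net.Dial("tcp", "example.com:9000") // hypothetical endpoint
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Coalesce small writes in user space instead of relying on Nagle.
	w := bufio.NewWriterSize(conn, 4096)
	for _, msg := range []string{"one", "two", "three"} {
		if _, err := w.WriteString(msg); err != nil {
			log.Fatal(err)
		}
	}
	// Flush when *you* decide the data should hit the wire.
	if err := w.Flush(); err != nil {
		log.Fatal(err)
	}
}
```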
> Nagle's algorithm aggregates writes together in order to fill up a packet.
Not quite an accurate description of Nagle's algorithm. It only aggregates writes together if you already have in-flight data. The second you get back an ACK, the next packet will be sent regardless of how full it is. Equally, your first write to the socket will always be sent without delay.
The case where you want to send many tiny packets with minimal latency doesn't really make sense for TCP, because eventually the packet overhead and traffic-control algorithms will end up throttling your throughput and latency. Nagle only impacts cases where you're trying to use TCP in an almost pathological manner, and elegantly handles that behaviour to minimise overheads and the associated throughput and latency costs.
If there's a use case where latency is your absolute top priority, then you should be using UDP, not TCP. TCP will always nobble your latency because it insists on ordered data delivery, and will delay just-received packets if they arrive ahead of preceding packets. Only UDP gives you the ability to opt out of that behaviour, ensures that data is sent and received as quickly as your network allows, and lets your application decide for itself how to handle missing data.
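For what it's worth, the rule described above (a small segment is held back only while previously sent data is still unacknowledged) can be sketched roughly like this; it's a simplification of the real kernel logic, not an implementation of it:

```go
// nagleShouldSend is a simplified sketch of the classic Nagle decision
// (RFC 896): a full-sized segment goes out immediately; a small segment
// goes out immediately only if nothing is in flight; otherwise the small
// segment waits until the outstanding data has been ACKed.
func nagleShouldSend(pendingBytes, mss, unackedBytes int) bool {
	if pendingBytes >= mss {
		return true // full segment: send now
	}
	if unackedBytes == 0 {
		return true // nothing in flight: the first small write is never delayed
	}
	return false // small segment with un-ACKed data in flight: coalesce and wait
}
```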
It makes perfect sense if you consider the right abstraction. TCP connections are streams. There are no packets on that abstraction level. You’re not supposed to care about packets. You’re not supposed to know how large a packet even is.
The default is an efficient stream of bytes that has some trade-off to latency. If you care about latency, then you can set a flag.
2 replies →
Oftentimes when people want to send five structs, they just call send five times (there's a batching sketch after this comment). I find delayed ACKs a lot more weird compared to Nagle.
1 reply →
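A hedged sketch of the batching alternative to calling send five times, with a made-up record type and endpoint: serialize all five records into one buffer and write once, so there is nothing for Nagle or delayed ACKs to punish.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"log"
	"net"
)

// record is a made-up fixed-size wire format, just for illustration.
type record struct{ A, B uint32 }

func main() {
	conn, err := net.Dial("tcp", "example.com:9000") // hypothetical endpoint
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	recs := make([]record, 5)

	// Instead of five tiny conn.Write calls (one per struct), serialize
	// everything into one buffer and write once.
	var buf bytes.Buffer
	for _, r := range recs {
		if err := binary.Write(&buf, binary.BigEndian, r); err != nil {
			log.Fatal(err)
		}
	}
	if _, err := conn.Write(buf.Bytes()); err != nil {
		log.Fatal(err)
	}
}
```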
Nagle's algorithm matters because the abstraction that TCP works on, and which was inherited by BSD Socket interface, is that of emulating a full duplex serial port.
Compare with OSI stack, where packetization is explicit at all layers and thus it wouldn't have such an issue in the first place.
Yeah it seems crazy to have that kind of hack in the entire network stack and on by default just because some interactive remote terminal clients didn't handle that behavior themselves.
Most clients that OP deals with, anyway. If your code runs exclusively in a data center, like the kind I suspect Google has, then the situation is probably reversed.
Consider the rise of mobile devices. Devices without a good internet connection are probably everywhere now.
It's no longer like 10 years ago, when you either had good internet or no internet at all. Devices on a shitty network have grown a lot compared to the past.
4 replies →
If you run all of your code in one datacenter, and it never talks to the outside world, sure. That is a fairly rare usage pattern for production systems at Google, though.
Just like anyone else, we have packet drops and congestion within our backbone. We like to tell ourselves that the above is less frequent in our network than the wider internet, but it still exists.
1 reply →
Clients having delayed acks has a very good reason: those ACKs cost data, and clients tend to have much higher download bandwidth than upload bandwidth. Really, clients should probably be delaying acks and nagling packets, while servers should probably be doing neither.
Clients should not be nagling unless the connection is emitting tiny writes at high frequency. But that's a very odd thing to do, and in most/all cases there's some reasonable buffering occurring higher up in the stack that Nagle's algorithm will only add overhead to. Making things worse are TCP-within-TCP things like HTTP/2.
Nagle's algorithm works great for things like telnet but should not be applied as a default to general purpose networking.
Why would Nagle's algorithm add delay to "reasonable buffering up the stack"? Assuming that buffering results in writes to the TCP stack greater than the packet size, Nagle's algorithm won't add any delay.
The only place where Nagle's algorithm adds delay is when you're doing many tiny writes to a socket, which is exactly the situation you believe Nagle's should be applied to.
The size of an ACK is minuscule (40 bytes) compared to any reasonable packet size (usually around 1400 bytes).
In most client situations where you have high down bandwidth but limited up, that suggests the vast majority of data is heading towards the client, and the client isn't sending much outbound. In which case your client may end up delaying every ACK to the maximum timeout, simply because it doesn't often send reply data in response to a server response.
HTTP is a clear example of this. The client issues a request to the server, the server replies. The client accepts the reply, but never sends any further data to the server. In this case, delaying the client's ACK is just a waste of time.
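For that request/response pattern, Linux does let a client opt out of delayed ACKs per socket via TCP_QUICKACK. A rough sketch using golang.org/x/sys/unix and a hypothetical endpoint; note the flag is not sticky, so applications typically re-apply it around reads:

```go
package main

import (
	"log"
	"net"

	"golang.org/x/sys/unix"
)

// setQuickAck asks the kernel to ACK immediately instead of delaying.
// Linux-only, and not permanent: the kernel may fall back to delayed
// ACKs later, so applications often re-apply it.
func setQuickAck(conn *net.TCPConn) error {
	raw, err := conn.SyscallConn()
	if err != nil {
		return err
	}
	var sockErr error
	err = raw.Control(func(fd uintptr) {
		sockErr = unix.SetsockoptInt(int(fd), unix.IPPROTO_TCP, unix.TCP_QUICKACK, 1)
	})
	if err != nil {
		return err
	}
	return sockErr
}

func main() {
	conn, err := net.Dial("tcp", "example.com:80") // hypothetical endpoint
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	if err := setQuickAck(conn.(*net.TCPConn)); err != nil {
		log.Fatal(err)
	}
}
```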
"Be conservative in what you send and liberal in what you accept"
I would cite Postel's Law: Nagle's is the "conservative send" side. An ACK is a signal of acceptance, and should be issued more liberally (even though it's also sent, I guess).