Comment by jrockway
3 years ago
Why does a congestionless network matter here? Nagle's algorithm aggregates writes together in order to fill up a packet. But you can just do that yourself, and then you're not surprised. I find it very rare that anyone is accidentally sending partially-filled packets; they have some data and they want it to be sent now, and are instead surprised by the fact that it doesn't get sent now because their data doesn't happen to be too large to fit in a single packet. Nobody is reading a file a byte at a time and then passing that 1 byte buffer to Write on a socket. (Except... git-lfs I guess?)
Nagle's algorithm is super weird as it's saying "I'm sure the programmer did this wrong, here, let me fix it." Then the 99.99% of the time when you're not doing it wrong, the latency it introduces is too high for anything realtime. Kind of a weird tradeoff, but I'm sure it made sense to quickly fix broken telnet clients at the time.
> Nagle's algorithm aggregates writes together in order to fill up a packet.
Not quite an accurate description of Nagle's algorithm. It only aggregates writes together if you already have in-flight data. The second you get back an ACK, the next packet will be sent regardless of how full it is. Equally, your first write to the socket will always be sent without delay.
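That decision rule can be sketched as follows (a simplified model for illustration; real stacks also factor in FIN, urgent data, and the TCP_NODELAY option):

```python
def nagle_should_send_now(bytes_queued: int, mss: int, unacked_data: bool) -> bool:
    """Simplified Nagle decision: send immediately if a full segment
    is available or nothing is in flight; otherwise hold the small
    write until an ACK drains the in-flight data."""
    if bytes_queued >= mss:
        return True   # full segment available: always send
    if not unacked_data:
        return True   # nothing in flight: small write goes out at once
    return False      # small write while data is in flight: buffer it

# First small write on an idle connection is sent immediately:
print(nagle_should_send_now(10, 1460, unacked_data=False))  # True
# A second small write while the first is unacknowledged waits:
print(nagle_should_send_now(10, 1460, unacked_data=True))   # False
```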
The case where you want to send many tiny packets with minimal latency doesn't really make sense for TCP, because eventually the packet overhead and traffic-control algorithms will end up throttling your throughput and latency. Nagle only impacts cases where you're using TCP in an almost pathological manner, and elegantly handles that behaviour to minimise overheads and the associated throughput and latency costs.
If there's a use case where latency is your absolute top priority, then you should be using UDP, not TCP. TCP will always nobble your latency because it insists on ordered data delivery, and will delay just-received packets if they arrive ahead of preceding packets. Only UDP gives you the ability to opt out of that behaviour, ensures that data is sent and received as quickly as your network allows, and lets your application decide for itself how to handle missing data.
It makes perfect sense if you consider the right abstraction. TCP connections are streams. There are no packets on that abstraction level. You’re not supposed to care about packets. You’re not supposed to know how large a packet even is.
The default is an efficient stream of bytes that has some trade-off to latency. If you care about latency, then you can set a flag.
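The flag in question is TCP_NODELAY from the BSD socket API. A minimal Python sketch of setting it:

```python
import socket

# Nagle is on by default for TCP sockets; setting TCP_NODELAY disables
# it, so small writes are sent immediately instead of being coalesced
# behind in-flight data.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))  # non-zero once set
sock.close()
```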
There is no perfect abstraction. Speed matters. A stream where data is delivered ASAP is better than a stream where the data gets delayed... maybe... because the OS decides you didn't write enough data.
The default actually violates the abstraction more because now you care how large a packet is, because somehow writing a smaller amount of data causes your latency to spike for some mysterious reason.
> A stream where data is delivered ASAP is better than a stream where the data gets delayed
That depends on your situation, because, as you say, no abstraction is perfect. Having a stream delivered “faster” isn't helpful if it means your overhead makes up 50% of your traffic, which is exactly what Nagle avoids.
Nagle's algorithm is also pretty smart: it's only going to delay your next packet until it's either full or the far end has acknowledged your preceding packet. If you've got a crap ton of data to send, and you're dumping it straight into the TCP buffer, then Nagle won't delay anything, because there's enough data to fill packets. Nagle only kicks in if you're doing many frequent tiny writes to a TCP connection, which is rarely a valid thing to do if you care about latency and throughput, so Nagle's algorithm assuming the dev has made a mistake is reasonable.
If you really care about stream latency, then UDP is your friend. Then you can completely dispense with all the traffic control processes in TCP and have stuff sent exactly when you want it sent.
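For contrast, a minimal UDP sketch over loopback (illustrative only; port and payload names are arbitrary): each sendto() becomes its own datagram, sent immediately, with no coalescing, retransmission, or reordering by the kernel.

```python
import socket

# Receiver: bind to an ephemeral port on loopback.
recv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv.bind(("127.0.0.1", 0))          # OS picks a free port
addr = recv.getsockname()

# Sender: each sendto() is one independent datagram -- nothing like
# Nagle applies, and the application owns loss/ordering handling.
send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send.sendto(b"frame-1", addr)
send.sendto(b"frame-2", addr)

print(recv.recvfrom(1024)[0])
print(recv.recvfrom(1024)[0])
send.close()
recv.close()
```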
Oftentimes when people want to send five structs, they just call send() five times. I find delayed ACKs a lot weirder than Nagle.
In those cases it would be better to call writev(), which was designed to coalesce multiple buffers into one write call.
How it sends the data is, however, up to the implementation, and whether it delays the last send if the TCP buffer isn't entirely full I'm not sure - but it doesn't make sense to do so, so I would guess not.
https://linux.die.net/man/2/writev
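A sketch of the gather-write idea using Python's os.writev over a pipe (the buffer contents here are made up for illustration; the same syscall works on sockets):

```python
import os

# writev() gathers several buffers into one kernel write call, so the
# transport layer sees a single large write instead of five tiny ones.
structs = [b"hdr:", b"aa", b"bb", b"cc", b"dd"]

r, w = os.pipe()
written = os.writev(w, structs)   # one syscall for all five buffers
os.close(w)

print(os.read(r, 1024))           # all buffers arrive as one contiguous write
print(written)                    # total byte count across the buffers
os.close(r)
```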
Nagle's algorithm matters because the abstraction that TCP works on, and which was inherited by the BSD socket interface, is that of emulating a full-duplex serial port.
Compare with the OSI stack, where packetization is explicit at all layers, and thus it wouldn't have such an issue in the first place.
Yeah it seems crazy to have that kind of hack in the entire network stack and on by default just because some interactive remote terminal clients didn't handle that behavior themselves.