
Comment by Terretta

3 years ago

Part of OP's point is that 'most clients' do not have an ideal congestionless/lossless network between them and, well, anything.

Why does a congestionless network matter here? Nagle's algorithm aggregates writes together in order to fill up a packet. But you can just do that yourself, and then you're not surprised. I find it very rare that anyone is accidentally sending partially filled packets; they have some data they want sent now, and are instead surprised that it doesn't get sent now because it happens to be small enough to fit in a single packet. Nobody is reading a file a byte at a time and then passing that 1-byte buffer to Write on a socket. (Except... git-lfs, I guess?)

Nagle's algorithm is super weird as it's saying "I'm sure the programmer did this wrong, here, let me fix it." Then the 99.99% of the time when you're not doing it wrong, the latency it introduces is too high for anything realtime. Kind of a weird tradeoff, but I'm sure it made sense to quickly fix broken telnet clients at the time.
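
To make that concrete, here is a minimal sketch of doing the coalescing yourself in Python; the helper name and address are made up for illustration, and it assumes an already-connected TCP socket:

```python
import socket

def send_batched(sock, chunks):
    """Coalesce several small application-level writes into one send,
    filling packets ourselves instead of relying on Nagle to do it."""
    sock.sendall(b"".join(chunks))

# Hypothetical usage: five small records leave as a single write.
# sock = socket.create_connection(("example.com", 9000))
# send_batched(sock, [b"hdr", b"field1", b"field2", b"field3", b"crc"])
```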

  • > Nagle's algorithm aggregates writes together in order to fill up a packet.

    Not quite an accurate description of Nagle's algorithm. It only aggregates writes together if you already have in-flight data. The second you get back an ACK, the next packet will be sent regardless of how full it is. Equally, your first write to the socket will always be sent without delay. (A toy sketch of this decision logic follows at the end of this comment.)

    The case where you want to send many tiny packets with minimal latency doesn't really make sense for TCP, because eventually the packet overhead and traffic-control algorithms will end up throttling your throughput and latency. Nagle only impacts cases where you're using TCP in an almost pathological manner, and it elegantly handles that behaviour to minimise overheads and the associated throughput and latency costs.

    If there's a use case where latency is your absolute top priority, then you should be using UDP, not TCP. TCP will always nobble your latency, because it insists on ordered data delivery and will delay just-received packets if they arrive ahead of preceding ones. Only UDP gives you the ability to opt out of that behaviour, ensure that data is sent and received as quickly as your network allows, and let your application decide for itself how to handle missing data.
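
    To illustrate that behaviour, here is a toy model of the send decision in Python; the class name, MSS value, and return convention are invented for the sketch, and this is not the real kernel logic:

    ```python
    MSS = 1460  # assumed maximum segment size, for the sketch only

    class NagleSender:
        """Toy model: a write goes out immediately unless data is already
        in flight; queued data is flushed on ACK or once a segment fills."""

        def __init__(self):
            self.buffer = b""
            self.unacked = False

        def write(self, data):
            self.buffer += data
            sent = []
            # Send while nothing is unacknowledged, or while a full segment is ready.
            while self.buffer and (not self.unacked or len(self.buffer) >= MSS):
                segment, self.buffer = self.buffer[:MSS], self.buffer[MSS:]
                sent.append(segment)
                self.unacked = True
            return sent  # segments that would hit the wire now

        def on_ack(self):
            self.unacked = False
            return self.write(b"")  # the queued partial segment goes out on ACK
    ```

    Run against a few small writes, it sends the first immediately, queues the rest, and flushes the remainder when on_ack() is called, matching the description above.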

  • It makes perfect sense if you consider the right abstraction. TCP connections are streams. There are no packets on that abstraction level. You’re not supposed to care about packets. You’re not supposed to know how large a packet even is.

    The default is an efficient stream of bytes, with some latency trade-off. If you care about latency, you can set a flag (TCP_NODELAY; a sketch follows below).

    • There is no perfect abstraction. Speed matters. A stream where data is delivered ASAP is better than a stream where the data gets delayed... maybe... because the OS decides you didn't write enough data.

      The default actually violates the abstraction more, because now you do care how large a packet is: somehow, writing a smaller amount of data causes your latency to spike for some mysterious reason.

      1 reply →
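
    For reference, a minimal sketch of setting that flag (TCP_NODELAY) in Python; the host and port are placeholders:

    ```python
    import socket

    sock = socket.create_connection(("example.com", 9000))  # placeholder address

    # Disable Nagle's algorithm: small writes are sent immediately rather
    # than being coalesced while earlier data is still unacknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

    sock.sendall(b"small latency-sensitive message")
    ```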

  • Oftentimes, when people want to send five structs, they just call send five times. I find delayed ACKs a lot weirder than Nagle.

    • In those cases it would be better to call writev(), which was designed to coalesce multiple buffers into one write call (a sketch follows below).

      How the data actually gets sent is up to the implementation, though. Whether it delays the last send if the TCP buffer isn't entirely full, I'm not sure - but it wouldn't make sense to do so, so I would guess not.

      https://linux.die.net/man/2/writev
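
      For illustration, a rough Python equivalent using socket.sendmsg(), which does the same gather-write on POSIX systems; the record format and address are made up:

      ```python
      import socket
      import struct

      # Hypothetical fixed-size records we want to send together.
      records = [struct.pack("!IH", i, i * 2) for i in range(5)]

      sock = socket.create_connection(("example.com", 9000))  # placeholder address

      # sendmsg() hands all five buffers to the kernel in one call (like writev),
      # so they can leave as one TCP segment instead of five tiny ones.
      sock.sendmsg(records)
      ```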

  • Nagle's algorithm matters because the abstraction that TCP works on, and which was inherited by the BSD socket interface, is that of emulating a full-duplex serial port.

    Compare with the OSI stack, where packetization is explicit at all layers, so it wouldn't have such an issue in the first place.

  • Yeah, it seems crazy to have that kind of hack in the entire network stack, on by default, just because some interactive remote-terminal clients didn't handle that behavior themselves.

Most clients that OP deals with, anyway. If your code runs exclusively in a data center, like the kind I suspect Google has, then the situation is probably reversed.

  • Consider the rise of mobile devices. Devices that don't have a good internet connection are probably everywhere now.

    It's no longer like 10 years ago, when you either had good internet or no internet at all. The number of devices on a shitty network has grown a lot compared to the past.

    • Almost every application I've written atop a TCP socket batches up writes into a buffer and then flushes out the buffer. I'd be curious to see how often this doesn't happen.

      3 replies →

  • If you run all of your code in one datacenter, and it never talks to the outside world, sure. That is a fairly rare usage pattern for production systems at Google, though.

    Just like anyone else, we have packet drops and congestion within our backbone. We like to tell ourselves that the above is less frequent in our network than the wider internet, but it still exists.

    • If your DC-DC links are regularly as noisy as shitty apartment WiFi routers competing for air time on a narrow band, fix your DC links.