Comment by pclmulqdq

2 years ago

In a world where bandwidth was limited and the minimum packet size was 64 bytes plus an inter-frame gap (it still is for most Ethernet networks), sending a TCP packet for literally every byte wasted a huge amount of bandwidth: a 1-byte payload still occupies a 64-byte minimum frame plus roughly 20 bytes of preamble and inter-frame gap, so over 98% of the wire time is overhead. The same goes for sending empty acks.

On the other hand, my general position is: it's not TCP_NODELAY, it's TCP.

I'd just love a protocol that has a built-in mechanism for realizing the other side of the pipe disconnected, for any reason.

  • That's possible in circuit-switched networking with various types of supervision, but packet-switched networking has taken over because it's much less expensive to implement.

    Attempts to add connection monitoring usually make things worse. Suppose you need to reroute a cable: if one or both ends of the cable detect the disconnection and close user sockets, then what would otherwise be a quick change with a small period of data loss and a minor interruption instead drops all of the established connections.

  • To re-word everyone else's comments - "Disconnected" is not well-defined in any network.

    • > To re-word everyone else's comments - "Disconnected" is not well-defined in any network.

      Parent said disconnected pipe, not network. It's sufficiently well-definable there.

  • That's really, really hard. For a full, guaranteed way to do this we'd need circuit switching (or circuit-switching emulation). It's pretty expensive to do in packet networks: each flow would need to be tracked by each middlebox, so a lot more RAM at every hop, and probably a lot more processing power. If we go with circuit establishment, it's also kind of expensive, and it breaks the whole "distributed, decentralized, self-healing network" property of the Internet.

    It's possible to do better than TCP these days, since bandwidth is much less constrained than it was when TCP was designed, but detecting a disconnected pipe by any means other than timeouts (which we already have) is still a hard problem.

  • Several of the "reliable UDP" protocols I have worked on in the past have had a heartbeat mechanism specifically for detecting this: if you haven't sent a packet down the wire in 10-100 milliseconds, you send an extra packet just to say you're still there.

    It's very useful to do this in intra-datacenter protocols.
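
    A minimal sketch of such a heartbeat loop in Go (the function name, the 0xFF opcode, and the 50 ms interval are illustrative assumptions, not any particular protocol's wire format; assumes net, time, and sync/atomic are imported):

      // heartbeat sends a bare "still alive" packet whenever the link
      // has been idle for longer than the heartbeat interval.
      func heartbeat(conn *net.UDPConn, lastSent *atomic.Int64, stop <-chan struct{}) {
          const interval = 50 * time.Millisecond // hypothetical; tune per deployment
          tick := time.NewTicker(interval / 2)
          defer tick.Stop()
          for {
              select {
              case <-tick.C:
                  // lastSent is updated by the real send path on every packet,
                  // so heartbeats only go out when the link is otherwise idle.
                  if time.Since(time.Unix(0, lastSent.Load())) >= interval {
                      conn.Write([]byte{0xFF}) // hypothetical heartbeat opcode
                      lastSent.Store(time.Now().UnixNano())
                  }
              case <-stop:
                  return
              }
          }
      }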

  • These types of keepalives are usually best handled at the application protocol layer where you can design in more knobs and respond in different ways. Otherwise you may see unexpected interactions between different keepalive mechanisms in different parts of the protocol stack.

  • If a socket is closed properly, there'll be a FIN and the other side can learn about it by polling the socket.

    If the network connection is lost due to external circumstances (say your modem crashes), then how would that information propagate from the point of failure to the remote end on an idle connection? Either you actively probe (keepalives) and risk false positives, or you wait until you hear from the other side again, risking false negatives.
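
    In Go terms, a minimal sketch of those two outcomes (the 30-second deadline is an arbitrary illustration; assumes the errors, io, net, os, and time imports):

      // watchPeer blocks on a read and classifies how the wait ended.
      func watchPeer(conn net.Conn) error {
          buf := make([]byte, 1)
          conn.SetReadDeadline(time.Now().Add(30 * time.Second)) // illustrative
          _, err := conn.Read(buf)
          switch {
          case errors.Is(err, io.EOF):
              return errors.New("peer closed cleanly: FIN received")
          case errors.Is(err, os.ErrDeadlineExceeded):
              return errors.New("no data and no FIN: peer may be gone, or just idle")
          default:
              return err // nil here means real data arrived
          }
      }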

    • It gets even worse - routing changes causing traffic to blackhole would still be undetectable without a timeout mechanism, since probes and responses would be lost.

    • > If the network connection is lost due to external circumstances (say your modem crashes) then how would that information propagate from the point of failure to the remote end on an idle connection?

      Observe the line voltage? If it gets cut then you have a problem...

      > Either you actively probe (keepalives) and risk false positives

      What false positives? Are you thinking there's an adversary on the other side?

Shouldn't QUIC (https://en.wikipedia.org/wiki/QUIC) solve the TCP issues like latency?

  • As someone who needed high throughput and looked to QUIC for its control of buffers, I recommend against it at this time. It’s got tons of performance problems depending on the implementation, and the API is different.

    I don’t think QUIC is bad, or even overengineered, really. In theory it delivers useful features that are quite well designed for the modern web-centric world. Instead I came away with a much greater appreciation for TCP and how well it works everywhere: commodity hardware, middleboxes, autotuning, NIC offloading, and so on. Never underestimate battle-tested tech.

    In that sense, TCP_NODELAY not being the default is an exception to the rule that TCP performs well out of the box (golang already sets it by default). As such, I think it’s time to change the default. Not using buffers correctly is a programming error, imo, and can be patched.
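
    For illustration, here is what that Go default looks like; SetNoDelay(false) is how you would opt back into Nagle (the endpoint is a placeholder):

      package main

      import (
          "log"
          "net"
      )

      func main() {
          // Go sets TCP_NODELAY on new TCP connections by default.
          conn, err := net.Dial("tcp", "example.com:80") // placeholder endpoint
          if err != nil {
              log.Fatal(err)
          }
          defer conn.Close()

          // SetNoDelay(false) re-enables Nagle's algorithm, coalescing
          // small writes at the cost of added latency.
          if err := conn.(*net.TCPConn).SetNoDelay(false); err != nil {
              log.Fatal(err)
          }
      }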

  • The specific issues that this article discusses (e.g., Nagle's algorithm) will be present in most packet-switched transport protocols, especially ones that rely on acknowledgements for reliability. The QUIC RFC mentions this: https://datatracker.ietf.org/doc/html/rfc9000#section-13

    Packet overhead, ack frequency, etc. are the tip of the iceberg, though. QUIC addresses some of the biggest issues with TCP, such as head-of-line blocking, but still shares the more finicky ones, such as different flow and congestion control algorithms interacting poorly.

  • QUIC is mostly used between a client and a data center, not between two data center computers. TCP is the better choice once inside the data center.

    Reasons:

    Security updates: Phones run old kernels and new apps, so it makes a lot of sense to put something that needs frequent updates, like the network stack, into user space; QUIC does well here. Data center computers run older apps on newer kernels, so it makes sense to put the network stack into the kernel, where updates and operational tweaks can happen independently of the app release cycle.

    Encryption overhead: The overhead of TLS is not always needed inside a data center, whereas it is always needed on a phone.

    Head-of-line blocking: Super important on a throttled or bad phone connection; not a big deal when all of your data center servers have 10G connections to everything else.

    In my opinion TCP is a battle-hardened technology that just works, even when things go bad. That it contains a setting with a perhaps-poor default is a small thing in comparison to its good record of stability in most situations. It's also comforting to know I can tweak kernel parameters if I need something special for my particular use case.
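
    For instance, Linux exposes TCP keepalive behavior as sysctls; a sketch with illustrative values, not recommendations:

      # Probe an idle connection after 60s, then every 10s,
      # and declare it dead after 5 unanswered probes.
      net.ipv4.tcp_keepalive_time = 60
      net.ipv4.tcp_keepalive_intvl = 10
      net.ipv4.tcp_keepalive_probes = 5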