
Comment by benmmurphy

3 years ago

When using TCP_NODELAY, do you need to ensure your writes are a multiple of the maximum segment size? For example, if the MSS is 1400 and you are doing writes of 1500 bytes, does this mean you will be sending packets of size 1400 and 100?

What if there are jumbo frames all the way to the client? You're throwing away a lot of bandwidth. What if there is VXLAN in the path, like in k8s? You'll be sending two packets, one tiny and one full. Use Nagle and send what you have when you have it. Let the TCP stack do its job. Work on optimization when it is actually impactful to do so. Sending a packet is cheaper than reading a DB.

  • The big reason for no-delay is the really bad interaction between Nagle's algorithm and delayed ACK for request-response protocols, like the start of a TLS connection. It's possible for the second handshake packet the client/server sends to be delayed significantly because one of the parties has delayed ACK enabled (see the sketches after this comment).

    Ideally, the application could just signal to the OS that the data needs to be flushed at certain points. TCP_NODELAY almost lets you do this, but the problem is it applies to all writes(), including ones that don't need to be flushed. For example, if you are an HTTP server sending a 250MB response, then only the last write needs to be 'flushed'. Linux has some non-POSIX options that give you more control, like TCP_CORK (set via setsockopt), which lets you signal these boundaries explicitly, or MSG_MORE, which is a bit more convenient to use; both are sketched below.
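
A minimal sketch of what disabling Nagle looks like in practice, assuming a plain BSD-sockets program in C; `fd` is taken to be an already-connected TCP socket and error handling is abbreviated:

```c
/* Sketch: disable Nagle's algorithm on a connected TCP socket.
 * With TCP_NODELAY set, small writes go out immediately instead of
 * being held back while earlier data is still unacknowledged. */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <stdio.h>

int disable_nagle(int fd)
{
    int one = 1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one)) < 0) {
        perror("setsockopt(TCP_NODELAY)");
        return -1;
    }
    return 0;
}
```

This is exactly why the option is a blunt instrument: it changes the behavior of every subsequent write on the socket, not just the ones that sit at a protocol boundary.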
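And a sketch of the explicit-boundary approach the comment describes, using Linux's TCP_CORK; the `send_response` helper and the `hdr`/`body` buffers are hypothetical placeholders, and return values of write() are left unchecked for brevity:

```c
/* Sketch (Linux-specific): cork the socket, make several writes that
 * should share segments, then uncork to flush whatever is queued.
 * Assumes `fd` is a connected TCP socket. */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

void send_response(int fd, const char *hdr, size_t hdr_len,
                   const char *body, size_t body_len)
{
    int on = 1, off = 0;

    /* Cork: the kernel queues partial segments instead of sending them. */
    setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));

    write(fd, hdr, hdr_len);    /* may be much smaller than one MSS */
    write(fd, body, body_len);  /* coalesced with the header by the kernel */

    /* Uncork: this is the explicit "flush" boundary described above. */
    setsockopt(fd, IPPROTO_TCP, TCP_CORK, &off, sizeof(off));
}
```

MSG_MORE expresses the same intent per call, without the two extra setsockopt calls: `send(fd, hdr, hdr_len, MSG_MORE)` tells the kernel more data is coming, and a final `send(fd, body, body_len, 0)` marks the boundary where the data should actually go out.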