Comment by temac

2 years ago

Fix the apps. Nobody expects magical perf if you do that when writing to files, even though the OS also has its own buffers. There is no reason to expect otherwise when writing to a socket, and Nagle already doesn't save you from syscall overhead anyway.

Nagle doesn't save the derpy side from syscall overhead, but it would save the other side.

It's not just apps doing this stuff, it also lives in system libraries. I'm still mad at the Android HTTPS library for sending chunked uploads as so many tinygrams. I don't remember exactly, but I think it's reasonable packetization for the data chunk (if it picked a reasonable size anyway), then one packet for \r\n, one for the size, and another for another \r\n. There's no reason for that, but it doesn't hurt the client enough that I can convince them to avoid the system library so they can fix it and the server can manage more throughput. Ugh. (It might be that it's just the TLS packetization that was this bogus and the TCP packetization was fine, it's been a while)

If you take a pcap for some specific issue, there's always so many of these other terrible things in there. </rant>
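To make the chunked-upload complaint above concrete, here is a minimal Python sketch of framing one HTTP/1.1 chunk (hex size, CRLF, payload, CRLF) into a single buffer and handing it to the socket in one call. The names `encode_chunk` and `send_chunk` are made up for illustration; this is not the Android library's code, and it says nothing about how a TLS layer underneath would packetize the bytes.

```python
import socket


def encode_chunk(data: bytes) -> bytes:
    """Frame one HTTP/1.1 chunk: hex size, CRLF, payload, CRLF."""
    return b"%x\r\n" % len(data) + data + b"\r\n"


def send_chunk(sock: socket.socket, data: bytes) -> None:
    # One buffer and one sendall() call, so the size line and the
    # trailing CRLF travel with the payload instead of as tinygrams.
    sock.sendall(encode_chunk(data))
```

For example, `encode_chunk(b"hello")` yields `b"5\r\nhello\r\n"`. For a small buffer, `sendall()` will normally be a single send syscall, though the kernel is still free to segment it as it sees fit.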

I agree that such code should be fixed, but I have a hard time persuading developers to fix their code. Many of them don't know what a syscall is, how making a syscall triggers the sending of an IP packet, how a library call translates to a syscall, etc. Worse, they don't want to know; they write, say, Java code (or code in some other high-level language) and argue that the libraries/JDK/kernel should handle all the 'low level' stuff.

To get optimal performance for request-response protocols like HTTP, one should send the full request (request line, all headers, and the POST body) using a single write syscall, unless the POST body is large and it makes sense to write it in chunks. Unfortunately not all HTTP libraries work this way, and a library user cannot fix the problem without switching libraries, which is hard for two reasons: 1. switching is not always easy; 2. it is not widely known which libraries are efficient and which are not. Even if you have your own HTTP library it's not always trivial to fix: e.g. in Java, the way to fix this while keeping the code readable and idiomatic is to wrap the socket in a BufferedOutputStream, which adds one more memory-to-memory copy for all the data you send, on top of the at least one memory-to-memory copy you already have without a buffered stream. So it's not an obvious performance win for an application that already saturates memory bandwidth.
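A rough Python sketch of the "one request, one write" idea described above (the comment talks about Java; Python is used here only for brevity). `build_request` and `send_request` are hypothetical helper names, and the header set is the bare minimum for illustration rather than what any particular HTTP library emits.

```python
import socket


def build_request(method: str, path: str, host: str, body: bytes = b"") -> bytes:
    """Assemble request line, headers, and body into one contiguous buffer."""
    head = (
        f"{method} {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    # Concatenating is itself a memory-to-memory copy of the body.
    return head + body


def send_request(sock: socket.socket, method: str, path: str,
                 host: str, body: bytes = b"") -> None:
    # A single sendall() on the assembled buffer instead of one write
    # per header line followed by another for the body.
    sock.sendall(build_request(method, path, host, body))
```

Note that `head + body` is exactly the kind of extra copy the comment warns about, so this trades memory bandwidth for fewer syscalls and fewer small packets.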

> Fix the apps. Nobody expect magical perf if you do that when writing to files,

We write to files line-by-line or even character-by-character and expect the library or OS to "magically" buffer it into fast file writes. Same with memory. We expect multiple small mallocs to be smartly coalesced by the platform.
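The library-level buffering this comment relies on can be shown with a small Python sketch. `CountingRaw` is a made-up stand-in for a file descriptor that only counts how often the underlying write is invoked; it is an illustration, not a measurement of real syscalls.

```python
import io


class CountingRaw(io.RawIOBase):
    """Stand-in for a raw file descriptor that counts write() calls."""

    def __init__(self) -> None:
        super().__init__()
        self.calls = 0

    def writable(self) -> bool:
        return True

    def write(self, b) -> int:
        self.calls += 1
        return len(b)  # pretend every byte was accepted


def count_raw_writes(n: int, buffered: bool) -> int:
    """Write n single bytes and return how many raw writes that caused."""
    raw = CountingRaw()
    out = io.BufferedWriter(raw, buffer_size=8192) if buffered else raw
    for _ in range(n):
        out.write(b"x")
    out.flush()
    return raw.calls
```

With these definitions, `count_raw_writes(1000, buffered=False)` returns 1000, while `count_raw_writes(1000, buffered=True)` returns 1, because the 1000 bytes fit in the 8192-byte buffer and go out in one flush.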

  • If you expect a POSIX-y OS to buffer write(2) calls, you're sadly misguided. Whether or not that happens depends on the nature of the device file you're writing to.

    OTOH, if you're using fwrite(3), as you likely should be for actual file I/O, then your expectation is entirely reasonable.

    Similarly with memory. If you expect brk(2) to handle multiple small allocations "sensibly" you're going to be disappointed. If you use malloc(3) then your expectation is entirely reasonable.

    • Whether buffering is part of POSIX or not is beside the point. Any modern OS you'll find will buffer write calls in one way or another. Similarly with memory: Linux waits until an access page-faults before reserving any memory pages for you. My point is that various forms of buffering are everywhere, and in practice we rely on them a whole lot.

  • True to a degree. But that is a singular platform wholly controlled by the OS.

    Once you put packets out into the world you're in a shared space.

    I assume every conceivable variation of argument has been made both for and against Nagle's algorithm at this point, but it essentially revolves around a shared networking resource and what policy is in place for fair use.

    Nagle's algorithm fixes a particular case but interferes overall. If you fix the "particular case app" the issue goes away.

  • Yes, your libraries should fix that. The OS (as in the kernel) should not try to do any abstraction.

    Alas, kernels really like to offer abstractions.

Everybody expects magical perf if you do that when writing files. We have RAM buffers and write caches for a reason, even on fast SSDs. We expect it so much that macOS doesn't flush to disk even when you call fsync() (files get flushed to the disk's write buffer instead).

There's some overhead to calling write() in a loop, but it's certainly not as bad as when a call to write() would actually make the data traverse whatever output stream you call it on.
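On the fsync() point above: macOS provides a separate fcntl command, F_FULLFSYNC, for asking the drive to flush its own write buffer. A hedged Python sketch follows, where `flush_to_disk` is a hypothetical helper name, the code assumes a Unix system (the `fcntl` module is not available on Windows), and F_FULLFSYNC may still be rejected by some filesystems.

```python
import fcntl
import os


def flush_to_disk(fd: int) -> str:
    """Flush fd as far down as the platform allows; report the call used."""
    full = getattr(fcntl, "F_FULLFSYNC", None)  # only defined on macOS
    if full is not None:
        # Asks the drive to write its buffer out to permanent storage.
        fcntl.fcntl(fd, full)
        return "F_FULLFSYNC"
    # Elsewhere, plain fsync() is the strongest portable request.
    os.fsync(fd)
    return "fsync"
```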

Those are the apps that are quickly written and do not care if they unnecessarily congest the network. The ones that do get properly maintained can set TCP_NODELAY. Seems like a reasonable default to me.

We actually have similar behavior when writing to files: contents are buffered in the page cache and written to disk later in batches, unless the user explicitly calls "sync".

Apps can always misbehave, you never know what people implement, and you don't always have source code to patch. I don't think the role of the OS is to let the apps do whatever they wish, but it should give them the possibility of doing so if it's needed. So I'd rather say: if you know you're doing things properly and you're latency sensitive, just set TCP_NODELAY on all your sockets and you're fine, and nobody will blame you for doing it.
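For reference, opting out of Nagle's algorithm per socket is a single call. A minimal Python sketch, with `make_low_latency_socket` as a made-up helper name:

```python
import socket


def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY: send small writes immediately instead of coalescing them.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

This only makes sense if the application already batches its own writes; otherwise every small write can become its own packet.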

I would love to fix the apps. Can you point me to the GitHub repo with all the code written in the last 30 years so I can get started?