Comment by morning-coffee

3 days ago

That's just one bottleneck. The other issue is head-of-line blocking: when there is packet loss on a TCP connection, nothing sent after the lost segment is delivered to the application until the loss is repaired.
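
A minimal sketch of what that looks like from the receiver's side, using a toy in-order reassembly queue (the class name and segment numbers are invented purely for illustration):

```python
# Toy model of TCP's in-order delivery: segments that arrive after a hole
# are held back, and nothing is handed to the application until the missing
# segment is retransmitted and the hole is filled.

class InOrderReceiveQueue:
    def __init__(self):
        self.next_expected = 0   # next byte offset the application is owed
        self.buffered = {}       # out-of-order segments parked in the kernel

    def on_segment(self, seq, data):
        """Accept a segment; return whatever can now be delivered in order."""
        self.buffered[seq] = data
        delivered = []
        while self.next_expected in self.buffered:
            chunk = self.buffered.pop(self.next_expected)
            delivered.append(chunk)
            self.next_expected += len(chunk)
        return delivered

q = InOrderReceiveQueue()
print(q.on_segment(0, b"aaaa"))   # [b'aaaa']  delivered immediately
print(q.on_segment(8, b"cccc"))   # []         held back: bytes 4..7 are missing
print(q.on_segment(12, b"dddd"))  # []         also stuck behind the hole
print(q.on_segment(4, b"bbbb"))   # [b'bbbb', b'cccc', b'dddd']  retransmit fills the hole
```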

What's the packet loss rate on modern networks? Curious.

  • … from 0% (a wired home LAN with nothing screwy going on) to 100% (e.g., cell reception at the San Antonio Caltrain station), depending on conditions…?

    As it always has been, and always will be.

  • That depends on how much data you are pushing. If you are pushing 200 Mb/s down a 100 Mb/s line you will get 50% packet loss.

    • Well, yes, that's the idea behind TCP itself, but a "normal" rate of packet loss is something along the lines of 5 per 100k packets dropped on any given long-haul link. Say a random packet crosses about 8 such links, so a "normal" end-to-end loss rate is 0.04% or so (back-of-the-envelope version below).

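      Rough sketch of that arithmetic (the per-link loss rate and hop count are the assumptions above, not measurements):

      ```python
      # End-to-end loss across independent links is 1 - (1 - p)^n, which for
      # small p is roughly n * p.
      p_link = 5 / 100_000          # ~0.005% loss on one long-haul link (assumed)
      hops = 8                      # links a "random" packet crosses (assumed)
      p_path = 1 - (1 - p_link) ** hops
      print(f"{p_path:.4%}")        # ~0.0400%, i.e. about 4 packets per 10,000
      ```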

TCP windowing fixes the issue you are describing. Make the window big enough and TCP will keep sending when there is packet loss. It will also retransmit and usually recover before the window drains.

https://en.wikipedia.org/wiki/TCP_window_scale_option
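
For a sense of how big "big" has to be: the window needs to cover the bandwidth-delay product, and on Linux the window scale option is negotiated automatically as long as the socket buffers are large enough. A rough sketch (the 1 Gb/s and 80 ms figures are just example numbers, and the kernel may clamp the buffer to net.core.rmem_max):

```python
import socket

# Bandwidth-delay product: how much data must be in flight to keep the pipe
# full. Example path: 1 Gb/s with an 80 ms round-trip time.
bandwidth_bps = 1_000_000_000
rtt_s = 0.080
bdp_bytes = int(bandwidth_bps * rtt_s / 8)
print(bdp_bytes)  # 10,000,000 -> the receive window needs to reach ~10 MB

# Ask for a matching receive buffer; without window scaling the advertised
# window is capped at 64 KB, which is why the option exists at all.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bdp_bytes)
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))  # value the kernel actually granted
```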

  • The statement in the comment you're replying to is still true. While waiting for those missed packets, the later packets will not be dropped if you have a large window size. But they won't be delivered either. They'll be held in the kernel, even though the application might be able to make use of them before the earlier, blocked packet arrives.

  • They are unrelated. Larger windows help achieve higher throughput over paths with high delay. You allude to selective acknowledgements as a way to repair loss before the window completely drains, which is true, but my point is that no data can be delivered to the application until the loss is repaired (and that repair takes at least a round-trip time). (And then the follow-on effects of the detected loss on the congestion controller can limit how much data may be in flight for a while afterwards, and so on.)

    • The application will hang waiting for the stack, but the stack keeps working and once the drop is remedied, the application will get a flood of data at a higher rate than the max network rate. So the application may pause sometimes, but the average rate of throughput is not much affected by drops.
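
      Rough numbers for that flood, assuming an illustrative 100 Mb/s flow and a 50 ms round trip (both made up for the example):

      ```python
      # Data that piles up in the kernel while one loss is repaired, assuming
      # the sender keeps the pipe full and the repair costs about one RTT.
      rate_bps = 100_000_000        # example flow rate
      repair_time_s = 0.050         # example RTT, i.e. the retransmit round trip
      backlog_bytes = int(rate_bps * repair_time_s / 8)
      print(backlog_bytes)          # 625,000 bytes buffered, then handed to the app in one burst
      ```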

  • The queuing discipline used by default (pfifo_fast) is barely more than 3 FIFO queues bundled together. The 3 queues allow for the barest semblance of traffic prioritisation, where queue 0 > 1 > 2, and you can tweak some socket/TOS parameters to have your traffic land in a particular queue (sketch at the end of this comment). If there's something in queue 0 it must be processed before anything in queue 1 gets touched, etc.

    Those queues operate on a purely head-of-queue basis. If whatever is at the head of queue 0 is blocked in any way, the whole queue behind it gets stuck, regardless of whether it is talking to the same destination or a completely different one.

    I've seen situations where a glitching network card caused serious knock-on impacts across a whole cluster: the card would hang or drop packets, and that would end up blocking the qdisc on a completely healthy host that was in the middle of talking to it, which in turn affected any other host that happened to be talking to that healthy host. A tiny glitch caused much wider impact than you'd expect.

    The same kind of effect would happen from a VM that went through live migration. The tiny, brief pause would cause a spike of latency all over the place.

    There are alternatives like fq_codel that can mitigate some of this, but you do have to pay a small amount of processing overhead on every packet, because now you have a queuing discipline that actually needs to track some per-flow state.
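
    A small sketch of the "have your traffic land in certain queues" part, assuming Linux and the default pfifo_fast priomap (the socket here is just a placeholder):

    ```python
    import socket

    # With pfifo_fast, the band a packet lands in is derived from the IP TOS /
    # priority field via the priomap. Marking a socket "minimize delay"
    # (TOS 0x10) puts its traffic in band 0, ahead of bands 1 and 2 with the
    # default priomap; `tc qdisc show` displays the mapping actually in use.
    IPTOS_LOWDELAY = 0x10  # standard low-delay TOS bit, named here for clarity

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, IPTOS_LOWDELAY)
    ```

    If you'd rather sidestep the TOS games entirely, swapping the root qdisc (e.g. `tc qdisc replace dev eth0 root fq_codel`, device name being just an example) is what removes the single stuck head-of-queue, at the price of the per-flow bookkeeping mentioned above.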