Comment by black-tea

7 years ago

You're probably getting downvoted by people who have used Linux in the past 10 years.

It was, and still is, the case that the default scheduler behaves badly under heavy I/O: copying a file to a USB disk would freeze the whole system (on desktop).

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/131094

  • I encountered this. I remember it. It was/is real. But I only happened to encounter it on Ubuntu, fwiw.

    • And it's a particularly interesting issue, because it mirrors the congestion control failure observed on many networks in recent years. We've all seen this problem: on a busy network, latency increases by two orders of magnitude, ruining other network activities like web browsing, even though they themselves require only a little bandwidth. The simplest demo is to upload a large file while watching the ping latency: it jumps from 100 ms to 2000 ms. But this shouldn't happen, because TCP congestion control was designed precisely to solve it.

      It turns out that the cause of this problem, known as bufferbloat, is the accumulated effect of excessive buffering in the network stack: mostly the system packet queue, but also routers, switches, drivers and hardware, since RAM is cheap nowadays. TCP congestion control works like this: if packet loss is detected, send at a lower rate. But when there are large buffers on the path "for performance", packets are never lost when the path is congested; they get put into a huge buffer instead, so TCP never slows down as designed, and during slow start it believes it has a clear path all the way. On top of that, all these buffers are FIFO, which means that by the time your new packets get a chance to go out, they're probably no longer relevant: moving from the tail of the queue to the head takes seconds, and the connection has already timed out.

      Solutions include shrinking buffers and limiting their length (byte queue limits, TCP small queues). Another innovation is smarter queue management algorithms: we don't have to use a mindless FIFO queue. As a result, CoDel ("Controlled Delay") and fq_codel were invented; they measure how long packets sit in the queue and drop old packets once the delay stays too high, to keep your traffic flowing.
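      The core of CoDel is small enough to sketch. Here's a toy Python version (the class name and structure are mine, not the real kernel implementation, though the 5 ms / 100 ms constants are the usual defaults): packets are timestamped on enqueue, and on dequeue we measure how long each one sat in the queue; if the delay stays above the target for a full interval, old packets get dropped, more aggressively the longer congestion persists.

```python
import collections
import math
import time

TARGET = 0.005     # 5 ms: acceptable standing queue delay
INTERVAL = 0.100   # 100 ms: grace period before we start dropping

class CoDelQueue:
    def __init__(self, clock=time.monotonic):
        self.q = collections.deque()
        self.clock = clock
        self.drop_at = None    # deadline after which we start dropping
        self.drop_count = 0

    def enqueue(self, packet):
        # Timestamp every packet so we can measure its sojourn time later.
        self.q.append((self.clock(), packet))

    def dequeue(self):
        while self.q:
            enq_time, packet = self.q.popleft()
            now = self.clock()
            if now - enq_time < TARGET:
                # Delay is acceptable again: leave the dropping state.
                self.drop_at = None
                self.drop_count = 0
                return packet
            if self.drop_at is None:
                # Delay just crossed the target: arm a grace period.
                self.drop_at = now + INTERVAL
            if now < self.drop_at:
                return packet  # not persistently bad yet, deliver anyway
            # Delay stayed above target for a whole interval: drop this
            # old packet and shorten the next grace period (control law).
            self.drop_count += 1
            self.drop_at = now + INTERVAL / math.sqrt(self.drop_count)
        return None
```

      The real fq_codel additionally hashes traffic into per-flow queues so one bulk upload can't starve an interactive flow, but the delay-driven dropping above is the key idea.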

      And people realized the Linux I/O freeze is a variant of bufferbloat, and the very same CoDel ideas can be applied to it.

      Another interesting aspect is that the problem is NOT OBSERVABLE if the network is fast enough or the traffic is low, because the buffering never kicks in, so many benchmarks never catch it. On the other hand, start uploading a large file over a slow network, or start copying a large file to a USB thumb drive on Linux...

      https://lwn.net/Articles/682582/

      and

      https://lwn.net/Articles/685894/

      4 replies →

I'm on Linux right now and can cause this to happen by using more than 60% of my RAM. AFAIK no distro correctly handles swapping onto an HDD.

If I hit 80%, I get 10-20 second lock ups.

If I hit >95%, I get 1-2 minute lock ups.

Using Ubuntu 18.04.

  • What is the "correct" handling of swap on an HDD supposed to be like? It is going to be slow no matter what you do. Windows also locks up for long periods of time if you use up almost all the RAM and it has to swap to HDD.

    • When I say lock up I mean the UI completely stops. As in, my i3 bar stops updating for several minutes. Not even the linux magic commands let me recover.

      On windows things may be unresponsive, but at least ctrl-alt-del responds, and at least the mouse moves!

      The main difficulty is I can't tell if my machine has crashed vs is overloaded if the UI doesn't do anything for several minutes.

      4 replies →

    • "Correct" handling of swap would mean mostly leaving the window manager and its dependencies in memory. Individual application windows may stop responding, but everything else should stay pretty quick. And small things like terminal emulators should get priority to stay in RAM too.
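      One building block for this already exists: a window manager could mlockall() itself so its pages can never be swapped out. A minimal sketch via ctypes (the flag constants are the Linux values; note this needs CAP_IPC_LOCK or a generous RLIMIT_MEMLOCK, so the sketch returns False rather than raising when the kernel refuses):

```python
import ctypes
import ctypes.util

MCL_CURRENT = 1   # lock pages currently mapped
MCL_FUTURE = 2    # and any pages mapped later

def lock_process_in_ram():
    """Ask the kernel to keep all of this process's pages resident,
    so the process is never swapped out. Returns True on success."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    if libc.mlockall(MCL_CURRENT | MCL_FUTURE) == 0:
        return True
    # Typically EPERM or ENOMEM without privileges / enough RLIMIT_MEMLOCK.
    return False
```

      Distros don't do this for desktop components by default, which is part of why the whole UI can end up on the swap device.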

    • Not that I'd recommend this, but my work MBP has 16GB of RAM, and my typical software development setup (JVM, IntelliJ, Xcode, gradle) easily uses up 30GB. It swaps a lot, but generally OSX does a good job of keeping the window manager and foreground applications at priority, so I can still use my machine while this is happening.

      I attribute this to the fact that the darwin kernel has a keen awareness of what threads directly affect the user interface and which do not (even including the handling of XPC calls across process boundaries... if your work drives the UI, you get scheduling/RAM priority). I don't think the linux kernel has nearly this level of awareness.

      1 reply →

  • Install earlyoom; it will kill processes before a lock-up occurs. Helped me greatly (at the expense of random Chrome tabs getting killed).
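    The idea behind earlyoom is simple enough to sketch. Roughly (this is my simplification, not its actual code), it polls /proc/meminfo and starts SIGTERM-ing the largest process once MemAvailable drops below a threshold, well before the kernel OOM killer would wake up:

```python
import re

def mem_available_pct(meminfo_text):
    """Return MemAvailable as a percentage of MemTotal,
    given the text of /proc/meminfo."""
    fields = dict(re.findall(r"^(\w+):\s+(\d+) kB", meminfo_text, re.M))
    return 100.0 * int(fields["MemAvailable"]) / int(fields["MemTotal"])

def low_on_memory(meminfo_text, threshold_pct=10.0):
    """True when it's time to start killing the largest process."""
    return mem_available_pct(meminfo_text) < threshold_pct

# The real daemon loops over this, re-reading /proc/meminfo every second:
#   if low_on_memory(open("/proc/meminfo").read()):
#       os.kill(fattest_pid(), signal.SIGTERM)   # fattest_pid: hypothetical helper
```

    Killing early keeps the system responsive precisely because it avoids the deep-swap thrashing described above.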

I've experienced a total freeze once one of my programs started swapping/thrashing. The entire desktop froze, not just the one program. This was within the past two years or so, so it's not a solved problem.