Comment by everlier
3 days ago
I'm glad to hear I'm not alone. Because of the work I do, I often accumulate ~800-900GB of Docker images and volumes on my machine, and sometimes run 20-30 containers at once, starting and stopping them concurrently. It's rare, but it still happens regularly (about once every couple of weeks): the machine ends up in a complete deadlock somewhere inside the kernel, due to some crazy race condition that I have no way to reproduce reliably.
It's much tougher when it's so hard to reproduce. Perhaps the NMI watchdog could help? https://docs.kernel.org/admin-guide/lockup-watchdogs.html
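In case it's useful, here's a rough Python sketch for checking whether the lockup watchdogs are currently enabled. It only reads the sysctl files under /proc/sys/kernel. The function name and the list of knobs are my own picks for illustration, and some of these files may not exist on a given kernel build, so a missing one is reported as unavailable rather than treated as an error.

```python
from pathlib import Path

# Watchdog-related sysctls (assumed list, chosen for illustration;
# availability depends on how the kernel was configured).
WATCHDOG_KNOBS = (
    "nmi_watchdog",
    "soft_watchdog",
    "watchdog_thresh",
    "hardlockup_panic",
    "softlockup_panic",
)


def read_watchdog_settings(base="/proc/sys/kernel"):
    """Return {knob: value or None} for each watchdog sysctl under base."""
    settings = {}
    for knob in WATCHDOG_KNOBS:
        path = Path(base) / knob
        try:
            settings[knob] = path.read_text().strip()
        except OSError:
            # File missing or unreadable: this knob is not available here.
            settings[knob] = None
    return settings


if __name__ == "__main__":
    for knob, value in read_watchdog_settings().items():
        shown = value if value is not None else "unavailable"
        print(f"kernel.{knob} = {shown}")
```

If nmi_watchdog shows 0, the hard lockup detector is off. Turning on hardlockup_panic and softlockup_panic should make the kernel panic when a lockup is detected, which may be easier to capture than a silent freeze, though I can't promise it will catch this particular deadlock.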