Comment by epistasis

1 day ago

It's not just Ubuntu; Arch is just as bad. The primary problem is systemd, which provides an adequate OOM daemon for system daemons, but then all the distributions seem to be using it for interactively launched processes as well.

If anybody can help me out with a better solution on a modern distribution, that's about 75% of the reason I'm posting. But it's been a major pain, and all the GitHub issues I have encountered on it show strong resistance to adopting better behavior like the defaults on macOS, Windows, or older Linux.

It's funny that you say the way it used to be was better, when people always complained about the OOM killer waiting until the system had entirely ground to a halt before acting, to the point that some preferred to run with 0 swap so the system would just go down immediately instead.

Regardless, I believe EarlyOOM is pretty configurable, if you care to check it out.
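
For reference, a sketch of what tuning EarlyOOM can look like. The file path and argument variable are assumptions based on common Debian-style packaging; check your distribution's own defaults:

```shell
# /etc/default/earlyoom  (path and variable name vary by distribution)
# -m 5      : start acting when available memory drops below 5%
# --avoid   : regex of process names earlyoom should try not to kill
# --prefer  : regex of process names to kill first
EARLYOOM_ARGS="-m 5 --avoid '^(tmux|sshd|systemd)$' --prefer '^(chromium|java)$'"
```

Because earlyoom targets individual processes rather than whole cgroups, it behaves more like the classic kernel OOM killer, just earlier.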

  • Thanks for the EarlyOOM pointer; it's one that I found (from HN) during my investigation of why an entire process group was getting killed rather than a single process.

    The problem is not that OOM killing happens earlier under memory pressure; the problem is what gets killed. Previously an offending process would get killed. Now it's an entire cgroup. So if you are using process isolation to run a batch of computation jobs, where each job takes a different amount of memory and it is not foreseeable until runtime which one will take too much, the OOM killer takes out the batch manager, its shell, and everything else. The process can't know ahead of time whether it's taking too much memory, because allocations never fail, and the process itself shouldn't be monitoring the rest of the system to make runtime decisions about quitting. The entire batch of jobs is killed, rather than a single process dying (as happens for any number of errors) and the rest of the batch continuing. In fact, without interacting directly with systemd-run to create a new cgroup, it's impossible to monitor WTF happened to your process, because of this new "nuke it from orbit" behavior.

    During my searches on this, another common failure case is in an IDE: one process goes wild and takes too much memory, and then the whole IDE gets killed silently, instead of a single-process kill that would allow the app to save state.

    This is a very fundamental change to how Linux has worked. It's a novel concept unfamiliar to long-time users (who the fuck actually knows about cgroups or uses them extensively, except for people heavy into containerization?), and workarounds for the behavior require introducing a heavy dependency on systemd in order to get basic functionality, making my code far less portable. I can understand being dependent on GNU, and on some Linuxisms in syscalls, but changing the basic semantics of launching new processes such that new code dependencies are needed for intricate cgroup control, well, that's a bit much for me. Leave systemd-oomd to manage cgroups and containers, but having it manage desktop apps and standard Unix process launching leads to bad code.
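
    If you do end up keeping systemd in the picture, two knobs seem relevant. A sketch of a unit drop-in; the unit name here is hypothetical:

    ```ini
    # ~/.config/systemd/user/batch-manager.service.d/oom.conf
    # (hypothetical unit name, for illustration)
    [Service]
    # Kernel OOM killer: keep the unit running when one member process is
    # killed, instead of stopping the whole unit.
    OOMPolicy=continue
    # systemd-oomd: don't opt this cgroup into pressure-based kills.
    ManagedOOMMemoryPressure=auto
    ```

    Alternatively, launching each batch job via `systemd-run --user --scope` puts every job in its own cgroup, so a cgroup-level kill takes out only that job; though, as noted above, that means taking a systemd dependency in the launcher.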

Interesting, either I haven't run into it much or I haven't recognized the source of it when it's something I have encountered.

  • Because the entire cgroup gets killed rather than individual processes, there's zero trace left. When I first encountered it I was running a multi-day compute pipeline in tmux, and I saw my compute pane gone, and thought that I must have accidentally nuked the entire pane, killing the job. A few more attempts and I finally realized it wasn't me, and I checked journalctl to find out that it was OOM killed. But I couldn't for the life of me figure out why the shell got killed too; what's the point of killing a process with tiny memory use? Turns out that is the desired behavior of systemd, and thus of many distributions now.
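
    For anyone doing the same forensics, a sketch of the journal queries that can surface this; `-g`/`--grep` assumes a reasonably recent systemd built with pcre2 support:

    ```shell
    # Kernel OOM killer records (process-level kills)
    journalctl -k -g 'Out of memory|oom-kill' --no-pager
    # systemd-oomd records (cgroup-level kills)
    journalctl -u systemd-oomd --no-pager
    ```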