Comment by markus92

12 days ago

We (small HPC system) just upgraded our OS from RHEL 7 to RHEL 9. Most user apps are dynamically linked, too.

You wouldn't believe how many old binaries broke. Lots of libraries got ABI bumps: libpng, ncurses, heck, even stuff like readline and libtiff all changed just enough to cause linker errors.

Ironically, all the statically compiled stuff was fine. Small programs that, like you mention, only link against glibc and X11 were fine too. Funnily enough grabbing some old .so files from the RHEL 7 install and dumping them into LD_LIBRARY_PATH also worked better than expected.
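A minimal sketch of the LD_LIBRARY_PATH trick, assuming the old .so files were copied into one directory. `legacy_env` and `run_legacy` are hypothetical helpers, and the paths in the usage line are made-up examples, not the setup described in this comment.

```python
import os


def legacy_env(lib_dir, base_env=None):
    """Return a copy of the environment with lib_dir searched first by the dynamic loader.

    lib_dir is a hypothetical directory holding .so files copied from the old install.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    # LD_LIBRARY_PATH is a ':'-separated list searched left to right, so the old
    # libraries are found before the system ones. Skip the separator when the
    # variable was unset, so no empty entry is added.
    env["LD_LIBRARY_PATH"] = lib_dir + (":" + existing if existing else "")
    return env


def run_legacy(binary, args, lib_dir):
    """Replace the current process with the old binary, using the patched environment."""
    os.execve(binary, [binary, *args], legacy_env(lib_dir))
```

Usage would look like `run_legacy("/opt/legacy/bin/oldtool", ["--help"], "/opt/rhel7-libs")` (hypothetical paths). Setting the variable per process, rather than globally, keeps the old libraries away from everything else on the node, although children of the old binary still inherit it. The loader also ignores LD_LIBRARY_PATH for setuid binaries (secure-execution mode), so this only helps ordinary user programs.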

But yeah, now that I'm writing this out, glibc was never the problem in terms of forwards compatibility. Now running stuff compiled on modern Ubuntu or RHEL 10 on the older OS, now that's a whole different story...

> Funnily enough grabbing some old .so files from the RHEL 7 install and dumping them into LD_LIBRARY_PATH also worked better than expected.

Why "better than expected"? I can run the entire userspace from Debian Etch on a kernel built two days ago... some kernel settings need to be changed (because of the old glibc! but it's not glibc's fault: it's the kernel who broke things), but it works.

> Now running stuff compiled on modern Ubuntu or RHEL 10 on the older OS, now that's a whole different story...

But this is a different problem, and no one makes promises here (not the kernel, not musl). So all the talk of statically linking with musl to get this kind of compatibility is bullshit: at some point, you're going to hit a syscall/instruction/whatever that the newer musl uses but the older kernel/hardware does not support.

  • Better than expected because it's mixing userlands. We didn't put the entire /usr/lib of the old system in LD_LIBRARY_PATH, just some stuff like the old libpng, libjpeg and the like. Taking an image of an old compute node still on RHEL 7 and dumping it into a container naturally worked, but at that point it's only the kernel interface you have to worry about, not mismatched glibc, GTK, Qt and that kind of stuff.

  • > it's the kernel who broke things

    I remember this from a heated LKML exchange 13 years ago. Look how the tables have turned:

    > Are you saying that pulseaudio is entering on some weird loop if the returned value is not -EINVAL? That seems a bug at pulseaudio.

    Mauro, SHUT THE FUCK UP!

    It's a bug alright - in the kernel. How long have you been a maintainer? And you still haven't learnt the first rule of kernel maintenance?

    If a change results in user programs breaking, it's a bug in the kernel. We never EVER blame the user programs. How hard can this be to understand?

    To make matters worse, commit f0ed2ce840b3 is clearly total and utter CRAP even if it didn't break applications. ENOENT is not a valid error return from an ioctl. Never has been, never will be. ENOENT means "No such file and directory", and is for path operations. ioctl's are done on files that have already been opened, there's no way in hell that ENOENT would ever be valid.

    > So, on a first glance, this doesn't sound like a regression, but, instead, it looks tha pulseaudio/tumbleweed has some serious bugs and/or regressions.

    Shut up, Mauro. And I don't _ever_ want to hear that kind of obvious garbage and idiocy from a kernel maintainer again. Seriously.

    I'd wait for Rafael's patch to go through you, but I have another error report in my mailbox of all KDE media applications being broken by v3.8-rc1, and I bet it's the same kernel bug. And you've shown yourself to not be competent in this issue, so I'll apply it directly and immediately myself.

    WE DO NOT BREAK USERSPACE!

    Seriously. How hard is this rule to understand? We particularly don't break user space with TOTAL CRAP. I'm angry, because your whole email was so _horribly_ wrong, and the patch that broke things was so obviously crap. The whole patch is incredibly broken shit. It adds an insane error code (ENOENT), and then because it's so insane, it adds a few places to fix it up ("ret == -ENOENT ? -EINVAL : ret").

    The fact that you then try to make excuses for breaking user space, and blaming some external program that used to work, is just shameful. It's not how we work.

    Fix your f*cking "compliance tool", because it is obviously broken. And fix your approach to kernel programming.

                   Linus
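To illustrate why swapping one error code for another counts as breaking userspace, here is a hypothetical Python sketch (not PulseAudio's actual code) of a caller that probes an optional feature and treats `EINVAL` as "not supported, fall back". The `probe` callables stand in for an ioctl on an already-open device; all names are made up for illustration. In the real report PulseAudio reportedly ended up in a loop; here the changed behaviour is simplified to an exception.

```python
import errno


def query_optional_feature(probe):
    """Call probe(); treat EINVAL as 'feature unsupported', re-raise anything else.

    probe is a hypothetical stand-in for an ioctl() on an already-open device.
    """
    try:
        return probe()
    except OSError as e:
        if e.errno == errno.EINVAL:
            return None  # The answer the program was written to expect: use a fallback.
        raise  # Any other error code is treated as a real failure.


def old_kernel():
    # Behaviour the program was written against.
    raise OSError(errno.EINVAL, "Invalid argument")


def new_kernel():
    # Same request after the change: a different error code for the same situation.
    raise OSError(errno.ENOENT, "No such file or directory")
```

`query_optional_feature(old_kernel)` returns `None` and the program carries on, while `query_optional_feature(new_kernel)` raises, even though nothing in the program changed. That is the sense in which the kernel, not the application, is the one that broke.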