You cut out a key word:
> Linux is really hurt here by the total lack of any unit testing or UI scripting standards.
> standards
I've been very impressed reading how the Rust developers handle this. They have a tool called crater[1], which runs regression tests for the compiler against all Rust code ever published on crates.io or GitHub. Every user-facing change that is even slightly risky must pass a crater run.
https://github.com/rust-lang/crater
Surely Microsoft has internal tools for Windows that do the same thing: run a battery of tests across popular apps and make sure changes in the OS don't break any user apps.
Where's the similar test harness for Linux you can run that tests hundreds of popular apps across Wayland/X11 and Gnome/KDE/XFCE and makes sure everything still works?
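Mechanically, such a harness isn't complicated; it's essentially a diff of build/test results between two toolchain (or OS) versions across a large corpus. A minimal sketch in Python, with all project and toolchain names illustrative rather than how crater actually works:

    # Sketch of a crater-style regression run: build every project once
    # with the old toolchain and once with the new one, and report only
    # the projects that regress (pass -> fail).
    import subprocess

    PROJECTS = ["proj-a", "proj-b"]  # stand-ins for crates.io/GitHub checkouts

    def build_ok(project: str, toolchain: str) -> bool:
        # rustup's cargo shim: `cargo +stable build`, `cargo +nightly build`
        result = subprocess.run(
            ["cargo", f"+{toolchain}", "build"],
            cwd=project, capture_output=True,
        )
        return result.returncode == 0

    regressions = [p for p in PROJECTS
                   if build_ok(p, "stable") and not build_ok(p, "nightly")]
    print("regressed:", regressions)

The hard parts are the sandboxing, caching, and sheer size of the build farm, not the comparison itself.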
> Surely Microsoft has internal tools for Windows that do the same thing: run a battery of tests across popular apps and make sure changes in the OS don't break any user apps.
And hardware: last I checked, they actually deploy to hardware bought locally from retailers to verify things still work, because there is always that "one popular laptop" with stupid quirks. I know they also try to cover a spectrum of commonly used models based on the telemetry.
And crater costs a bunch, runs for a week, and isn't a guarantee that things won't break. I'm not sure whether it runs every crate or just the top 1 million. It used to run everything, but I could see that changing.
And in the case of closed-source software that isn't publicly available, crater wouldn't work.
Crater's an embarrassingly parallel problem, though; it's only a matter of how much hardware you throw at it. Microsoft already donates the hardware crater runs on, and it would have no problem allocating 10x as much for its own purposes.
This still has nothing to do with Linux. Unit testing isn't standardized in most languages. Even in Rust people have custom frameworks!
The Linux kernel does have such a project running batteries of tests. Userspace may not, but that's not a "unit test" problem. In fact it's the opposite: those would be integration tests.
Right, but Linux (the OS) doesn't have unit tests to ensure that changes to the underlying system don't break the software on top. Imagine if MS released a new version of Windows and tons of applications stopped functioning. Everyone would blame MS. The Linux community does it all the time and just says that it's the price of progress.
I think the problem is that there isn't really a thing like "Linux the OS"; there's Debian, Ubuntu, Gentoo, Red Hat, and more than I can remember, and they all do things differently: sometimes subtly so, sometimes not so subtly. This is quite different from the Windows position, where you have one Windows (multiple editions, but still one Windows) and that's it.
This is why a lot of games now just say "tested on Ubuntu XX LTS" and call it a day. I believe Steam just ships with half an Ubuntu system for their Linux games and uses that, even if you're running on Arch Linux or whatnot.
This has long been both a strong and a weak point of the Linux ecosystem. On one hand, you can say "I don't want no stinkin' systemd, GNU libc, and Xorg!" and go with runit, musl, and Wayland if you want, and most things still work (well, mostly anyway); but on the other hand you run into all sorts of cases where it works and then doesn't, or works on one Linux distro and not another, etc.
I don't think there's a clean solution to any of these issues. Compatibility is one of the hard problems in computing because no solution will satisfy everyone, and there are multiple reasonable positions, each with its own trade-offs.
So, I very much agree with mike_hearn: the description of how glibc is backwards compatible in theory, thanks to symbol versioning, matches my understanding of how glibc works, and the glibc maintainers' apparent lack of interest in testing whether it stays backwards compatible in practice seems evident. They certainly don't seem to run automated UI tests against a suite of representative precompiled binaries to ensure compatibility.
However, I don't understand where unit testing comes in. Testing that whole applications keep working with new glibc versions sounds a lot like integration testing. What's the "unit" that's being tested when ensuring that the software on top of glibc doesn't break?
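(Incidentally, the "in theory" part is easy to inspect for any binary: the ELF version-needs section records exactly which versioned glibc symbols it depends on. A rough sketch of checking that with readelf; the parsing here is illustrative:

    # Sketch: list the glibc symbol versions a precompiled binary
    # requires, by scanning readelf's version-needs output.
    import re
    import subprocess
    import sys

    def glibc_versions(binary: str) -> list[str]:
        out = subprocess.run(
            ["readelf", "-V", binary], capture_output=True, text=True
        ).stdout
        # entries look like "Name: GLIBC_2.34  Flags: none  Version: 5";
        # sorted lexically here, just for stable output
        return sorted(set(re.findall(r"GLIBC_[0-9.]+", out)))

    if __name__ == "__main__":
        print(glibc_versions(sys.argv[1] if len(sys.argv) > 1 else "/bin/ls"))

Whether those old symbol versions still behave identically on a new glibc is exactly the question only integration testing can answer.)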
You're right, I should have written "integration tests".
The Linux Kernel does have tests, and many of the apps on top have unit tests too.
> Imagine if MS released a new version of Windows and tons of applications stopped functioning. Everyone would blame MS.
I don't have to imagine, this literally happens every Windows release.
Well, let's see. What do I know about this topic?
I've used Linux since the Slackware days. I also spent years working on Wine, including professionally at CodeWeavers. My name can still be found all over the source code:
https://gitlab.winehq.org/search?search=mike%20hearn&nav_sou...
and I'm listed as an author of the Wine developers guide:
https://wiki.winehq.org/Wine_Developer%27s_Guide
Some of the things I worked on were the times when the kernel made ABI changes that broke Wine, like here, where I work with Linus to resolve a breakage introduced by an ABI incompatible change to the ptrace syscall:
https://lore.kernel.org/all/1101161953.13273.7.camel@littleg...
I also did lots of work on cross-distribution binary compatibility for Linux apps, for example by developing the apbuild tool which made it easy to "cross compile" Linux binaries in ways that significantly increased their binary portability by controlling glibc symbol versions and linker flags:
https://github.com/DeaDBeeF-Player/apbuild/blob/master/Chang...
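The core trick, as I remember it, was a generated header full of .symver directives pinning each symbol to an old version glibc still exports, so that binaries built on a new distro would still load on older ones. A rough sketch of the idea; the symbol/version pairs are illustrative, and the real tool derived them from the target glibc:

    # Sketch: emit an apsymbols.h-style header that pins glibc symbols
    # to old versions via .symver. Pairs below are illustrative.
    PINS = {
        "memcpy": "GLIBC_2.2.5",    # newer glibc defaults to memcpy@GLIBC_2.14
        "realpath": "GLIBC_2.2.5",
    }

    def emit_header(path: str = "apsymbols.h") -> None:
        with open(path, "w") as f:
            for sym, ver in sorted(PINS.items()):
                f.write(f'__asm__(".symver {sym}, {sym}@{ver}");\n')

    if __name__ == "__main__":
        emit_header()

You'd then build with something like gcc -include apsymbols.h so that every translation unit picks up the pins.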
So I think I know more than my fair share about the guts of how Win32 and Linux work, especially around compatibility. Now, if you had finished reading to the end of the sentence you'd see that I said:
"Linux is really hurt here by the total lack of any unit testing or UI scripting standards"
... unit testing or UI scripting standards. Of course Linux apps often have unit tests. But to drive real world apps through a standard set of user interactions, you really need UI level tests and tools that make UI scripting easy. Windows has tons of these like AutoHotKey, but there is (or was, it's been some years since I looked) a lack of this sort of thing for Linux due to the proliferation of toolkits. Some support accessibility APIs but others are custom and don't.
It's not the biggest problem. The cultural issues are more important. My point is that the reason Win32 is so stable is that for the longest time Microsoft took the perspective that it wouldn't blame app developers for changes in the OS, even when theoretically it could. They also built huge libraries of apps they'd purchased and used armies of manual testers (+automated tests) to ensure those apps still seemed to work on new OS versions. The Wine developers took a similar perspective: they wouldn't refuse to run an app that does buggy or unreasonable things, because the goal is to run all Windows software and not try to teach developers lessons or make beautiful code.
> But to drive real world apps through a standard set of user interactions, you really need UI level tests and tools that make UI scripting easy. Windows has tons of these like AutoHotKey, but there is (or was, it's been some years since I looked) a lack of this sort of thing for Linux due to the proliferation of toolkits.
This made me remember a tool that was quite popular in the Red Hat/GNOME community in 2006-2007 or so:
https://gitlab.com/dogtail/dogtail
I wonder if it ever got any traction?
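From what I remember, it drove apps through the AT-SPI accessibility tree from Python, roughly like this (written from memory, so the details may be dated; it assumes gedit and an accessibility-enabled desktop):

    # Sketch of a dogtail script: find an app in the accessibility
    # tree and drive its widgets. API names recalled from memory.
    from dogtail.utils import run
    from dogtail.tree import root

    run("gedit")                     # launch and wait for the app
    app = root.application("gedit")  # locate it in the AT-SPI tree
    app.child(roleName="text").typeText("hello from dogtail")
    app.child("Save", roleName="push button").click()

Of course, that only works for toolkits that actually expose the accessibility APIs, which loops back to the proliferation-of-toolkits problem.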
Thank you for your work!