Comment by ajross

1 year ago

This is all absolutely true, but it's not really an argument for or against ACPI or DTS or OF or any of that stuff. They're all sort of messy, but frankly all aimed at solving the wrong problem.

The root cause of every single problem in this space is simple: the OS and the firmware need to coordinate, period. That's a software problem. It's a complicated software problem. And it has to be solved with software engineering, with stuff like interface specifications and division of responsibilities and test suites and conformance tests and all the other boring stuff.

And ACPI is, sure, a reasonable language for expressing most of that work. But... no one does the work![1] Vendors make Windows run and then ship it, and will occasionally fix bugs reported by critical customers. And that's all you get. Linux comes along later (even new versions of Windows often have crazy workarounds, as I understand it) and needs to piece together whatever the firmware is doing ad hoc, because none of those vendors are going to bother with anyone that isn't Microsoft.

[1] In the PC world. In the world of integrated hardware things are much better. You can rely on your Mac or your Chromebook or your Galaxy S412 not to have these kinds of bugs, not because they use better firmware interface languages (arguably they don't -- I've seen DTS trees for ARM chromebooks!), but because they treat this as a software problem and have the firmware and OS teams work with each other.

The reality is that for highly integrated devices, you just ship a bunch of hacks, sometimes because you forgot to follow the spec and it was faster to patch a line in the kernel than to patch the firmware (Hello, Intel Mac broken ACPI tables!). A kernel driver for a phone component might have a hardcoded set of quirks selected by a string from the device tree.
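A minimal sketch of what "quirks selected by string from device tree" can look like, written in Python for brevity rather than kernel C. The compatible strings, flag names, and the `quirks_for` helper are all hypothetical and not taken from any real driver.

```python
# Hypothetical quirk flags; these names are illustrative only.
QUIRK_NONE = 0
QUIRK_SLOW_RESET = 1 << 0
QUIRK_NO_DMA = 1 << 1

# Hardcoded table keyed by the device tree "compatible" string.
# The strings are made up for this sketch.
QUIRKS_BY_COMPATIBLE = {
    "vendor,panel-rev-a": QUIRK_SLOW_RESET,
    "vendor,panel-rev-b": QUIRK_SLOW_RESET | QUIRK_NO_DMA,
}


def quirks_for(compatible_strings):
    """Return the quirk flags for the first compatible string in the table.

    A device tree node lists its compatible strings from most to least
    specific, so the first match wins. Unknown devices get no quirks.
    """
    for compat in compatible_strings:
        if compat in QUIRKS_BY_COMPATIBLE:
            return QUIRKS_BY_COMPATIBLE[compat]
    return QUIRK_NONE
```

The driver would then branch on the returned flags, which is exactly the kind of per-device hack the comment describes: the workaround lives in the OS, not in the firmware description.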

In the world of PCs, the reason Linux emulates Windows in terms of ACPI is that Microsoft is not only a big vendor: all those "Designed for Windows" labels on computers actually required passing test suites, etc. Microsoft also publishes add-on specifications for things that are underspecified in ACPI. For example, the ACPI spec does specify how to make an interface for backlight control, but it does not tell you the ranges that the OS and said interface have to support. Microsoft provides such a description: for example, if the OS responds to _OSI(Windows2003) then the supported range will be 0-5 (purely imagined example), but if it also responds to _OSI(Windows2007) then the supported values can be 0-15, etc.
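The _OSI negotiation in that example can be modeled roughly as follows. This is a Python sketch, not ASL, and it reuses the comment's purely imagined strings and ranges; `make_osi` and `backlight_range` are hypothetical helpers, and the values are not real _OSI strings or real backlight limits.

```python
def make_osi(supported_interfaces):
    """Model the OS side: answer whether it claims a given interface string."""
    supported = frozenset(supported_interfaces)

    def osi(name):
        return name in supported

    return osi


def backlight_range(osi):
    """Model the firmware side: pick a brightness range from the OS's answers.

    The newest interface is checked first, so an OS that claims both
    strings gets the wider range. Returns None if neither is claimed.
    """
    if osi("Windows2007"):
        return (0, 15)
    if osi("Windows2003"):
        return (0, 5)
    return None
```

Under this model, an OS that claims neither string gets no defined range at all, which is one way to see why Linux answers _OSI queries the way Windows would.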

This is also why the firmware situation on ARM is so shitty - vendors aren't forced to do the work, so they don't. With Windows, the vendor is external and it's pretty rare to avoid implementing things right (one example is Qualcomm fucking up Windows-on-ARM interfaces somewhat impressively and fixing it with injected drivers).

  • > The reality is that for highly integrated devices, you just ship a bunch of hacks,

    That's true, but only in the specious sense that all integrated software is "a bunch of hacks". Fixing glitches due to misunderstandings between an API consumer and an API provider is something we all do, every day. And the essence of the problem is no different whether the technology is a JavaScript app framework or an NVMe firmware driver.

    I mean, sure, it's better to make sure that the NVMe firmware driver (picking on that because it was an example in the linked article) and the OS have a clean specification for interaction. And sometimes they do! But it's likewise important that everyone write appropriately idiomatic React code too, and we don't demand[1] a whole new technology idiom to prevent front end developers from doing dumb stuff.

    The solution is about engineering practice and not technology, basically. ACPI isn't going to solve that, for the reason that ACPI didn't create the problem. It's not ACPI's problem to solve.

    [1] Well, people kinda do, actually. Abstraction disease is another problem.

The problem with the PC world is that the firmware and OS teams are not only working for different companies, they're working on different timescales and release cadences. Android devices and Macs are in an entirely different situation here, so the only really comparable example is the Chromebook - and that's a market where if Google doesn't want to support your hardware you don't get to ship it.

  • The point isn't that they're comparable, obviously they aren't. It's that the techniques used to solve the problem (synchronize "timescales and release cadences" in your example) are engineering management things and not technologies.

    It's true that "Linux vs. Lenovo" doesn't have the same engineering practice tools available. But... evangelizing technological tools like "ACPI" isn't a path to a solution, because... those tools just don't work.

    Or rather they don't work in the absence of someone addressing the engineering practice problem. They work fine to do what they're designed to do. But what they're designed to do emphatically isn't "Make Lenovo Hardware Work With Linux".

    • My point is that focusing on tight integration between OS teams and the underlying platform is a great way to develop a device that only runs one operating system and a terrible way to develop a device that's supposed to be part of an open ecosystem. ACPI is the least bad way we currently have to solve the latter problem. It doesn't guarantee that a Lenovo will work with Linux, but in the absence of an explicit development program it gives it a fighting chance.

> And ACPI is, sure, a reasonable language for expressing most of that work. But... no one does the work![1]

Maybe. Or maybe the "work doesn't get done" in part because that interface language is simultaneously overengineered and underspecified, and people who start out with the best of intentions end up drowning under a pile of incomprehensible ACPI docs and copying and pasting whatever junk seems to make Windows handle it OK because that's the only way out of the nightmare.