I find it kind of amusing that the dynamic configuration problem of hardware is so tough, and I think about the old mainframe and minicomputer OSes of the 1970s, which avoided all that by starting out with some configuration that supported one terminal and limited storage devices and would recompile the OS for the exact hardware configuration of the machine and print it to a paper tape or magnetic tape and they'd boot off that. Thus you had a "systems programmer" at every mainframe installation.
That part of the industry got into dynamic configuration to support hot plugging and generally being able to change the hardware configuration without causing downtime.
> which avoided all that by starting out with some configuration that supported one terminal and limited storage devices and would recompile the OS for the exact hardware configuration of the machine and print it to a paper tape or magnetic tape and they'd boot off that.
Not even. The OEM-shipped machine-specific bootstrap tape (i.e. the one that "supported one terminal and limited storage devices") was still used for initial system bringup, even after you recompiled your userland software distribution of choice for your computer's weird ISA and wrote it out to a tape. The OEM-shipped machine-specific bootstrap tape got mounted on /; brought up the system just enough to mount other devices; and then the userland software distribution got mounted to /usr.
(Back then, you wouldn't want to keep running things from the machine-specific bootstrap tape — the binaries were mostly very cut-down versions, of the kind that you could punch in from panel toggle switches in a pinch. You couldn't unspool the tape, because the kernel and some early daemons were still executing from the tape; but you wouldn't want anything new to execute from there. Thus $PATH. In /usr/bin you'd find a better shell; a better ls(1); and even what you'd think of today as rather "low-level" subsystems — things like init(8). $PATH was /usr/bin:/bin because "once you have /usr/bin, you don't want to be invoking any of those hyperminimal /bin versions of anything any more; you want the nice convenient /usr/bin ones.")
Ah back when the whole supply chain had a single manufacturer and no one worried about whether someone might want to put in - say - two video cards or the like.
Apple still kind of exists in this space.
Ironically, Apple implemented dynamic hardware configuration long before it was a standard feature in PC platforms.
I was tempted to jump on the "two video cards" example, but the original IBM PC could support both a CGA (for color) and MDA (monochrome, sharper text) in the same host. I never did that myself, but every card I did use required you to flip switches or jumpers on each ISA board to configure its interrupts and memory address of its I/O ports.
Apple adopted NuBus for its Macintosh expansion platform. Boards were plug and play, automatically configured. Of course, the hardware required on the NuBus card to support this functionality was the better part of a whole separate Mac in its own right; the hardware dev kit cost $40,000.
Two video cards in a Mac just worked.
(Of course, I took your comment to refer to hardware less than 20 years old. But even now, there's dynamic hardware. Apple loved Thunderbolt because they wanted external expansion cards over a wire.)
Wasn't like that at all with DEC, and I don't think so with IBM mainframes either.
It was common for DEC systems to have custom Unibus cards
https://en.wikipedia.org/wiki/Unibus
as these were really easy to make. They dealt with them by building custom drivers right into the OS when they built an OS image.
Circa 2002 a friend of mine developed custom printer interfaces and drivers for IBM z because IBM's printer interface couldn't support the high rate of printing that New York state needed to satisfy the mandate that any paperwork could be turned around in 2 days or less.
Whatever you say about NY it is impressive how fast you get back your tax returns, driver license, permit to stock triploid grass carp or anything routine like that.
The PC is really incredibly unique as a computing platform in how open to third-party extension and customization it ended up becoming (even though it was definitely not IBM's intention!) This has mostly been very good for the consumer, but the combinatorial explosion of different hardware combinations was for a long time a compatibility nightmare, and to some extent still is.
I would like to offer a prophecy: For the next evolution of ACPI, Linux kernel devs (employed at hardware companies) will figure out a way to replace the bespoke bytecode with eBPF.
Windows will, of course, spawn a WSL instance when it needs to interact with ACPI. macOS is its own hardware platform, and will naturally come up with their own separate replacement.
There already is an eBPF for Windows, it's even Microsoft's own project https://github.com/microsoft/ebpf-for-windows
Ahh. Just as the prophecy foretold.
Unlikely. ACPI is made by Wintel vendors, so Windows will get support for the fancy new things and Linux will lag behind until the new thing is documented or reverse engineered.
ACPI is standardized via a specification. It's quite easy for non-Windows operating systems to support ACPI. I can't say the same for device tree, as that requires reading the Linux source.
> macOS is its own hardware platform, and will naturally come up with their own separate replacement.
Actually, no. The M-series SoCs use device trees [1], and in fact their Apple SoC predecessors did so as well - the earliest I could find is the iPhone 3GS [2].
[1] https://lore.kernel.org/lkml/20230202-asahi-t8112-dt-v1-0-cb...
[2] https://www.theiphonewiki.com/wiki/DeviceTree
They're very device tree oriented. They've been using them since "new world PowerPC" Macs in the 90s. Even on x86, their boot loader constructs a device tree to describe the hardware to the kernel.
BPF doesn't really make sense here. It can't fully specify the kinds of computation an AML replacement would need since BPF is guaranteed to terminate (it's not Turing complete).
For this use case (hardware configuration), that might actually be desirable?
macOS already uses (at least on ARM chips) device trees. I don’t see why they would go back to ACPI as long as they keep their SoC model.
Why bother with bespoke bytecode when we have a high quality, standard ISA?
RISC-V's base RV64I has 47 instructions. Legacy ISAs can simply emulate these 47 instructions.
Bytecode is presumably chosen to minimize program length, while RISC-V sits at the opposite extreme of verbosity for representing a program.
You may be one of those who believe that RISC-V is a high-quality ISA, but this is not a universal opinion, and it is a belief that is typically shared only by those who have been exposed to few or no other ISAs.
In the context of something like ACPI, I would be worried by the use of something like the RISC-V ISA, because this is an ISA very inappropriate for writing safe programs. Even something as trivial as checking for integer overflow is extremely complicated in comparison with any truly high-quality ISA.
What makes you think RISC-V is a good fit for device configuration?
I worked on the Windows kernel team, and my first real projects were the ACPI 1 and 2 implementations. It's been a while, and ACPI was well on its way when I got there; the story at the time was that huge gaps in the BIOS were a problem and we needed to move that functionality into the kernel. There was also a big push from the industry at the same time to use EFI to allow devices to have a pre-OS experience (e.g. play DVDs) and not be dependent upon Windows for those.
Another memory I have from that time is that power management was a big priority. So I suspect the ability for the OS to do that via ACPI was strategic - I wasn't involved in the decision making.
This is all absolutely true, but it's not really an argument for or against ACPI or DTS or OF or any of that stuff. They're all sort of messy, but frankly all aimed at solving the wrong problem.
The root cause of every single problem in this space is simple: the OS and the firmware need to coordinate, period. That's a software problem. It's a complicated software problem. And it has to be solved with software engineering, with stuff like interface specifications and division of responsibilities and test suites and conformance tests and all the other boring stuff.
And ACPI is, sure, a reasonable language for expressing most of that work. But... no one does the work![1] Vendors make windows run and then ship it, and will occasionally fix bugs reported by critical customers. And that's all you get. Linux comes along later (even new versions of Windows often have crazy workarounds as I understand it) and needs to piece together whatever the firmware is doing ad hoc, because none of those vendors are going to bother with anyone that isn't Microsoft.
[1] In the PC world. In the world of integrated hardware things are much better. You can rely on your Mac or your Chromebook or your Galaxy S412 not to have these kinds of bugs, not because they use better firmware interface languages (arguably they don't -- I've seen DTS trees for ARM Chromebooks!), but because they treat this as a software problem and have the firmware and OS teams work with each other.
The reality is that for highly integrated devices, you just ship a bunch of hacks, sometimes because you forgot to follow the spec and it was faster to patch a line in the kernel than to patch the firmware (hello, broken Intel Mac ACPI tables!). A kernel driver for a phone component might have a hardcoded set of quirks selected by a string from the device tree.
In the world of PCs, the reason Linux emulates Windows in terms of ACPI is that Microsoft is not only a big vendor - all those "designed for Windows" labels on computers actually required passing test suites, etc. - Microsoft also publishes add-on specifications for things that are underspecified in ACPI. For example, the ACPI spec does specify how to make an interface for backlight control, but it does not tell you the ranges that the OS and said interface have to support. Microsoft provides such a description: for example, if the OS responds to _OSI(Windows2003) then the supported range will be 0-5 (a purely imagined example), but if it also responds to _OSI(Windows2007) then the supported values can be 0-15, and so on.
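As a toy illustration of that negotiation (using the purely imagined backlight ranges from the comment above - the strings, numbers, and function names are made up for illustration, not taken from any real firmware), a minimal sketch in C:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Model of the firmware side: the OS has already answered _OSI queries, and
 * the firmware remembers which interface strings the OS claimed to support. */
static bool os_claimed(const char *osi_string, const char **claims, int n)
{
    for (int i = 0; i < n; i++)
        if (strcmp(claims[i], osi_string) == 0)
            return true;
    return false;
}

/* Pick the backlight range to expose, mirroring the imagined example above:
 * newer claimed Windows versions unlock a finer-grained range. */
static int max_backlight_level(const char **claims, int n)
{
    if (os_claimed("Windows2007", claims, n))
        return 15;
    if (os_claimed("Windows2003", claims, n))
        return 5;
    return 1; /* bare-minimum fallback for an OS that claims nothing */
}

int main(void)
{
    /* Linux typically claims the same _OSI strings Windows would. */
    const char *claims[] = { "Windows2003", "Windows2007" };
    printf("max backlight level: %d\n", max_backlight_level(claims, 2));
    return 0;
}
```

The real logic lives in AML on the firmware side, of course; this just shows the shape of the policy the _OSI answers end up selecting.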
This is also why the firmware situation on ARM is so shitty - vendors aren't forced to do the work, so they don't. With Windows, the vendor is external and it's pretty hard to get away with not implementing things right (one example is Qualcomm fucking up Windows-on-ARM interfaces somewhat impressively and fixing it with injected drivers).
> The reality is that for highly integrated devices, you just ship a bunch of hacks,
That's true, but only in the specious sense that all integrated software is "a bunch of hacks". Fixing glitches due to misunderstandings between an API consumer and an API provider is something we all do, every day. And the essence of the problem is no different whether the technology is a JavaScript app framework or an NVMe firmware driver.
I mean, sure, it's better to make sure that the NVMe firmware driver (picking on that because it was an example in the linked article) and the OS have a clean specification for interaction. And sometimes they do! But it's likewise important that everyone write appropriately idiomatic React code too, and we don't demand[1] a whole new technology idiom to prevent front end developers from doing dumb stuff.
The solution is about engineering practice and not technology, basically. ACPI isn't going to solve that, for the reason that ACPI didn't create the problem. It's not ACPI's problem to solve.
[1] Well, people kinda do, actually. Abstraction disease is another problem.
The problem with the PC world is that the firmware and OS teams are not only working for different companies, they're working on different timescales and release cadences. Android devices and Macs are in an entirely different situation here, so the only really comparable example is the Chromebook - and that's a market where if Google doesn't want to support your hardware you don't get to ship it.
The point isn't that they're comparable, obviously they aren't. It's that the techniques used to solve the problem (synchronize "timescales and release cadences" in your example) are engineering management things and not technologies.
It's true that "Linux vs. Lenovo" doesn't have the same engineering practice tools available. But... evangelizing technological tools like "ACPI" isn't a path to a solution, because... those tools just don't work.
Or rather they don't work in the absence of someone addressing the engineering practice problem. They work fine to do what they're designed to do. But what they're designed to do emphatically isn't "Make Lenovo Hardware Work With Linux".
> And ACPI is, sure, a reasonable language for expressing most of that work. But... no one does the work![1]
Maybe. Or maybe the "work doesn't get done" in part because that interface language is simultaneously overengineered and underspecified, and people who start out with the best of intentions end up drowning under a pile of incomprehensible ACPI docs and copy and pasting whatever junk seems to make Windows handle it ok because that's the only way out of the nightmare.
I remember when I built a new PC, with some Socket 370 Pentium or other, around 2002.
I ran an I2C probe to find the addresses to read the fan tachometer. The scan wrote some bits that irrecoverably messed up the firmware; the board wouldn't boot, and I replaced it.
Asus must have anticipated trouble, because their boards from that time period hide the I2C bus away by default :) You have to do a special port-knocking incantation to expose it: https://www.vogons.org/viewtopic.php?p=1173247#p1173247
>They share i2c bus between clock generator/monitoring chip and ram SPD using a mux. If you switch this mux and forget to switch it back computer will have trouble rebooting because bios blindly assumes its always set to clock gen.
There are three ACPI stacks: the reference Intel one, which is what Linux, macOS, and FreeBSD use; the Microsoft one; and finally, those madlads over at the OpenBSD project have their own. Good for them.
Four: https://github.com/managarm/lai
And yet somehow APM-based systems broke a lot less often. If the only codepath that's ever going to be tested is "what windows does", maybe having a cruder API that doesn't expose so many details of what's happening (and instead just lets the BIOS opaquely do its thing) is the way to go?
Pretty much only laptops had APM at first. Hardware didn't change much on laptops. I still had to unplug the ethernet cable from my PCMCIA card every now and then to get it to sense the link, but it wasn't that bad once I got all the right linux patches.
Then power management moved into desktops and servers that had expansion slots and everything became horrible.
The APM was also tested in practice only with DOS and/or Windows (and not all Windows, which could also be an issue).
And yes, it really didn't work well with dynamically attachable devices that might have important state.
> The APM was also tested in practice only with DOS and/or Windows
Right. I'm speculating that since it had only a small number of entry points, Linux tended to end up following the tested codepath. Certainly it broke a lot less on Linux than ACPI does.
I still cannot understand your problem with Device Trees after reading your article. I used to write an ARMv8 and an x86 kernel, and found that ACPI and Device Trees had the same capabilities, but with fewer headaches with DT.
I run NetBSD on several ARMv8 boards. One is ACPI only, all the rest use DeviceTree. Basically impossible to add any extra functionality to the ACPI only one, no problem doing this on the others.
Where do the device trees come from?
You may find https://en.wikipedia.org/wiki/Devicetree helpful.
It discusses this, alongside an interesting history of it, and the current state.
The same place ACPI tables come from. A flash chip on the motherboard.
The only reason I know ACPI exists is that every Linux laptop I ever had spat out a roll of error messages related to ACPI (and usually there was no actual support).
My understanding is that on top of the inherent problems outlined in the article, there's a more trivial problem of vendors not caring enough to do this right. So, as is typical for Linux laptops, hibernation and many other forms of power-saving either don't work at all or are broken (e.g. the laptop never wakes up from hibernation, or just the screen never wakes up, etc.).
These days most of these failures are bugs in the Linux drivers, not bugs in the firmware. The Lenovo case I mention in the article is actually unusual in that respect.
I'm only slightly familiar with the specific features ACPI provides, but isn't the solution the following?
For every "feature" provided by the SMM or BIOS:
- Export a UUID (e.g. "NVMe resume implementation 1").
- Have that feature have an enable and a disable function.
- Have each feature declare a dependency on each I/O range / firmware device it needs access to.
If the kernel knows how to implement the feature, it can just disable it, and as long as it follows the dependency tree and can see that nothing else accesses those ranges, it knows it has exclusive use. If it doesn't have exclusive use, it must use the firmware to access those ranges if possible, or fall back to no support.
If the firmware has a feature without a disable function, the kernel knows it can never access that hardware directly/safely.
You could even have a "lock device": if you take the lock, you know the SMM won't access those I/O ranges while you hold it.
Obviously this all requires vendor support
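A minimal userspace sketch of the negotiation being proposed here, assuming a firmware-exported descriptor per feature - the struct, fields, and enum below are invented purely for illustration and don't correspond to anything in ACPI today:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical descriptor the firmware would export for each feature. */
struct fw_feature {
    const char *uuid;             /* e.g. "NVMe resume implementation 1" */
    bool        has_disable;      /* firmware offers a disable call */
    bool        has_fw_interface; /* firmware offers calls to drive the hardware for you */
    uint64_t    io_base, io_len;  /* I/O range the feature touches */
};

enum access { NATIVE, VIA_FIRMWARE, UNSUPPORTED };

/* Decide how the kernel may drive the hardware behind a feature, assuming it
 * has already walked the dependency tree and confirmed no other firmware
 * feature touches the same I/O range. */
static enum access negotiate(const struct fw_feature *f, bool kernel_has_driver)
{
    if (kernel_has_driver && f->has_disable)
        return NATIVE;       /* disable the firmware feature, take exclusive ownership */
    if (f->has_fw_interface)
        return VIA_FIRMWARE; /* let the firmware keep mediating the I/O range */
    return UNSUPPORTED;      /* can't share the range safely, so do without */
}

int main(void)
{
    struct fw_feature nvme_resume = { "NVMe resume implementation 1", true, true, 0x1000, 0x100 };
    const char *names[] = { "native", "via firmware", "unsupported" };
    printf("access mode: %s\n", names[negotiate(&nvme_resume, true)]);
    return 0;
}
```

As the reply below points out, ACPI's _DSM mechanism already approximates this idea in practice.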
This is actually how things are meant to work! Many ACPI features are gated behind a _DSM call that allows the OS to indicate that it has native support for functionality and the firmware should stop touching it itself. It, uh, basically works to the extent that Windows makes use of it.
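For the curious, here is roughly what that handshake can look like from the Linux driver side - a sketch only: the GUID and function index are placeholders, and while acpi_evaluate_dsm() is the helper the kernel provides for issuing _DSM calls, the actual GUID, revision, and function semantics are defined by per-device-class specifications, not here.

```c
#include <linux/acpi.h>
#include <linux/device.h>
#include <linux/errno.h>
#include <linux/uuid.h>

/* Placeholder GUID: a real driver uses the UUID published in the _DSM
 * specification for the device class it is negotiating over. */
static const guid_t example_dsm_guid =
	GUID_INIT(0x00000000, 0x0000, 0x0000,
		  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00);

/* Ask the firmware (via a hypothetical function index) to stop driving this
 * piece of hardware itself, because the OS has a native driver for it. */
static int example_claim_native_control(struct device *dev)
{
	union acpi_object *out;

	out = acpi_evaluate_dsm(ACPI_HANDLE(dev), &example_dsm_guid,
				1 /* revision */, 1 /* function index */, NULL);
	if (!out)
		return -ENODEV;

	ACPI_FREE(out);
	return 0;
}
```

Whether the firmware actually honors the request is, as noted above, only as reliable as whatever Windows happens to exercise.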
As someone who had to test various ACPI configs and work my way through the docs, I will never do that again. I will literally get out of my seat and tender my resignation on the spot if they try to force me. Never again. It's probably the most overengineered thing I've ever had to work with.
Does anyone have a link to the 12k page long discussion?
It's a reference to a famous Twitter post: https://twitter.com/dril/status/107911000199671808
I really wish web denizens would use memes with their own local color which, when applicable, are also true.
I mean, I'm sure Matthew knows some really long, legendary Linux Kernel Mailing List thread that is at least somewhat related to this post. It would be fun because then we could click the hyperlink, and Linux Kernel Mailing List threads are nothing if not dramatic. :)
Instead, here every web tribe localizes a meme that traces back to a dead end Tweet. So instead of getting to travel back to the actual tribe's colorful flame history, I get the tribe's low-effort localization of a dead Twitter stump. Boo!
It's a meme, there isn't such a thread (probably). I mean, there is no lack of flame wars about ACPI out there, but that megathread mention is a meme used about controversial subjects.
> We called this Advanced Power Management (Advanced because before this power management involved custom drivers for every machine and everyone agreed that this was a bad idea)
Not sure this statement is really true.
An OS has to have drivers for diverse hardware already - an OS will be expected to support devices as varied as keyboards, mice, floppy drives, hard drives, VGA, PCI bus, etc.
I guess it sucks to have to develop 10 drivers for 10 different power management controllers, but:
- the industry could have done what they did for storage - make the controller standard on the hardware level.
and
- if companies could have come together to create ACPI, they could have come together to define standard power management hardware interfaces instead.
> and it involved the firmware having to save and restore the state of every piece of hardware in the system.
APM was a crappy idea too, except if you had to support DOS and things built on it like Windows 3.x and 95.
Ideally the power management controller would just shut the system off, provide something that an OS loader can read to know if the system was powered on cold or resumed, and let the OS be responsible for saving and loading state.
> ACPI decided to include an interpreted language to allow vendors to expose functionality to the OS without the OS needing to know about the underlying hardware.
> How is this better than just calling into the firmware to do it? Because the fact that ACPI declares that it's going to access these registers means the OS can figure out that it shouldn't, because it might otherwise collide with what the firmware is doing. With APM we had no visibility into that
So ACPI provides code your OS must execute with top privileges so it doesn't have to know about the hardware, but it still has to know about the hardware so it doesn't accidentally step on it. Definitely better than the manufacturer of any power management hardware just publishing a datasheet and letting the OS handle it like any other device. /s
> There's an alternative universe where we decided to teach the kernel about every piece of hardware it should run on. Fortunately (or, well, unfortunately) we've seen that in the ARM world. Most device-specific simply never reaches mainline, and most users are stuck running ancient kernels as a result
If datasheets were available for the hardware, then open source drivers could be created instead of only relying on closed binary blobs; those could then be mainlined and included with the kernel, and this problem would not exist. The problem is really vendors not releasing information on programming their hardware, not Linux. This goes back to the whole argument: if you pay for and own your hardware, why is the manufacturer able to hide these details from you by not releasing this information?
Your argument fails at its first line:
> An OS has to have drivers for diverse hardware already - an OS will be expected to support devices as varied as keyboards, mice, floppy drives, hard drives, VGA, PCI bus, etc.
No, it doesn't, because the OS we are talking about here is DOS.
APM was released in 1992: https://en.wikipedia.org/wiki/Advanced_Power_Management
This was before even Windows 3.1 shipped.
MS-DOS 5.0 was new and not that widely used but was catching on: https://en.wikipedia.org/wiki/Timeline_of_DOS_operating_syst...
DOS didn't support half the hardware you cite. It had no direct support for mice, CD-ROMs, PCI, VGA, or any of that. PCI 1.0 was released the same year.
In those days, most PC software used the BIOS to access standard hardware, and anything much past that was up to the vendor to ship a driver.
All APM really had to do was throttle the CPU and maybe, as a vendor extension, put the hard disk to sleep. That's about it.
Things like putting the display to sleep came along with the US Energy Star standard, released -- you guessed it -- in 1992.