Comment by n_ary

1 year ago

Unfortunately, an aircraft has no “reboot”. It is just a violent power cut. A lot of headaches are introduced in non-critical aircraft software because there is no “graceful shutdown” or long power duration. In fact, certain hardware has an upper limit (much lower than a week) before which it needs a power cut (sometimes called a power cycle), or it suffers from various buffer overflows and counter overflows and starts acting mysteriously.
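
For a sense of scale, here is a back-of-the-envelope sketch of when a free-running tick counter wraps around. The counter widths and tick rates below are hypothetical, picked only to illustrate the arithmetic; they do not describe any specific avionics unit.

```python
from datetime import timedelta


def time_to_overflow(bits: int, ticks_per_second: float, signed: bool = False) -> timedelta:
    """Return how long a free-running counter runs before it wraps."""
    # A signed counter loses one bit of range to the sign.
    usable_bits = bits - 1 if signed else bits
    max_ticks = 2 ** usable_bits
    return timedelta(seconds=max_ticks / ticks_per_second)


# Hypothetical configurations (assumptions, not taken from real hardware).
configs = [
    ("32-bit unsigned, 1 kHz tick", 32, 1_000, False),
    ("32-bit unsigned, 10 kHz tick", 32, 10_000, False),
    ("32-bit signed, 100 Hz tick", 32, 100, True),
]

for label, bits, rate, signed in configs:
    print(f"{label}: wraps after {time_to_overflow(bits, rate, signed)}")
```

With these assumed numbers, a 32-bit counter ticking at 10 kHz wraps in a bit under five days. That is one plausible way to end up with a "power cycle at least once a week" maintenance item.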

It's amazing that's legal. Like, why do we accept software that does this? It can be done in such a way that these things don't happen. Put another way, why aren't the companies involved being fined and sued out of business? Why aren't their managers facing criminal negligence charges? It's outrageous.

  • Because there has never been a single commercial jetliner fatality caused by software in its intended operational domain failing to operate according to specification. That makes the commercial jetliner software development and deployment process the safest and most reliable ever devised, by multiple orders of magnitude. We are talking about the 10-to-12-nines range.

    And just to get ahead of: “Well what about the 737 MAX”, that was a system specification error, not due to “buggy” software failing to conform to its specification. The software did what it was supposed to do, but it should not have been designed to do that given the characteristics of the plane and the safety process around its usage.

    • >“Well what about the 737 MAX”, that was a system specification error, not due to “buggy” software failing to conform to its specification. The software did what it was supposed to do

      Exactly: the system was designed to fly the plane into the ground if a single sensor was iced up, and that's exactly what the software did. Boeing really thought this system specification was a good idea.

    • So what should we make of these issues described in the article? When, not if, this kind of thing kills people will it be a specification error? Will we blame it on maintenance? Surely it can't be the software's fault!

  • Because it works fine. A maintenance tech gets one extra line item on the weekly or monthly inspection checklist.

    • It works fine until it doesn't and people die. At which point the blame falls on the maintenance crew? That's wrong. And where there's smoke there's fire. If the software has this horrible bug, likely the broken culture that created it has written worse, more subtle bugs.

  • Because changes to that software go through an enormous amount of testing, validation, and documentation before a new baseline becomes a flashable item. Meanwhile, a workaround that always works is needed now.

  • Have you even found the documentation around things like ACPI? It's kinda coupled with UEFI these days, I think, and hell, I'm not even sure which hardware boards/revisions aircraft makers are using these days... Are they still on BIOS? Or old-as-sin Linux/RTOS kernels/microcontrollers?

    Point being, when you start talking about high QA systems, where the Quality is non-negotiable (you will have everything documented and tested); barring exec/managerial malfeasance in preventing that work from being done, you reach for the same simple things over and over again since it takes a hell of a lot of work to actually characterize and certify a thing to the requisite level of reliability/operating conditions.

    Testing ain't free, ya know.

> Unfortunately, an aircraft has no “reboot”. It is just a violent power cut.

That’s a reboot.

  • There’s nothing about a reboot that precludes a graceful shutdown.

    • There's also no reason why a "reboot" can't be a "violent power cut", especially if the equipment in question doesn't hold any state. For instance, there's no reason why you'd need to go through a shutdown sequence for a printer.

This has to be a joke, right?

You're telling me aerospace's "real engineering-level" is worse than something a sophomore can cook up?

  • The testing for aerospace is extremely rigorous ... For DO-178C level A (catastrophic failure that can cause a crash or many fatal injuries), we're estimating 2 years to reach MC/DC test coverage on a fairly basic software system that has two mechanical backups. And that's above and beyond the extensive unit tests.

    The main thing that gets checked is the worst-case timing analysis for every branch condition. There are also stack monitors that detect whether the stack is growing.

    Look at Rapita Systems' website for more info ... we don't use them, but they explain it well.
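
For anyone who hasn't run into MC/DC (modified condition/decision coverage): roughly, every condition in a decision has to be shown to independently flip the outcome while the other conditions are held fixed. Below is a minimal sketch of the "unique cause" flavor. The `decision` function is a made-up example and `independence_pairs` is a hypothetical helper, not anything from a real certification toolchain.

```python
from itertools import product


def decision(a: bool, b: bool, c: bool) -> bool:
    # Hypothetical decision under test; not taken from any real system.
    return a and (b or c)


def independence_pairs(fn, n_conditions: int):
    """Map each condition index to the pairs of input vectors that differ
    only in that condition and produce different outcomes."""
    pairs = {i: [] for i in range(n_conditions)}
    for vec in product([False, True], repeat=n_conditions):
        for i in range(n_conditions):
            if vec[i]:
                continue  # visit each pair once, starting from its False side
            flipped = vec[:i] + (True,) + vec[i + 1:]
            if fn(*vec) != fn(*flipped):
                pairs[i].append((vec, flipped))
    return pairs


pairs = independence_pairs(decision, 3)
for i, name in enumerate("abc"):
    print(name, pairs[i])
```

A test suite meets this criterion for the decision if, for each condition, it contains both vectors of at least one of that condition's pairs. Doing that analysis, with evidence, for every decision in the code base is a large part of where the time goes.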

>an aircraft has no “reboot”. It is just a violent power cut

Guess how I typically reboot things :)

  • By traveling to Mexico and laying out bait along the migratory path of the butterflies?