Comment by Veserv
1 year ago
Because there has never been a single commercial jetliner fatality caused by software in its intended operational domain failing to operate according to specification. That makes the commercial jetliner software development and deployment process by far the safest and highest reliability ever conceived by multiple orders of magnitude. We are talking in the 10-12 9s range.
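For a sense of scale, here is a rough sketch of what "N nines" means as a failure probability. The comment does not say what one "opportunity" is (per flight, per flight hour, or something else), so the unit here is left abstract, and the helper names are made up for illustration.

```python
import math


def nines_to_failure_probability(nines: float) -> float:
    """Reliability of N nines (e.g. 3 -> 99.9%) as a failure probability."""
    return 10.0 ** (-nines)


def failure_probability_to_nines(p_fail: float) -> float:
    """Inverse of the above: a failure probability expressed as a count of nines."""
    return -math.log10(p_fail)


# "Opportunity" is an assumed, unspecified unit (the comment does not define it).
for n in (3, 6, 10, 12):
    print(f"{n:>2} nines -> roughly 1 failure in {10 ** n:,} opportunities")
```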
And just to get ahead of: “Well what about the 737 MAX”, that was a system specification error, not due to “buggy” software failing to conform to its specification. The software did what it was supposed to do, but it should not have been designed to do that given the characteristics of the plane and the safety process around its usage.
>“Well what about the 737 MAX”, that was a system specification error, not due to “buggy” software failing to conform to its specification. The software did what it was supposed to do
Exactly: the system was designed to fly the plane into the ground if a single sensor was iced up, and that's exactly what the software did. Boeing really thought this system specification was a good idea.
That is a massive oversimplification, and it invites patently false characterizations, like it was a "stupid mistake" that would have been fixed if they were not stupid (i.e. had adopted an average development process). That is absolutely not the case. They were really capable, but aerospace problems are really, really hard, and their safety capability regressed from being really, really capable.
They modified the flight characteristics of the system. They tuned the control scheme to provide the "same" outputs as the old system. However, the tuning relied on a sensor that was not previously safety-critical. As the sensor was not previously safety-critical, it was not subject to safety-critical requirements like having at least two redundant copies as would normally be required. They failed to identify that the sensor became safety critical and should thus be subject to such requirements. They sold configurations with redundant copies, which were purchased by most high-end airlines, but they failed to make it mandatory due to their oversight and purchasers decided to cheap out on sensors since they were characterized as non-safety-critical even if they were useful and valuable. The manual, which pilots actually read, has instructions on how to disable the automatic tuning and enable redundant control systems and such procedures were correctly deployed at least once if not multiple times to avert crashes in premier airlines. Only a combination of all of those failures simultaneously caused fatalities to occur at a rate nearly comparable to driving the same distance, how horrifying!
An error in UX tuning dependent on a sensor that was not made properly redundant was the "cause". That is not a "stupid mistake". That is a really hard mistake, and downplaying it like it was a stupid mistake underestimates the challenges involved in designing these systems. That does not excuse their mistake as they used to do better, much better, like 1,000x better, and we know how to do better and the better way is empirically economical. But it does the entire debacle a disservice to claim it was just "being stupid". It was not; it was only qualifying for the Olympics when they needed to get the gold medal.
I really don't think it takes a mastermind of software design to go "okay I've built a system that takes control of the plane's maneuverability, let's make sure we have redundant sensors on this". Furthermore, descriptions of MCAS and its role were dangerously underplayed so that they didn't have to tell their customers to retrain their pilots. An egregious breach of public trust in a company we put a whole lot of faith into.
>They failed to identify that the sensor became safety critical and should thus be subject to such requirements.
Whistleblower testimony indicated it wasn't a failure to identify it as safety critical, but a conscious decision not to mention it as such to the regulator, and not to implement it as a dual sensor system, as doing so would have caused the design to require Class D simulator training, the absence of which Boeing was relying on as a selling point to prevent existing airlines from defecting to Airbus.
>They sold configurations with redundant copies, which were purchased by most high-end airlines, but they failed to make it mandatory due to their oversight and purchasers decided to cheap out on sensors since they were characterized as non-safety-critical even if they were useful and valuable.
Incorrect. All MAXes have two AoA vanes, each paired to a single Flight Computer. The plane has two Flight Computers, one on each side of the cockpit, and the computer in command is typically alternated between flights. One computer per flight will be considered in-command (henceforth referred to as Main), the other will be henceforth referred to as operating as "auxiliary". The configuration you're thinking of is an AoA disagree light, implemented by enabling a codepath in software running on the Main FC whereby a cross-check of the value from the AoA vane networked to the auxiliary FC would light up a warning light to inform pilots that system automation would be impacted, because the AoA values between the MFC and AFC differed. A pilot would be expected to recognize this and adapt behavior accordingly or take measures to troubleshoot their instruments. Importantly, however, this feature had zero influence on MCAS. MCAS only took into account inputs from the vane directly wired to the Main FC. While a cross-check happened elsewhere for the sole purpose of illuminating a diagnostic lamp, there was no cross-check functionality implemented within the scope of the MCAS subsystem. The MCAS system was not thoroughly documented in any documentation delivered to pilots. The program test pilot got specific dispensation to leave that out of the flight manual. See the Congressional investigation, final NTSB, and FAA report.
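To make the distinction concrete, here is a toy sketch of the data flow as described in this comment: the disagree lamp compares both vanes, while the MCAS input uses only the Main-side vane. This is not Boeing's code or anything close to it. The threshold value, the names, and the third function (a purely hypothetical cross-checked input, included only for contrast) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold, chosen only for illustration (not a real parameter).
DISAGREE_THRESHOLD_DEG = 10.0


@dataclass
class AoaReadings:
    main_deg: float  # vane wired to the in-command ("Main") flight computer
    aux_deg: float   # vane wired to the other ("auxiliary") flight computer


def disagree_lamp_on(r: AoaReadings) -> bool:
    """Diagnostic cross-check: compares both vanes, only drives a warning lamp."""
    return abs(r.main_deg - r.aux_deg) > DISAGREE_THRESHOLD_DEG


def mcas_aoa_input(r: AoaReadings) -> float:
    """As described above: only the Main-side vane is consulted, no cross-check."""
    return r.main_deg


def cross_checked_aoa_input(r: AoaReadings) -> Optional[float]:
    """Hypothetical contrast: refuse to supply a value when the vanes disagree."""
    return None if disagree_lamp_on(r) else r.main_deg


faulty = AoaReadings(main_deg=40.0, aux_deg=5.0)  # Main-side vane reads garbage
print(disagree_lamp_on(faulty))         # lamp path notices the mismatch
print(mcas_aoa_input(faulty))           # single-vane path still uses the bad value
print(cross_checked_aoa_input(faulty))  # hypothetical path withholds the value
```

The point of the sketch is only that a cross-check which feeds a lamp does nothing for a consumer that never reads its result.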
>The manual, which pilots actually read, has instructions on how to disable the automatic tuning and enable redundant control systems and such procedures were correctly deployed at least once if not multiple times to avert crashes in premier airlines.
The documentation, which included an Airworthiness Directive and NOTAM, informed pilots any malfunction should be treated in the same manner as a stabilizer trim runaway. Said problem is characterized in aviation parlance as a continual uncommanded actuation of trim motors. MCAS, notably, is not that. It is periodic, and in point of fact, it ramps up in intensity over time until over 2° of travel are commanded by the computer per actuation event, with the timer between actuations being reset to 5 seconds by use of the on-yoke Stab trim switches. This was uncommunicated to pilots. Furthermore, there were design changes to the Stab-Trim Cutout switches between 737NG (MAX's predecessor) and MAX. In the NG, the Stab Trim cutout could isolate the FC alone, or both FC and yoke switches from the Stab Trim motor. In MAX, however, the switches were changed to never isolate the FC from the Stab trim motors, because MCAS being operational was required for being able to checkmark FAR compliance for occupant carrying aircraft. So when that cutout was used, all electrically assisted actuation of the horizontal stabilizer became unavailable. The manual trim wheel would be the only trim input, and in out-of-trim attitudes, would result in such excessive loading on the control surface that physical actuation without electronic assistance was not feasible on the timescales required to recover the plane. There was a maneuver known to assist with these conditions (when they occurred at high altitude) called "roller coastering" in which you dive further into the undesired direction to unload the control surface to render it actuable. This technique has not been in official documentation since Dino 737 (Pre-NG).
The events you're referring to, when uncommanded actuations were recovered on other flights, happened at high altitudes, and were recovered with countered electrical stab switch actuation followed by Stab trim cutout within the reset 5 second watchdog timer prior to MCAS activation subsequent to a Stab-trim yoke control switch actuation. This procedure, and the implementation details needed to fully understand its significance, were undocumented prior to the two crashes. Furthermore, this procedure to cut out MCAS/the MFC from the stab trim motor and finish the flight in a completely manually trim controlled configuration meant that technically you were flying an aircraft in a configuration that could not be certified to carry passengers when taking the FARs prescriptively and uncompromisingly, rules-as-written, with zero slack offered for convenience. MCAS was necessary for grandfathering the MAX under the old type cert, and without MCAS functional, it's technically a new beast, which is non-compliant with control stick force feedback curves when approaching stalls. Just to make it clear, a compliant curve has been a characteristic of every civil transport in all jurisdictions worldwide for well over 50 years. This was not documented and only became apparent after investigation. Again, see the House findings, FAA report, and NTSB.
>Only a combination of all of those failures simultaneously caused fatalities to occur at a rate nearly comparable to driving the same distance, how horrifying!
Oh, the multi-billion dollar aircraft maker built a machine that crashes itself, gaslit its regulators, pilots, airlines, and the flying public to juice the stock price so executives could meet their quarterly incentives, and diverted funds away from its QA and R&D functions to do stock buybacks, move HQ away from the factory floor, and try to union bust. With over 300 direct measurable deaths within a couple of months and multiple years worth of grounding and mandated redesigns to fix all the other cut corners we've been unearthing, and veritable billions of dollars of loss incurred in delays. Heavens, it could happen to anybody. How could you possibly see this as something to get upset about? /s
So what should we make of these issues described in the article? When, not if, this kind of thing kills people will it be a specification error? Will we blame it on maintenance? Surely it can't be the software's fault!
First of all, who got blamed for the 737 MAX? Boeing did. This is one of the few industries where the responsibility does not get easily sloughed off.
Second, 787s have been flying for ~13 years and ~4.5 million flights [1]. Assuming they were unaware of the problem for the majority of that time, their unknowing maintenance and usage processes avoided critical failures due to the stated problems for a tremendous number of flights. Given they now know about it and are issuing a directive to enhance their processes to explicitly handle the problem, we can assume it is even less likely to occur than previously, which was already experimentally determined to be ludicrously unlikely. Suing someone into oblivion for an error that has never manifested as a serious failure and that is exceedingly unlikely to manifest is a little excessive.
Third, they should be remediating problems as they arise, balanced against the risks introduced by specification changes and against the alternative of other process modifications. Given Boeing’s other recent failings, they should be given strict scrutiny that they are faithfully following the traditional, highly effective remediation processes. It should only be worrisome if they are seeing disproportionately more problems than would be expected in an aircraft design of its age and are not remediating problems robustly and promptly.
[1] https://www.boeing.com/commercial/787#overview
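As a rough sketch of what "experimentally determined" can support in the second point: with zero observed failures in n trials, a standard one-sided 95% upper bound on the per-trial failure probability is 1 - 0.05^(1/n), often approximated by the "rule of three" (3/n). This assumes flights are independent and equally risky, and that there were zero relevant critical failures in the ~4.5 million flights, which is the comment's premise rather than something checked here. The function name is made up for illustration.

```python
import math


def upper_bound_zero_failures(n_trials: int, confidence: float = 0.95) -> float:
    """One-sided upper bound on per-trial failure probability after
    n_trials independent trials with zero observed failures."""
    # Solve (1 - p)^n = 1 - confidence for p; expm1 keeps precision for tiny p.
    return -math.expm1(math.log(1.0 - confidence) / n_trials)


flights = 4_500_000  # figure taken from the comment above
exact = upper_bound_zero_failures(flights)
rule_of_three = 3.0 / flights  # common approximation at 95% confidence

print(f"exact 95% upper bound: {exact:.3e} per flight")
print(f"rule of three:         {rule_of_three:.3e} per flight")
```

Note this bounds only what the flight record alone can show; it says nothing about the extra margin from the new directive.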
> Suing someone into oblivion for an error that has never manifested as a serious failure and that is exceedingly unlikely to manifest is a little excessive.
I appreciate your point of view. The air travel industry is undeniably safe, more so than any transportation system ever. By a large margin. On the other hand, it is possible to make software systems that do not have the defects described in the article. So how do we get to the place where we choose to build systems that behave correctly? I don't think we get there without severe penalties for failure.
> First of all, who got blamed for the 737 MAX? Boeing did. This is one of the few industries where the responsibility does not get easily sloughed off.
The whistleblowers dying is coincidental and convenient.
https://www.theguardian.com/business/article/2024/may/02/sec...