Comment by Veserv

6 years ago

I agree, critical software should be written to a high quality standard. In fact, I will take it one step further and say that critical software must be written to an OBJECTIVE quality level that is sufficient for the problem at hand. If that level cannot be reached, meaning a level where we are confident that the risk is mitigated to the desired degree, then the software should not be accepted, no matter how hard the developers tried or how closely they adhered to "best practices". We do not let people build bridges just because they tried hard. They have to demonstrate that the bridge is safe, and, if nobody knows how to build a safe bridge in a given situation, then the bridge is NOT BUILT.

To then circle back to airplane software, the standards of original development are even higher than the standard I stated above. There are ~10,000,000 flights in the US per year according to the FAA, and for at least the 10 years before the 737-MAX problems (I believe closer to 20), software had not been implicated in a single passenger air fatality. That means that in over 100,000,000 flights there were only two fatal crashes due to software (not even in the US, so we would actually need to include global flight data, but I will not bother with that since I am unaware of the count of software-related fatalities in other countries), for a total fleet-wide reliability of 1 in 50,000,000, or 7 9s on a per-flight basis. If we use the per-time basis I used above, there are 25,000,000 flight-hours per year, so over 10 years, 6 minutes in 250,000,000 hours is 1 in 2,500,000,000, or 99.99999996% uptime: 9 9s, 25,000x gold standard server availability, and 250,000x the availability guaranteed by the AWS SLA. Also note that with servers, we can use independent replicated servers to gain redundancy, which lets the failure probabilities multiply (a 1 in 100 chance of failure for each server means the chance of both failing at the same time, assuming independence, is 1 in 10,000), but the same does not apply to airplanes since every airplane must succeed.

The thing to understand is that the software problems we are seeing are not necessarily an indication that their standards are lower than those prevailing in the rest of the software industry, or that they should adopt that industry's practices. It could be, and likely is, that the OBJECTIVE quality level we require is extremely high, and they have not been able to achieve it as of late. This obviously does not excuse their problems since, as I stated above, they must reach the OBJECTIVE quality level we require; it is just an observation that maybe it is not because they are incompetent, maybe they are really, really good and the problem is just really, really, really hard.