Comment by olliej

6 years ago

First off, this is nonsense: the software was doing exactly what it was meant (and designed) to do. Hardware engineers chose not to provide multiple sensors to validate AoA, and hardware engineers did not provide an override the pilots could use. MCAS was designed so that pilots could not disable it, because doing so would have made the plane a different aircraft according to the FAA.

Anyway, I have yet to see a software-related “certificate” that isn’t rote-learnable, comically high-level, or both.

You also have to ask, what are you certifying?

All of these are fairly trivial to avoid in small programs:

* Use after free
* Time of check / time of use
* Out of bounds access
* Numeric overflow

Especially in any kind of test environment where you are being extra careful.
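To pick one from the list: numeric overflow is the sort of thing a small, careful program avoids with a one-line check. A toy C sketch (names and values invented purely for illustration, not from any real codebase):

```c
#include <stdint.h>
#include <stdio.h>

/* Toy example: avoid numeric overflow by checking before you add.
 * Purely illustrative; not from any real avionics codebase. */
int main(void) {
    uint16_t reading = 65000;
    uint16_t delta = 1000;

    if (reading > UINT16_MAX - delta) {
        puts("would overflow: clamp or reject the update");
    } else {
        reading = (uint16_t)(reading + delta);
    }
    return 0;
}
```

Trivial at this size, which is exactly the problem with testing for it.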

Then there’s the language problem: many engineers have to use multiple languages, some only have to use “safe” languages. Should you require a different cert for each?

You’re also saying “not everyone gets to write software anymore” because the certification won’t be free.

How does open source then work? Clearly people working on the Linux kernel should be certified, so now you’re saying Linux should only accept patches from people who live in countries that can provide the required certs.

> Hardware engineers chose not to provide multiple sensors to validate AoA, and hardware engineers did not provide an override the pilots could use. MCAS was designed so that pilots could not disable it, because doing so would have made the plane a different aircraft according to the FAA.

I have two comments. First: replace “hardware engineers” with “management”.

Second: when I read people talk about validating the AoA readings, it makes me twitch a bit, partly because my day job involves firmware that manages a self-organizing sensor network. Validation of sensor data sounds easy until you force yourself to conceptualize what the system can actually know based on the data it sees, not on your perceptions of it.

More to the point, there is a strong tendency to over-focus on the ordinary case and not on all the edge cases. Very often, dealing with the edge cases is the fundamental problem. Consider designing the front end of a car: the primary design goal is actually “passengers don’t die when you drive it into a tree”.

The problem with the MCAS system is that it needs to work under all the edge cases, not just when the plane is flying in smooth air while the pilot is pulling the nose up. Like during a hard turn into wind shear.

  • I don't mean validation == determine which one is correct, I mean "make sure they agree" and don't trust them otherwise (roughly the cross-check sketched below), which, as far as I can tell, is how other Boeing systems work?

    I mean there's also the space shuttle system, where you have N redundant systems controlling N separate motors (or whatever) and assume that you'll never have >= N/2 of them producing incorrect output. That's a "no validation" approach that works by virtue of the correct instruments literally overpowering the incorrect ones.
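    For the cross-check version, a toy C sketch (sensor names and the tolerance are my own inventions, not Boeing's):

    ```c
    #include <math.h>
    #include <stdbool.h>

    /* Toy cross-check: trust the AoA value only when both (hypothetical)
     * vanes agree within a tolerance; otherwise report it as invalid so
     * the system refuses to act on the data. */
    #define AOA_AGREE_DEG 5.0

    bool aoa_valid(double left_deg, double right_deg, double *out_deg) {
        if (fabs(left_deg - right_deg) > AOA_AGREE_DEG)
            return false; /* disagreement: trust neither sensor */
        *out_deg = (left_deg + right_deg) / 2.0;
        return true;
    }
    ```

    No attempt to pick a winner; disagreement just means the automation keeps its hands off.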

    • Supposedly an Airbus plane had triple-redundant sensors, two of them failed with the same reading, and the good sensor was voted off the island (see the toy voter below).

      I'm walking away with the following explanation: Boeing made a breaking change to the aircraft and did such a good job hiding it from themselves, the FAA, and pilots that they made it impossible for experienced pilots to handle things when it failed.
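      That failure mode falls straight out of naive 2-of-3 voting. A toy sketch (my own illustration, not anyone's real voter):

      ```c
      #include <math.h>

      #define VOTE_TOL 2.0

      /* Toy 2-of-3 voter: return the value a majority agrees on.
       * If two sensors fail with the same wrong reading, they form the
       * majority: vote3(4.0, 30.0, 30.0) happily returns 30.0, and the
       * healthy sensor is the one that gets ignored. */
      double vote3(double a, double b, double c) {
          if (fabs(a - b) <= VOTE_TOL) return (a + b) / 2.0;
          if (fabs(a - c) <= VOTE_TOL) return (a + c) / 2.0;
          if (fabs(b - c) <= VOTE_TOL) return (b + c) / 2.0;
          return NAN; /* no majority: flag the whole channel as invalid */
      }
      ```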

      3 replies →

This isn't a certification problem. It's an engineering ethics problem.

The pass-the-buck circle jerk is how this design flaw came to exist. Everyone in the engineering organization needs to have the balls to point out systems-design errors. Management needs to listen to them and not issue "make it work" marching orders. Regulators need to not delegate their responsibility to either of the above.

More than one person could have put their foot down and demanded triple redundancy. That this didn't happen suggests even more safety concerns lurk in all of Boeing's products.

> You also have to ask, what are you certifying?

Currently, the avionics software is certified, not the software engineer. The FAA-delegate safety reviewers get special training, but otherwise a bachelor's degree in a related discipline is the standard for an individual contributor's formal education.

There is an arduous process in place to help ensure that commercial avionics software is produced to an acceptable level of quality. Problems can still get through, but the process helps weed out a lot of issues that you'd likely see in non-safety-critical software.

  • The original comment was proposing a certificate for software engineers, and my response was in that context: what qualities of an individual engineer should be measured.

    • Right, I was not arguing, just sharing what is currently done in this field.

Nobody is saying that you can't be a programmer or a dev; you just can't be a software engineer without a certification.

  • Why, so the software engineer can be a scapegoat if something goes wrong?

    Certify the software, not the person.

    • Stand up for your code and certify it the way a mechanical or civil engineer certifies what they make, or don’t wear the title of engineer. It’s time to take on all the qualities of the sobriquet, not just the status and the salary that were appropriated.

      1 reply →

    • Today, the plan is "certify the process", not the product. People are assumed to be faulty.

  • Certifying design engineers is the old-school, dumb way of doing things. The better modern way is to certify the design process and to provide domain-specific training to the engineers involved.

  • You're right; you just can't get a job as a programmer. You also can't contribute to many major OSS projects.

    • Huh? Why can't you? It's the job of the engineer at a company that deploys an OSS project to test and certify the tolerances, SLAs, and best practices (usage documents, checklists, etc.) for whatever they choose to deploy, whether that piece of software was written by engineers or by plain old non-engineer developers.

      2 replies →

This was a systems engineering failure, nothing more. The system is designed to find such errors and remove them. It has not been determined whether this happened because of cost cutting or management pressure. Could be, but it is also possible it was just an error made by people.

> Hardware engineers chose not to provide multiple sensors to validate AoA.

In effect you've just shifted the blame. Developers working at the lower levels could have pushed back on this harder if they had been legally required to. My point is that if mechanical and electronic engineers are liable, then so should software engineers be; they need more power to say no.

> You also have to ask, what are you certifying?

An argument could be made that formal verification & ethics would be useful in this context.

> You’re also saying “not everyone gets to write software anymore” because the certification won’t be free.

Degrees aren't free either. Most developers aren't working in aerospace and won't need the rigour.

> How does open source then work?

I'm not talking about OSS. I'm talking about people who work with software that can kill people. If the Linux kernel is used as a technology in these machines, then the software 'engineer' who made that decision is legally liable. The blame stops with them.

  • > In effect you've just shifted the blame.

    No. If the bug had been in the software (say, a numeric underflow leading to a crash), it would be a software problem. In this case the software engineers would have been told "here is your current AoA; adjust the plane correctly in response." The hardware engineers/designers then provided them with unvalidated data and, I assume, no details on the error rate (presumably because that would have gotten the whole system flagged by the FAA as nonsense).

    > Degrees aren't free either. Most developers aren't working in aerospace and won't need the rigour.

    "most" != all, literally my point. Also at what level does it kick in: OS developers? If they're using a licensed OS like QNX should all the QNX engineers need to be certified for avionics? How about linux?

    > I'm not talking about OSS

    So you're saying OSS shouldn't be used in commercial industry?

    If you work on Linux: that's used in medical hardware, so it seems like all contributors should have your new Certificate in Not Killing People.

    But also, at what distance from killing people does this license cease being relevant? You worked on (say) a firewall product on some device; it fails to prevent some attack, and the medical device kills someone.

    Or the radio stack?

    etc

    • > I assume no details on the error rate

      A perfect example of why the title "engineer" needs to be earned. This is a baseless assumption, given that literally anything could go wrong: sensors could become damaged, circuits broken, etc. It is our job to plan for edge cases.

      > But also, at what distance from killing people does this license cease being relevant?

      The last link in the chain: the engineers who put their stamp of approval on the system being shipped to consumers (aka Boeing employees). If you're willing to risk human life on the claim that the Linux kernel is acceptable for this task, then you should damn well be able to risk your job title.

      If Linux isn't up to the task then why is it being used?

      2 replies →