Comment by paulryanrogers

6 years ago

Aren't Ada and similar languages designed for safety critical cases like this?

When lives are on the line software should be tested for reliability beyond 51 days. Having to restart is a symptom of reckless disregard for safety IMO.

18 comments


tjr 6 years ago

> When lives are on the line software should be tested for reliability beyond 51 days.

Avionics software is written in a world of verifiable requirements.

For how many days should the software be required to operate?

Is it acceptable to add that many [more] days to the software verification schedule in order to verifiably demonstrate that it works according to requirements?

Why is 51 days not long enough?

paulryanrogers 6 years ago
Taking a plane from design to commercial delivery takes years. I'm sure they can spare 2-3 months to do some long-running tests, especially if those can run in parallel with other fit-and-finish work unrelated to software.
- tialaramex 6 years ago
  
  So, your idea is a plane comes with firmware when you buy it (say in 1985) and then that's the only version forever. Every problem between 1985 and now, too bad, this passed QA back in 1985 and we're not changing anything? No.
  Airliners are very long-lived equipment. So in fact they ship new releases. New releases have features that may be really valuable to safety, as well as features that are nice quality of life improvements. They're not shipping once per hour like a web startup, or even once per day like the NT internal team, but they do need to ship more than "once per new model of aircraft".
  I've written before about an accident I spent a bunch of time looking at. No fatalities, just a smashed runway light but still reportable because of the "But for..." rationale. Two of the easiest things that would have prevented that from occurring were firmware tweaks. One was a recommended (but not mandatory) change in a newer build and the other exists only in Airbus planes so far.
  Specifically, the newer build adds an OAT (outside air temperature) disagree check. If you tell the plane "It is -20°C outside", so the automatic takeoff thrust comes out much lower, the plane checks the temperature sensor at the engine inlet and says to itself: this reads +15°C, which is 35K different, and that's the difference between flying and crashing into the fence at the end of the runway. I disagree with your guess about the temperature and so I refuse to try to figure out what to do next. You can realise you entered it wrong and type a more realistic value in, or you can set the thrust yourself manually if my sensors are broken.
  The fancier Airbus approach was not to focus on the result of air temperature calculations. If the plane isn't accelerating enough, it can't fly, we don't care why it isn't accelerating, maybe the wheels are square - we need to abort takeoff so we don't crash. So teach the plane how long runways are, it can use GPS to figure out which runway it's using, and then it can tell pilots if they aren't getting enough acceleration and they'll abort because they don't care why it's not enough acceleration either, they don't want to die in a fireball.
  
- ethbro 6 years ago
  
  There are an infinite number of tests that could be performed.
  That this test could have been performed does not mean that all possible tests could have been performed.
  Which is really what we're talking about here.
  Is "able to run 51 days without reboot" a requirement, or not? If not, and it's not a use case, then it shouldn't have been tested for.
  Instead, the limited time and resources available should have been spent on more important things.
  
- henvic 6 years ago
  
  1. You say that as if aircraft were being manufactured by laymen. Despite all the recent problems with Boeing, that's not the case. 2. Running a battery of formal proof tests is expensive and way more complicated than running a unit test suite for software. 3. Solving this issue probably requires more complexity, and where there is more complexity, there might be more risk.
  I'm not saying that this is even acceptable or a great trade-off, but the way you worded your comment is presumptuous.
  
- tjr 6 years ago
  
  Indeed, they do perform long-running tests. Is 51 days not long enough? How many days would be long enough?
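
The two firmware checks tialaramex describes above can be sketched roughly as follows. This is only an illustration under loud assumptions: the function names, the 10 K disagree limit, and the constant-acceleration model are all hypothetical and are not taken from any Boeing or Airbus implementation.

```python
# Hypothetical limit: the thread does not say how much disagreement is tolerated.
OAT_DISAGREE_LIMIT_K = 10.0


def oat_disagrees(entered_oat_c, sensed_inlet_c, limit_k=OAT_DISAGREE_LIMIT_K):
    """True when the crew-entered outside air temperature differs from the
    engine-inlet sensor reading by more than the allowed limit."""
    return abs(entered_oat_c - sensed_inlet_c) > limit_k


def acceleration_too_low(speed_ms, target_speed_ms, accel_ms2, runway_remaining_m):
    """True when, at the current acceleration, the plane would need more
    runway than is left to reach the target speed.

    Assumes constant acceleration (v^2 = u^2 + 2*a*s), which is a
    simplification chosen for this sketch.
    """
    if speed_ms >= target_speed_ms:
        return False  # already fast enough
    if accel_ms2 <= 0:
        return True  # not accelerating at all
    distance_needed_m = (target_speed_ms**2 - speed_ms**2) / (2 * accel_ms2)
    return distance_needed_m > runway_remaining_m


# The example from the comment: -20°C entered, +15°C sensed, a 35 K gap.
print(oat_disagrees(-20.0, 15.0))
# Made-up numbers: 80 m/s target, 1 m/s^2 acceleration, 3000 m of runway left.
print(acceleration_too_low(0.0, 80.0, 1.0, 3000.0))
```

Note that the second check never asks why acceleration is low, which is the point of the approach: it only compares what the plane is achieving against what the runway allows.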

goblin89 6 years ago

I’d be more alarmed about the fact that the FAA had to issue a directive to deal with this situation. Either Boeing did not include the reboot in operation or maintenance procedures, or operators did not follow those procedures.

The requirement of a reboot on its own, though, would not strike me as a blatant disregard for safety, as long as the period between reboots is long enough to exceed the maximum possible length of flight (taking any emergencies into account) with leeway to spare.

munk-a 6 years ago
I wonder how many days before you hit 51 is the cutoff where planes need to be grounded - obviously a continuous 51 day flight can't happen without a lot of fancy stuff going on that doesn't apply to the 787, but let's say you're at 50.2 days and considering a .6 day flight, is that allowed or is .2 days within the expected probable variance of your flight time? What if it's closer?
- goblin89 6 years ago
  
  I believe a reboot in such circumstances belongs to the A check[0], i.e. each 25 days of flight time or more frequently than that.
  (Frankly, I am surprised airplane software runs for days; somehow I assumed they get shut down completely every time they refuel.)
  [0] https://en.wikipedia.org/wiki/Aircraft_maintenance_checks#A_...
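
The cutoff question in this sub-thread comes down to simple margin arithmetic, sketched below. The 51-day limit is the figure from the discussion. The reserve is a hypothetical parameter, since nobody here states what margin an operator would actually be required to keep, and the function names are made up for illustration.

```python
LIMIT_DAYS = 51.0  # continuous-uptime figure discussed in the thread


def headroom_days(uptime_days, planned_flight_days, limit_days=LIMIT_DAYS):
    """Days of uptime left at landing if the flight goes exactly as planned."""
    return limit_days - (uptime_days + planned_flight_days)


def may_dispatch(uptime_days, planned_flight_days, reserve_days,
                 limit_days=LIMIT_DAYS):
    """True when the planned flight still leaves at least reserve_days
    (a hypothetical operator-chosen margin) before the limit."""
    return headroom_days(uptime_days, planned_flight_days, limit_days) >= reserve_days


# 50.2 days of uptime plus a 0.6 day flight leaves about 0.2 days of headroom,
# which fails a made-up half-day reserve.
print(round(headroom_days(50.2, 0.6), 3))
print(may_dispatch(50.2, 0.6, reserve_days=0.5))
```

With a reboot scheduled every 25 days, as goblin89 suggests, the headroom would stay far above any plausible reserve and the edge case would never come up in practice.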

threeseed 6 years ago

> Having to restart is a symptom of reckless disregard for safety IMO

No it's a symptom of having bugs in your code.

And they can be there for a host of reasons ranging from "this is a once off accident" to "systematic failure in the software engineering process".

paulryanrogers 6 years ago
Bugs may be inevitable but reasons and outcomes matter. If the entertainment system goes down then no big deal. If the pilot is misinformed and the plane crashes then how is it possible for the company to get such slop certified?
- goblin89 6 years ago
  
  The pilots apparently start getting misinformed 51 days into software runtime. What if the certification process only tests for 42 days?