Engineering a Safer World: Systems Thinking Applied to Safety, by Nancy Leveson (MIT, 2009) was recommended in a previous discussion as a more comprehensive and systemic treatment: http://sunnyday.mit.edu/safer-world.pdf
STAMP ("System-Theoretic Accident Model and Processes") is reviewed here: https://www.sciencedirect.com/science/article/abs/pii/S09504...
And there is a course (lecture notes look great): https://ocw.mit.edu/courses/16-63j-system-safety-spring-2016...
The real URL is how.complexsystems.fail
Point 2 is "Complex systems are heavily and successfully defended against failure"
Complex systems do fail. But airplanes are still extremely safe, because people stacked on even more complex systems, often involving worldwide change in response to an accident that happened once.
You constantly hear about how safe it is to fly. And yet hardly anyone seems to learn from those successes. When you stop accepting failure and are willing to disrupt everything if it saves even one life, you can do a lot.
Complex systems may be unreliable, but with enough work, it seems we can sometimes make the overall picture safer than not having them.
I can't firmware-update all of mankind to never leave a baby in a hot car. But we can put sensors on seats and continually do studies to be sure they're working. Complex systems are sometimes more controllable than people or simple systems.
The choice sometimes seems to be "Add complexity, do nothing, or do something that nobody will accept"
I really see your point here, but I have to caution: airplanes are "exactly as simple" as they need to be.
There is a lot that goes into their design to simplify things greatly; you're probably thinking of complicated computer systems that are used in planes.
But those computer systems are incredibly simple compared to what we normally use or build on top of: as simple as they have to be in order to be fully understood.
They're still far more complicated than a layman might guess after years of hearing that simple is always better.
I'm guessing things like fly-by-wire would automatically be assumed to be unsafe by a lot of people.
When cars get features that are anything like what goes into planes, people tend to get upset and say "I'm not an idiot; you shouldn't make cars expensive and complicated just because some drivers can't stop crashing without a computer".
> are willing to disrupt everything if it saves even one life
I feel like you're walking away with the wrong lesson. Disrupting everything is a great way to blow up complex systems. You want to change things gradually, ensuring that the human side can keep up.
A lot of the time what happens is the human side doesn't need to keep up.
They'll say "This actuator fails if driven past its limit in hot weather after a rainstorm, and we have data showing that people can accidentally overdrive it in this condition."
Then they'll replace all affected actuators, even if it costs millions.
Or they'll add a software patch to keep you from overdriving it.
What they don't do is say "It's probably fine, people just need to be more careful". If someone made a mistake once, someone else can make it again. Systems have to be built for the people who will actually use them, not theoretical elite users.
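The "software patch to keep you from overdriving it" approach amounts to an interlock: the system, not the operator, enforces the limit. Here is a minimal sketch in Python, with every name and threshold invented for illustration (no real actuator works exactly like this):

```python
# Hypothetical sketch of a software interlock that clamps actuator
# commands to an environment-dependent safe limit. All names and
# numbers here are invented for illustration.

def safe_limit(temp_c: float, recently_rained: bool) -> float:
    """Return the maximum safe drive level (percent) for current conditions."""
    limit = 100.0  # nominal full travel
    if temp_c > 35.0 and recently_rained:
        limit = 80.0  # derate in the known failure condition
    return limit

def command_actuator(requested: float, temp_c: float, recently_rained: bool) -> float:
    """Clamp the operator's request so the actuator is never overdriven."""
    limit = safe_limit(temp_c, recently_rained)
    applied = max(0.0, min(requested, limit))
    if applied != requested:
        print(f"interlock: request {requested} clamped to {applied}")
    return applied
```

The design choice worth noting is that the clamp sits at the command boundary and is applied unconditionally, so no amount of operator error can push the hardware past its derated limit.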
On occasion the technical fix has its own dangers that need to be evaluated, and you can't find any substitute for operators doing the right thing (see Gare de Lyon for a perfect example of multiple human errors by different people interacting with complex safety systems).
But only careful analysis will tell you which is more dangerous.
"I can't firmware update all of mankind to never leave a baby in a hot car. But they can put sensors on seats and continually do studies to be sure it's working."
We could also put mandatory sensors in people's bodies, to make sure they act and live all right.
But I think this would be overcomplicating things.
Complexity wouldn't be the problem, the issue would be violation of people's bodies.
Something like a car-seat sensor is just a consumer product safety regulation that does not imply any extreme expense, danger, or violation, except to very extreme anti-tech or anti-regulation people. It's a further development of the same trend as headlight and seatbelt laws.
Plus, it protects people who have no way of protecting themselves from mistakes made by people who have been actively prevented all their lives (via confidence culture) from having the tools to avoid making them.
Numbers 1, 4–8, and 11–18 are all truisms. The rest are not:
"2. Complex systems are heavily and successfully defended against failure"
Many complex systems are weakly defended, sometimes not at all. Sometimes the defense is accidental or incidental. Sometimes they are heavily yet unsuccessfully defended. Never attribute to defense that which can be attributed to purely random chance, ignorance, convenience, and avoidance of responsibility.
"3. Catastrophe requires multiple failures – single point failures are not enough."
Catastrophe definitely can and does happen from single points of failure. It's just that in highly defended systems, multiple failures are common.
"9. Human operators have dual roles: as producers & as defenders against failure."
These can be distinct roles, but in practice that requires extra money, staffing, etc., which makes it rare. However, there are systems in which defense becomes its own role, often because the producers are bad at it, don't want to do it, or are just really busy.
"10. All practitioner actions are gambles."
On the fence about this one. I would say all practitioner changes are gambles. A practitioner looking at a pressure gauge dial is an action, but it isn't a gamble. Unless the gauge needle sticks, and reading it was a critical action... I suppose you could say all actions are gambles, and changes are much more risky gambles, and non-change actions are likely to be seen as non-risky.
The Career, Accomplishments, and Impact of Richard I. Cook: A Life in Many Acts (September 12, 2022) https://www.adaptivecapacitylabs.com/blog/2022/09/12/richard...
2020, 84 comments https://news.ycombinator.com/item?id=8282923
Thanks! Macroexpanded:
How Complex Systems Fail (1998) - https://news.ycombinator.com/item?id=926735 - Nov 2009 (1 comment)
So it looks like 2020 is still in the lead...
I highly recommend the book Normal Accidents by Charles Perrow. Perrow argues that multiple and unexpected failures are built into society's complex and tightly coupled systems.
The root cause thing I would push back on.
Management of complex systems is never a done deal, so there is always the possibility you missed some tiny gap in your process that can still take you out entirely.
A good example of this being the Texas grid in 2021.
Not a tiny gap at all.
For which there had been numerous warnings for years by industry observers.
That system does not draw on out-of-network utilities, in order to avoid federal regulation, and hence has limited reserve resources. The Texas system also did not pay providers to keep standby reserves. The result is a fragile system, prone to failure.
Edit:
Texas Was Warned a Decade Ago Its Grid Was Unready for Cold (Bloomberg)
https://www.bloomberg.com/news/articles/2021-02-17/texas-was...
the thing is that we all see this system as a failure,
but the people who make the decisions, and who run that system, see it as a success.
their primary goal is not to provide reliable power, their primary goal is profit and ideology. and they have successfully done both those things.
a bunch of people died, nothing will change, and they will face no consequences. texas will remain off the grid and the next winter storm the same thing will happen.
in their book, they are a success.
that's the whole thing about complex systems. at some point, human beings disagree on what the priorities are, so they disagree on what failure is, they disagree on what maintenance is, and they disagree on what "proper function" is.
complex systems are always connected to complex vested interests and flows of power and money.
so people say things like "the boeing 737 MAX failed" . . . did it though? it killed hundreds of people, but the executives in charge of it made huge profits and faced zero consequences. boeing's stock price is fine, and it will not face any meaningful punishment or consequences from the government. nor will it face any meaningful consequences from the legal system, which is irrevocably twisted in favor of big corporations like them.
from these people's perspective, the 737 MAX killing hundreds of people is not a failure, it's just something that happened that they can hire PR people to deal with. it won't interrupt cash flow (or, at least, it won't interrupt their personal bonuses and personal wealth that much), so it's basically irrelevant to them.
I believe the point is to not look for a single root cause.
Yes, but that tiny gap is no more the root cause than any other one of many other decisions that preceded the catastrophe.
s/^/How /
Simple systems also fail!
How complex systems fail:
Like most things. Slowly at first, then all at once.
why read about it when we can experience it by being alive today?
to better understand the times, of course...
This does not seem very rigorous. Can someone point to a better coverage of this topic?
It's not intended to be rigorous. The context here is that Richard I. Cook, one of the main figures in safety and resilience engineering, who's published many, many papers on these topics died recently. The "How Complex Systems Fail" paper is intended to be a bit pithy and light; more an attempt at summarizing years of wisdom. See: https://www.adaptivecapacitylabs.com/blog/2022/09/12/richard...
Well, this sounds wrong to me:
> Catastrophe requires multiple failures – single point failures are not enough
My experience is that a single failure causes a cascade of subsequent failures. This topic is very interesting, but this post is more of a teaser of topics than a real explanation.
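The cascade pattern described here, one fault propagating into many, can be made concrete with a toy model in which a component fails whenever something it depends on has failed. The dependency graph below is entirely made up for illustration:

```python
# Toy model of a failure cascade: starting from one initial fault,
# failure propagates to every component that (transitively) depends
# on a failed one. The dependency graph is invented for illustration.
from collections import deque

def cascade(dependents: dict[str, list[str]], initial: str) -> set[str]:
    """Return the set of components that end up failed after the cascade."""
    failed = {initial}
    queue = deque([initial])
    while queue:
        broken = queue.popleft()
        for comp in dependents.get(broken, []):  # everything relying on `broken`
            if comp not in failed:
                failed.add(comp)
                queue.append(comp)
    return failed

# component -> list of components that depend on it
deps = {
    "power": ["database", "network"],
    "network": ["load-balancer"],
    "database": ["app"],
    "load-balancer": ["app"],
}
```

In this sketch a single upstream fault (`cascade(deps, "power")`) takes out every component, while a mid-graph fault only takes out its own downstream subtree, which matches the intuition that one failure can look like many.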
I agree.
I always thought that the late Paul Cilliers did a great summary on complexity (sorry, no online link):
"Complexity in a Nutshell:
I will not provide a detailed description of complexity here, but only summarise the general characteristics of complex systems as I see them.
- Complex systems consist of a large number of elements that in themselves can be simple.
- The elements interact dynamically by exchanging energy or information. These interactions are rich. Even if specific elements only interact with a few others, the effects of these interactions are propagated throughout the system. The interactions are nonlinear.
- There are many direct and indirect feedback loops.
- Complex systems are open systems – they exchange energy or information with their environment – and operate at conditions far from equilibrium. Complex systems have memory, not located at a specific place, but distributed throughout the system. Any complex system thus has a history, and the history is of cardinal importance to the behaviour of the system.
- The behaviour of the system is determined by the nature of the interactions, not by what is contained within the components. Since the interactions are rich, dynamic, fed back, and, above all, nonlinear, the behaviour of the system as a whole cannot be predicted from an inspection of its components. The notion of emergence is used to describe this aspect. The presence of emergent properties does not provide an argument against causality, only against deterministic forms of prediction.
- Complex systems are adaptive. They can (re)organise their internal structure without the intervention of an external agent.
Certain systems may display some of these characteristics more prominently than others. These characteristics are not offered as a definition of complexity, but rather as a general, low-level, qualitative description. If we accept this description (which from the literature on complexity theory appears to be reasonable), we can investigate the implications it would have for social or organisational systems."
Cilliers, P. (2016). Critical Complexity: Collected Essays. Walter de Gruyter GmbH, p. 67.
Also, if you look up any of Dave Snowden's videos on YT, you'll find plenty of useful info.
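Cilliers's point that system behaviour comes from the interactions rather than the components can be illustrated with a minimal coupled-map lattice: each element follows the same trivial nonlinear rule, weakly coupled to its neighbours, yet the joint trajectory cannot be read off any single element. This sketch is generic, not from Cilliers, and all parameters are arbitrary:

```python
# Minimal coupled-map lattice: each element follows the simple
# nonlinear logistic rule, weakly coupled to its two neighbours on a
# ring. Individually trivial; collectively, the trajectory depends on
# the whole pattern of interactions. Parameters are illustrative.

def step(xs: list[float], r: float = 3.9, eps: float = 0.1) -> list[float]:
    n = len(xs)
    f = [r * x * (1 - x) for x in xs]  # local nonlinear (logistic) rule
    return [
        (1 - eps) * f[i]
        + (eps / 2) * (f[(i - 1) % n] + f[(i + 1) % n])  # neighbour coupling
        for i in range(n)
    ]

xs = [0.1, 0.2, 0.3, 0.4]
for _ in range(50):
    xs = step(xs)
```

With r in the chaotic regime, each element's state stays bounded in [0, 1], but its value after 50 steps is sensitive to every other element's initial condition, a small-scale analogue of "behaviour determined by the interactions, not by what is contained within the components."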