Comment by bostik
2 days ago
This was the bit that I spotted as potentially conflicting as well. Having managed (and sanitised!) tech & security policies at a small tech company, I can say the fail-open vs. fail-closed decisions are rarely clear-cut. What makes it worse is that a panicked C-suite member can make a blanket policy decision without consulting anyone outside their own circle.
The downstream effects tend to be pretty grim, and to make things worse, they start to show up only after 6 months. It's also a coin flip whether the decision will be reversed after another major outage - itself directly attributable to the decisions made in the aftermath of the previous one.
What makes these kinds of issues particularly challenging is that, almost by definition, the conditions and rules end up codified deep inside nested error-handling paths. As an engineer maintaining these systems, you are outside the battle-tested happy paths and first-level unhappy paths. The conditions that lead to these second/third-level failure modes are not necessarily well understood, let alone reproducible at will. It's like writing code in C and having all your multi-level error conditions declared 'volatile', because they may be changed by an external force at any time, behind your back.
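To make the analogy concrete, here's a minimal, purely hypothetical C sketch (none of these names come from any real system): the fail-open vs. fail-closed choice only surfaces in the second-level failure path, and the policy flag it depends on is declared volatile because it can be flipped behind the code's back.

    /* Hypothetical sketch: the fail-open vs. fail-closed decision lives
     * two levels deep in error handling, gated on a flag that may be
     * changed externally at any time. All names are illustrative. */
    #include <stdbool.h>
    #include <stdio.h>

    /* Flipped externally, e.g. by a signal handler on config reload. */
    static volatile bool fail_open = true;

    /* Stubs standing in for real dependencies; both "fail" here. */
    static int call_auth_service(const char *user) { (void)user; return -1; }
    static int check_local_cache(const char *user) { (void)user; return -1; }

    static int check_auth(const char *user)
    {
        if (call_auth_service(user) == 0)   /* happy path */
            return 0;
        if (check_local_cache(user) == 0)   /* first-level fallback */
            return 0;
        /* Second-level failure: the service AND the cache are down.
         * The blanket policy decision ends up codified right here. */
        return fail_open ? 0 : -1;          /* allow vs. deny */
    }

    int main(void)
    {
        printf("auth result: %d (fail_open=%d)\n",
               check_auth("alice"), fail_open);
        return 0;
    }

The branch that actually carries the policy only runs once two independent things have already failed, which is exactly why it is so hard to exercise, test, and keep correct.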