← Back to context

Comment by riskable

1 day ago

...and "the fix" that companies usually resort to is "use it or lose it" policies (e.g. you lose your role/permission after 30 days of non-use). So if you only do deployments for any given thing like twice a year, you end up having to submit a permissions request every single time.

No big deal, right? Until something breaks in production and now you have to wait for multiple approvals before you can even begin to troubleshoot. "I guess it'll have to stay down until tomorrow."

The way systems like this usually get implemented is there's an approval chain: First, your boss must approve the request and then the owner of the resource. Except that's only the most basic case. For production systems, you'll often have a much more complicated approval chain where your boss is just one of many individuals that need to approve such requests.

The end result is a (compounding) inefficiency that slows down everything.

Then there's AI: Management wants to automate as much as possible—which is a fine thing and entirely doable!—except you have this system where making changes requires approvals at many steps. So you actually can't "automate all the things" because the policy prevents it.

To add to that, the roles also need to be identified.

When some obscure thing breaks you either need to go on a quest to understand which are all the roles involved in fixing it, or send a much vaguer "let me do X and Y" request to the approval chain and have them figure it out on their end.

And as the approval agents aren't the ones fixing the issue, it's a back and forth of "can you do X?" "no, I'm locked at Y" "ok. then how about now ?"

Overprovisioning at least some key people is a fatality.