Ask HN: In agent/automation incidents, what slows recovery?

I’m trying to understand a very specific moment in agent/automation incidents.

Something has already gone wrong. Logs exist. Dashboards exist. But recovery still stalls.

In your experience, what actually slowed things down at that point?

Was it unclear attribution (who caused what)? Unclear ownership (who should step in)? Or something else entirely?

Not selling anything. I'm just trying to learn from real on-call / incident experience.