Comment by devonkelley

1 month ago

The prompt injection concerns are valid, but I think there's a more fundamental issue: agents are non-deterministic systems that fail in ways that are hard to predict or debug.

Security is one failure mode. But "agent did something subtly wrong that didn't trigger any errors" is another. And unlike a hacked system where you notice something's off, a flaky agent just... occasionally does the wrong thing. Sometimes it works. Sometimes it doesn't. Figuring out which case you're in requires building the same observability infrastructure you'd use for any unreliable distributed system.

The people running these connected to their email or filesystem aren't just accepting prompt injection risk. They're accepting that their system will randomly succeed or fail at tasks depending on model performance that day, and they may not notice the failures until later.

1 comment

devonkelley

ssvora 25 days ago

How are these agents stress tested today? Are there tools that are typically being used for QA and/or security?