Comment by vc289
14 hours ago
It's fundamentally impossible to stop an agent from performing a destructive action through instructions alone.
LLMs are just too creative. They will explore the search space of probable paths to get to their answer. There's no way you can patch all the paths.
We had to build isolation at the infra level (literally clone the DB) to make it safe enough; otherwise there was no way we wouldn't randomly see the DB get deleted at some point.
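The clone-the-DB idea can be sketched roughly like this (a hypothetical `sandboxed_connection` helper, using SQLite file copies for illustration; the commenter's actual setup isn't specified):

```python
import os
import shutil
import sqlite3
import tempfile

def sandboxed_connection(db_path: str) -> sqlite3.Connection:
    """Clone the database into a throwaway copy and connect to the clone,
    so any destructive statement the agent issues (DROP TABLE, DELETE ...)
    only touches the copy, never the original file."""
    sandbox_dir = tempfile.mkdtemp(prefix="agent-sandbox-")
    clone_path = os.path.join(sandbox_dir, os.path.basename(db_path))
    shutil.copy2(db_path, clone_path)  # infra-level isolation: work on a clone
    return sqlite3.connect(clone_path)
```

The point is that the guarantee comes from the isolation boundary (the agent physically cannot reach the real DB), not from prompting the model to be careful.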