Comment by vc289
14 hours ago
It's fundamentally impossible to stop an agent from performing a destructive action through instructions alone.
LLMs are just too creative. They will explore the search space of probable paths to get to their answer. There's no way you can patch all the paths.
We had to build isolation at the infra level (literally clone the DB) to make it safe enough; otherwise there was no way we wouldn't randomly see the DB get deleted at some point.
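The clone-the-DB idea can be sketched roughly like this (a hypothetical `sandboxed_connection` helper, using SQLite file copies for illustration; the commenter's actual setup isn't specified):

```python
import os
import shutil
import sqlite3
import tempfile

def sandboxed_connection(db_path: str) -> sqlite3.Connection:
    """Clone the database into a throwaway copy and connect to the clone,
    so any destructive statement the agent issues (DROP TABLE, DELETE ...)
    only touches the copy, never the original file."""
    sandbox_dir = tempfile.mkdtemp(prefix="agent-sandbox-")
    clone_path = os.path.join(sandbox_dir, os.path.basename(db_path))
    shutil.copy2(db_path, clone_path)  # infra-level isolation: work on a clone
    return sqlite3.connect(clone_path)
```

The point is that the guarantee comes from the isolation boundary (the agent physically cannot reach the real DB), not from prompting the model to be careful.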