Comment by alexgarden
18 days ago
The truth is that the ship has sailed on "rules-based systems." Whether the vector is prompt injection, malicious payloads in skills, or backdoors, your agent (and you will end up with one) is going to face judgment calls on your behalf. Alignment and conscience (and an aligned conscience) are the only sustainable ways to solve this problem.
We're moving from "What am I not allowed to do" to "What's the right thing for me to do, considering the circumstances?"
Alignment is the foundation of trust.