Comment by rubises
18 hours ago
The harder problem isn't finding vulnerabilities — it's preventing AI from violating constraints in the first place. Prompt-level safety is probabilistic. Filesystem-level constraints (mkdir 禁/behavior) are deterministic. The AI can't violate a rule that's physically encoded as a folder path in its system prompt.
No comments yet
Contribute on Hacker News ↗