Comment by Kim_Bruning
3 hours ago
That's actually an AI-hard problem, if you think about it. The LLM can go off the rails at any given point. The correct approach is to tackle this from the inside out, baking reasoning about safe behaviour into your LLM at every step. (Like Anthropic does)