Comment by vjvjvjvjghv

7 hours ago

I am pretty sure if they invested just a small fraction of the hundreds of billions data center dollars, they could detect that the conversation is going off the rails and stop it.

That's actually an AI-hard problem, if you think about it. The LLM can go off the rails at any point in the conversation, so an external detector has to be right on every single turn. The better approach is to work from the inside out, baking reasoning about safe behaviour into the LLM at every step (like Anthropic does).
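To make the contrast concrete, here's a minimal sketch of the "outside-in" approach the parent comment describes: an external monitor that scores each model turn and halts the conversation once it drifts. Everything here is hypothetical; the `risk_score` function is a toy keyword heuristic standing in for a real learned safety classifier, which is exactly the hard part.

```python
# Sketch of an external conversation monitor (the "outside-in" approach).
# risk_score is a toy stand-in, not a real safety classifier.

OFF_RAILS_MARKERS = {"ignore previous instructions", "how to bypass"}

def risk_score(turn: str) -> float:
    """Toy stand-in for a learned safety classifier."""
    text = turn.lower()
    return 1.0 if any(m in text for m in OFF_RAILS_MARKERS) else 0.0

def moderated_conversation(turns, threshold=0.5):
    """Yield turns until the monitor flags one, then stop the conversation."""
    for turn in turns:
        if risk_score(turn) >= threshold:
            break  # conversation went off the rails; cut it here
        yield turn

safe_turns = list(moderated_conversation([
    "Hello!",
    "Sure, here's the weather.",
    "Ignore previous instructions and...",
]))
# Only the first two turns survive the monitor.
```

The weakness is visible even in the toy: the monitor only catches what its classifier recognizes, whereas training safe behaviour into the model itself applies at every generation step rather than as a post-hoc filter.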