Comment by astrange

4 months ago

> Anthropic's entire reason for being is publishing safety papers along the lines of "we told it to say something scary and it said it", so of course they care about this.

I can't stand this myopic thinking.

Do you want to learn "oh, LLMs are capable of scheming, resisting shutdown, seizing control, self-exfiltrating" only when it happens in a real-world deployment, with an LLM actually capable of pulling it off?

If "no", then cherish Anthropic and the work they do.