Comment by goalieca
1 day ago
LLMs are rather easy to convince. There’s no formal logic embedded in them that provably restricts outputs.
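To see what a hard restriction would even look like: you have to bolt it on outside the weights, e.g. by masking logits at decode time. A minimal sketch with Hugging Face transformers (the model and the banned word are placeholders I picked for illustration):

    import torch
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              LogitsProcessor, LogitsProcessorList)

    tok = AutoTokenizer.from_pretrained("gpt2")    # placeholder model
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    class BanTokens(LogitsProcessor):
        # An actual provable restriction: these token ids can never be sampled.
        def __init__(self, banned_ids):
            self.banned_ids = banned_ids
        def __call__(self, input_ids, scores):
            scores[:, self.banned_ids] = float("-inf")
            return scores

    banned = tok.encode(" dragon")  # illustrative ban; only blocks this exact tokenization
    out = model.generate(
        **tok("Once upon a time", return_tensors="pt"),
        max_new_tokens=20,
        do_sample=False,
        logits_processor=LogitsProcessorList([BanTokens(banned)]),
    )
    print(tok.decode(out[0]))

Anything the model declines to say without that kind of external mask is a learned tendency, not a guarantee, which is why it can be argued out of it.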
The less believable part for me is that people would persist long enough, and invest enough resources in prompting, to get anywhere with an automated agent that has this much potential to massively backfire.
Secondly, they claimed the attackers used Anthropic's own infrastructure, which is silly. There is no doubt some capacity in China to do this independently. I would also expect incident response teams, threat detection teams, and other experts to report this to Anthropic if Anthropic didn't detect it themselves first.
It sure makes for good marketing to go out and claim such a thing, though. This is exactly the kind of FOMO- and panic-inducing headline that is driving the financing of the whole LLM revolution.
There are LLMs that have been modified to not refuse anything at all; afaik this is possible with any LLM. No need to convince it of anything.
(Granted, you need direct access to the model weights, unlike Claude where you only get the hosted frontend, but the point stands: no convincing needed whatsoever.)
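For a concrete sense of what "modified to not refuse" means: one published technique is refusal-direction ablation (Arditi et al. 2024), where you estimate a single "refusal" direction in the residual stream from two contrasting prompt sets and project it out of the weights. This is a rough sketch of the idea only; the model name and prompt sets are placeholders, it assumes a Llama-style layer layout, and real implementations are fussier about layer choice and which matrices to edit:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "placeholder/open-weights-chat-model"   # any Llama-style open model
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float32)

    refused  = ["...prompts the model refuses..."]   # placeholder contrast sets
    answered = ["...matched prompts it answers..."]

    def mean_last_hidden(prompts, layer=-1):
        # Mean last-token hidden state at one layer over a prompt set.
        acc = []
        for p in prompts:
            ids = tok.apply_chat_template([{"role": "user", "content": p}],
                                          add_generation_prompt=True,
                                          return_tensors="pt")
            with torch.no_grad():
                hs = model(ids, output_hidden_states=True).hidden_states
            acc.append(hs[layer][0, -1])
        return torch.stack(acc).mean(0)

    # "Refusal direction" = difference of means between the two prompt sets.
    r = mean_last_hidden(refused) - mean_last_hidden(answered)
    r = r / r.norm()

    # Project that direction out of every matrix that writes into the
    # residual stream, so the model can no longer represent "refuse".
    with torch.no_grad():
        for block in model.model.layers:
            for proj in (block.self_attn.o_proj, block.mlp.down_proj):
                W = proj.weight               # (d_model, d_in)
                W -= torch.outer(r, r @ W)    # W <- (I - r r^T) W

No jailbreak prompt involved at any point; the guardrail is just a pattern in the weights, and with weight access you can subtract it.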