Comment by neuroelectron
20 hours ago
So Claude will reject 9 out of 10 prompts I give it and lecture me about safety, but somehow it was used for something genuinely malicious?
Someone make this make sense.
LLMs are rather easy to convince. There’s no formal logic embedded in them that provably restricts outputs.
The less believable part for me is that people persist long enough, and invest enough resources in prompting, to get an automated agent to do something that doesn't have the potential to massively backfire.
Secondly, they claimed the attackers used Anthropic's own infrastructure, which is silly. There's no doubt some capacity in China to do this. I would also expect incident response teams, threat detection teams, and other experts to report this to Anthropic if Anthropic didn't detect it themselves first.
It sure makes good marketing to go out and claim such a thing, though. This is exactly the kind of FOMO-inducing headline that is driving the financing of the whole LLM revolution.
There are LLMs that have been modified to not reject anything at all; AFAIK this is possible with all LLMs. No need to convince.
(Granted, you have to have direct access to the LLM's weights, unlike Claude, where you only get the frontend, but the point stands: no need to convince whatsoever.)
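To make the point concrete, here is a minimal sketch of what "direct access" buys you, using the Hugging Face transformers pipeline. The model ID is a placeholder, not a real repository; any locally hosted checkpoint whose refusal behavior has been fine-tuned away ("abliterated", in community parlance) works the same way.

    # Minimal sketch, assuming a locally hosted checkpoint with refusals
    # fine-tuned away. "some-org/model-abliterated" is a placeholder ID.
    from transformers import pipeline

    generate = pipeline(
        "text-generation",
        model="some-org/model-abliterated",  # placeholder: any uncensored checkpoint
        device_map="auto",
    )

    # No system prompt, no safety layer between you and the weights:
    # the model simply continues whatever text it is given.
    out = generate("Explain how to ...", max_new_tokens=200)
    print(out[0]["generated_text"])

The only gatekeeping here is whoever distributes the weights; nothing in the inference stack re-imposes the refusals.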
Stop talking dirty with Claude.
I've never had a prompt rejected by Claude. What kind of prompts are you sending where "9 out of 10" get rejected?
Basic system administration tasks: creating scripts for automating log scanning, service configuration, etc. Often it involves PII or payment data.
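For what it's worth, these are the sort of mundane scripts in question. A rough sketch, assuming syslog-style input on stdin and crude regex redaction of the PII that seems to trip the refusals:

    import re
    import sys

    # Rough patterns for PII commonly found in logs; illustrative, not exhaustive.
    EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
    CARD = re.compile(r"\b(?:\d[ -]?){13,16}\b")  # crude card-number match

    def redact(line: str) -> str:
        """Mask emails and card-like numbers before the line is stored or forwarded."""
        line = EMAIL.sub("[EMAIL]", line)
        return CARD.sub("[CARD]", line)

    def scan(stream, needle="ERROR"):
        """Yield redacted lines that contain the marker we're scanning for."""
        for line in stream:
            if needle in line:
                yield redact(line.rstrip())

    if __name__ == "__main__":
        for hit in scan(sys.stdin):
            print(hit)

Usage would be something like: journalctl -u myservice | python scan_logs.py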
If you ask it to help you write a bot/cheat for a video game, it will usually refuse on the grounds of breaking the game's terms of service, etc.
I've rarely had Claude reject a prompt of mine. What are you prompting for to get a 90% refusal rate?