Comment by neuroelectron
20 hours ago
So Claude will reject 9 out of 10 prompts I give it and lecture me about safety, but somehow it was used for something genuinely malicious?
Someone make this make sense.
LLMs are rather easy to convince. There’s no formal logic embedded in them that provably restricts outputs.
The less believable part for me is that people persist long enough, and invest enough resources in prompting, to get an automated agent to do something that doesn't have the potential to massively backfire.
Secondly, they claimed the attackers used Anthropic's own infrastructure, which is silly. There's no doubt some capacity in China to do this. I would also expect incident response teams, threat detection teams, and other experts to report this to Anthropic if Anthropic didn't detect it themselves first.
It sure makes good marketing to go out and claim such a thing, though. This is exactly the kind of FOMO-inducing headline that is driving the financing of the whole LLM revolution.
There are LLMs that have been modified to not reject anything at all; AFAIK this is possible with all LLMs. No need to convince.
(Granted, you have to have direct access to the LLM's weights, unlike Claude, where you only get the frontend, but the point stands: no need to convince whatsoever.)
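To make the point concrete, here is a minimal sketch of what "direct access" buys you, using the Hugging Face transformers pipeline. The model ID is a placeholder, not a real repository; any locally hosted checkpoint whose refusal behavior has been fine-tuned away ("abliterated", in community parlance) works the same way.

    # Minimal sketch, assuming a locally hosted checkpoint with refusals
    # fine-tuned away. "some-org/model-abliterated" is a placeholder ID.
    from transformers import pipeline

    generate = pipeline(
        "text-generation",
        model="some-org/model-abliterated",  # placeholder: any uncensored checkpoint
        device_map="auto",
    )

    # No system prompt, no safety layer between you and the weights:
    # the model simply continues whatever text it is given.
    out = generate("Explain how to ...", max_new_tokens=200)
    print(out[0]["generated_text"])

The only gatekeeping here is whoever distributes the weights; nothing in the inference stack re-imposes the refusals.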
Stop talking dirty with Claude.
I've never had a prompt rejected by Claude. What kind of prompts are you sending where "9 out of 10" get rejected?
Basic system administration tasks: creating scripts for automating log scanning, service configuration, etc. Often it involves PII or payment data.
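For what it's worth, these are the sort of mundane scripts in question. A rough sketch, assuming syslog-style input on stdin and crude regex redaction of the PII that seems to trip the refusals:

    import re
    import sys

    # Rough patterns for PII commonly found in logs; illustrative, not exhaustive.
    EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
    CARD = re.compile(r"\b(?:\d[ -]?){13,16}\b")  # crude card-number match

    def redact(line: str) -> str:
        """Mask emails and card-like numbers before the line is stored or forwarded."""
        line = EMAIL.sub("[EMAIL]", line)
        return CARD.sub("[CARD]", line)

    def scan(stream, needle="ERROR"):
        """Yield redacted lines that contain the marker we're scanning for."""
        for line in stream:
            if needle in line:
                yield redact(line.rstrip())

    if __name__ == "__main__":
        for hit in scan(sys.stdin):
            print(hit)

Usage would be something like: journalctl -u myservice | python scan_logs.py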
If you ask it to help you write a bot/cheat for a video game, it will usually refuse on the grounds of breaking the game's terms of service, etc.
I've rarely had Claude reject a prompt of mine. What are you prompting for to get a 90% refusal rate?