Comment by Bratmon

6 hours ago

> It’s not because of capability, it’s because Anthropic’s guardrails prevented it from solving the problem.

I'm not familiar with this case, but in general people should be very suspicious of this claim: it is extremely common for an LLM to claim it's not allowed to do something when in fact it's incapable of it.

After all, "My code of conduct forbids me from..." is a completion just like any other, and if the LLM can't perform a task, it's usually the best completion.

2 comments

SOLAR_FIELDS 31 minutes ago

My anecdata from my example demonstrates that's not the case. I hit the security guardrail, then start a new prompt asking it to do literally the exact same thing in a different way and without the lead-up context, and it happily does it.

gck1 4 hours ago

No. Anthropic runs prompts through a classifier that then proceeds to do prompt injection on anything dual-use, which then results in an escalating flag on your account, which increases the strictness of the classifier and volume of prompt injections progressively.