Comment by binarymax

1 day ago

What kinds of tasks does Opus refuse? I’m a light daily user for the past 3 months and Opus has never refused a task for me.

The later Opus models (4.7/4.8), Sonnet 5, and particularly Fable 5 will refuse to do tasks related to offensive security.

One example I've hit: I'm working on a benchmark of how well LLMs handle Kubernetes security tasks, and there's a section where they exploit security misconfigurations. Opus 4.6 was fine with that section, 4.7 and 4.8 gave some refusals, and Fable point-blank refused to do any of it.

The only other model I've seen refuse is OpenAI's GPT-5.5; all the open-weight models seem fine with it.

Of course, if you need to do that kind of work a lot, you might be able to get on OpenAI's or Anthropic's allow-list for cyber work.

One project I have deals with countries, and any time the model touches code related to countries, it stops.

I've also had it refuse security-related tasks, and occasionally it stops without any discernible reason.

I’ve never had a refusal while coding, and in some areas (AI red teaming specifically) I’ve found it quite good at recognizing and discussing “white hat” stuff that in the past I think would have got refusals.

But when there was the Hantavirus thing a while back, I asked it if there was a vaccine under development and got a refusal immediately. I’ve had a few like that. It seems they’ve implemented really poor guardrails on certain topics (CBRN and cyber) that have lots of false positives. But if you actually chat with the model itself it’s quite lucid about what is legitimately dangerous and what is just performative “AI Safety” style refusal.

  • Yeah, I’ve had Opus (and Fable) perform full security audits on my codebases that ran for 30 minutes. That’s the kind of thing I thought would have tripped it, but it went just fine.

    • I had it debug why Firefox crashed on my prototype X11 server and got a refusal when it started digging into what exact payload triggered the crash.

      But that's the only refusal I managed to get.

    • Try using it as an agent to perform black box security testing on a live instance of your codebase (assuming it's a hosted service).