
Comment by andy99

1 day ago

I’ve never had a refusal while coding, and in some areas (AI red teaming specifically) I’ve found it quite good at recognizing and discussing “white hat” material that I think would have drawn refusals in the past.

But when there was the hantavirus thing a while back, I asked it whether a vaccine was under development and got a refusal immediately. I’ve had a few like that. It seems they’ve implemented really poor guardrails on certain topics (CBRN and cyber) that produce lots of false positives. But if you actually chat with the model itself, it’s quite lucid about what is legitimately dangerous and what is just a performative “AI Safety”-style refusal.

Yeah, I’ve had Opus (and Fable) perform full security audits on my codebases that ran for 30 minutes. That’s the kind of thing I’d have expected to trip it, but it went just fine.

  • I had it debug why Firefox crashed on my prototype X11 server and got a refusal when it started digging into what exact payload triggered the crash.

    But that's the only refusal I managed to get.

  • Try using it as an agent to perform black box security testing on a live instance of your codebase (assuming it's a hosted service).