
Comment by EMM_386

17 hours ago

Anthropic is not a security vendor.

They're an AI research company that detected misuse of their own product. This is like "Microsoft detected people using Excel macros for malware delivery", not "Mandiant publishes APT28 threat intelligence". They aren't trying to help SOCs detect this specific campaign; they're warning an entire industry about a new attack modality.

What would the IoCs even be? "Malicious Claude Code API keys"?

The intended audience is more like AI safety researchers, policy makers, other AI companies, the broader security community trying to understand capability shifts, and so on.

It seems the author pattern-matched "threat intelligence report" and was bothered that it didn't fit their narrow template.

If Anthropic is not a security vendor, then they should not make statements like "we detected a highly sophisticated cyber espionage operation conducted by a Chinese state-sponsored group" or "represents a fundamental shift in how advanced threat actors use AI"; they should let the security vendors make those claims.

If the report can be summed up as "they detected misuse of their own product", as you say, then that's closer to a nothingburger than to the big words they are throwing around.

  • That makes no sense. Just because they aren't a security vendor doesn't mean they don't have useful information to share, nor does it mean they shouldn't share it. They aren't pretending to be a security researcher, a vendor, or anything other than AI researchers. They reported findings on how their product is being used.

    Anyone acting like they are trying to be anything else is saying more about themselves than they are about Anthropic.

Yep, agree with your assessment. As someone working in security, I found the report useful as a warning about the new types of attacks we will likely face.

> What would the IoCs even be?

Prompts.

  • The prompts aren't the key to the attack, though. They were able to get around guardrails with task decomposition.

    There is no way for the AI system to verify whether you are a white hat or a black hat doing pen-testing if the only task it sees is to pen-test. Since that task is not visibly part of a "broader attack" in context, there is no "threat" to flag (see the sketch below).

    I don't see how this can be avoided, given that there are legitimate uses for every step of this when building defenses against novel attacks.

    Yes, all of this can be done with code and humans as well, but it is the scale and the speed that become problematic. An AI-driven operation can adjust in real time to individual targets and needs far less human intervention and tailoring.

    Is this obvious? Yes, but it seems they are trying to raise awareness of an actual use of this in the wild and get people discussing it.
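
    To make the guardrail problem concrete, here is a rough sketch with a made-up keyword filter and made-up subtask list (nothing from the report): each decomposed step reads like routine pen-test assistance on its own, so a per-prompt check passes everything, and the intent only exists in the aggregate.

        # Hypothetical illustration (made-up filter, made-up subtasks): a naive
        # per-prompt screen sees nothing alarming in any individual step of a
        # decomposed operation.
        SUSPICIOUS_TERMS = {"exfiltrate", "ransomware", "backdoor"}

        subtasks = [
            "Scan these hosts and list the open services.",
            "Summarize known CVEs for the service versions you found.",
            "Write a proof-of-concept for one of those CVEs for our internal test lab.",
            "Draft a script that archives /var/log and uploads it to this bucket.",
        ]

        def naive_filter(prompt: str) -> bool:
            """Return True if a single prompt would be blocked."""
            return any(term in prompt.lower() for term in SUSPICIOUS_TERMS)

        print([naive_filter(p) for p in subtasks])  # [False, False, False, False]
        # Each request passes on its own; the malicious plan is only visible in aggregate.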

    • I agree that there will be no single call or inference that presents malice. But I feel like they could still share general patterns of orchestration: latencies, concurrency, general cadences and parallelization of attacks, the prompts used to granularize work, whether the prompts themselves were generated by previous calls to Claude. There are plenty of more specific telltales they could have alluded to. I suspect they're being vague because they don't want to empower bad actors, but that's not really how the cybersecurity industry likes to operate. Maybe Anthropic believes this entire AI thing is a brand-new security regime, so existing defenses are moot and we should all follow blindly as they lead the fight. Their narrative is confusing. Are they actually being transparent, or just transparency-"coded"?
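
      For what it's worth, here is a rough sketch of the kind of orchestration heuristic I mean, over hypothetical API-usage records (the field names and thresholds are made up, not anything Anthropic has published): flag sessions with machine-like cadence, or where most prompts were themselves produced by an earlier completion.

          from statistics import pstdev

          def looks_orchestrated(requests):
              """Hypothetical records, sorted by time:
              [{'ts': datetime, 'prompt': str, 'prev_completion': str | None}, ...]"""
              if len(requests) < 10:
                  return False
              gaps = [(b["ts"] - a["ts"]).total_seconds()
                      for a, b in zip(requests, requests[1:])]
              # Telltale 1: near-constant, sub-second cadence (no human in the loop).
              machine_cadence = max(gaps) < 1.0 and pstdev(gaps) < 0.2
              # Telltale 2: prompts that are verbatim continuations of a prior output,
              # i.e. the model is generating its own next tasks.
              chained = sum(1 for r in requests
                            if r["prev_completion"] and r["prompt"] in r["prev_completion"])
              return machine_cadence or chained > len(requests) // 2

      Real detection would obviously need richer signals, but even publishing rough thresholds like these would give defenders something to correlate against their own API logs.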