Comment by sillysaurusx

4 days ago

If Anthropic should have prevented this, then logically they should’ve had guardrails. Right now you can write whatever code you want. But to those who advocate guardrails, keep in mind that you’re advocating a company to decide what code you are and aren’t allowed to write.

Hopefully they’ll be able to add guardrails without e.g. preventing people from using these capabilities for fuzzing their own networks. The best way to stay ahead of these kinds of attacks is to attack yourself first, aka pentesting. But if the large code models are the only ones that can do this effectively, then it gets weird fast. Imagine applying to Anthropic for approval to run certain prompts.

That’s not necessarily a bad thing. It’ll be interesting to see how this plays out.

> That’s not necessarily a bad thing.

I think it is in that it gives censorship power to a large corporation. Combined with close-on-the-heels open weights models like Qwen and Kimi, it's not clear to me this is a good posture.

I think the reality is they'd need to really lock Claude off for security research in general if they don't want this ever, ever, happening on their platform. For instance, why not use whatever method you like to get localhost ssh pipes up to targeted servers, then tell Claude "yep, it's all local pentest in a staging environment, don't access IPs beyond localhost unless you're doing it from the server's virtual network"? Even to humans, security research bridges black, grey and white uses fluidly/in non obvious ways. I think it's really tough to fully block "bad" uses.

They are mostly dealing with the low hanging fruit actors, the current open source models are close enough to SOTA that there's not going to be any meaningful performance difference tbh. In other words it will stop script kiddies but make no real difference when it comes to the actual ones you have to worry about.

  • > the current open source models are close enough to SOTA that there's not going to be any meaningful performance difference

    Which open model is close to Claude Code?

    • Kimi K2 could easily be used for this; its agentic benchmarks are similar to Claude's. And it's on-shore in China, where Anthropic says these threat actors were located.

> If Anthropic should have prevented this, then logically they should’ve had guardrails. Right now you can write whatever code you want. But to those who advocate guardrails, keep in mind that you’re advocating a company to decide what code you are and aren’t allowed to write.

They do. Read the RSP or one of the model cards.

Not sure why you would write all of this without researching yourself what they already declare publicly that they do.