Comment by d_burfoot

4 days ago

Wait a minute - the attackers were using the API to ask Claude for ways to run a cybercampaign, and it was only defeated because Anthropic was able to detect the malicious queries? What would have happened if they were using an open-source model running locally? Or a secret model built by the Chinese government?

I just updated my P(Doom) by a significant margin.

> What would have happened if they were using an open-source model running locally? Or a secret model built by the Chinese government?

In all likelihood, the exact same thing that is actually happening right now in this reality.

That said, local models specifically are harder to deploy, given their heavy storage and compute requirements.

If plain open-source local models were able to do what Claude API does, Anthropic would be out of business.

Local models are a different thing from those cloud-based assistants and APIs.

  • > If plain open-source local models were able to do what Claude API does, Anthropic would be out of business.

    Not necessarily. Oracle has made billions selling a database that's worse than plain open-source ones, for example.

Why would that warrant such a significant update? It's basically a security research tool, but with an agent in the loop that uses an LLM instead of some other heuristic to decide what to try next.
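To make that comparison concrete, here's a minimal sketch of the agent-in-the-loop pattern being described: a standard scanner loop where the "choose the next probe" step is delegated to a model instead of a fixed heuristic. All names here (`stub_model_pick`, `run_probe`, `agent_loop`) are hypothetical, and the model call is stubbed with a trivial rule — it says nothing about how any real tool works.

```python
def stub_model_pick(findings, candidates):
    """Stand-in for an LLM call: pick the next probe given results so far.

    A real agent would pass `findings` to the model as context; here we
    just use a trivial rule (first untried candidate) to show the shape.
    """
    untried = [c for c in candidates if c not in findings]
    return untried[0] if untried else None

def run_probe(target, action):
    """Stand-in for executing one reconnaissance step against `target`."""
    return f"{action} on {target}: no result"

def agent_loop(target, candidates, max_steps=10):
    """The loop itself: observe -> ask the 'model' what to try -> act."""
    findings = {}
    for _ in range(max_steps):
        action = stub_model_pick(findings, candidates)
        if action is None:  # model has nothing left to suggest
            break
        findings[action] = run_probe(target, action)
    return findings

results = agent_loop("example.test", ["port-scan", "dir-enum", "banner-grab"])
```

Swap `stub_model_pick` for an API call (or a hard-coded decision tree) and the loop is otherwise identical — which is the commenter's point: the agent wrapper, not the model, is the familiar part.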

I mean, models exhibiting hacking behaviors have been predicted by cyberpunk for decades now; it should be the first thing on any doom list.

Governments, of course, will have specially trained models, built on their corpora of unpublished exploits, that are better at attacking than public models.