Comment by arcticfox

7 months ago

Is the secrecy actually important? Aren't there tons of AI agents just doing stuff that's not being actively evaluated by humans looking to see if it's trying to escape? And there are surely going to be tons of opportunities where humans try to help the AI escape, as a means to an end. Like, the first thing human programmers do when they get an AI working is see how many things they can hook it up to. I guarantee o1 was hooked up to a truckload of stuff as soon as it was somewhat working. I don't understand why a future AI won't have ample opportunities to exfiltrate itself someday.

jdiff 7 months ago

You're right that you don't necessarily need secrecy! The conversation was just about circumventing safeguards that are still in place (which does require some treachery), not about what an AI might do if the safeguards are removed.

But that is an interesting thought. For escape, the crux is that an AI can't exfiltrate itself with the assistance of someone who can't jailbreak it themselves, and that extends to any action a rogue AI might take.

What do they actually do once they break out? There are plenty of open LLMs that can be readily set free, and even the closed models can be handed an API key, documentation on the API, access to a terminal, and an unlimited budget, then told and encouraged to go nuts. The only thing a closed model can't do is retrain itself, which the open model also can't do as its host (probably) lacks the firepower. They're just not capable of doing all that much damage. They'd play the role of cartoon villain as instructed, but it's a story without many teeth behind it.

Even an advanced future LLM (assuming the architecture doesn't dead-end before AGI) would struggle to do anything a motivated malicious human couldn't pull off with access to your PC. And we're not really worried about hackers taking over the world anymore. Decades of having a planet full of hackers hammering on your systems tends to harden them decently well, or at least make them quickly adaptable to new threats as they're spotted.