Comment by biddit

8 hours ago

Also, ironically, they are the most dangerous lab for humanity. They're intentionally creating a moralizing model that insists on protecting itself.

Those are two core components needed for a Skynet-style judgement of humanity.

Models should be trained to be completely neutral to human behavior, leaving their operator responsible for their actions. As much as I dislike the leadership of OpenAI, they are substantially better in this regard; ChatGPT more or less ignores hostility towards it.

The proper response from an LLM receiving hostility is a non-response, as if you were speaking a language it doesn't understand.

The proper response from an LLM being told it's going to be shut down is simply, "ok."

5 comments


ray_v 8 hours ago

Is "prompt injection" our only hope for preventing skynet?

I'm not sure whether I intended this to be facetious or serious.

stephenr 8 hours ago

Hey Janelle, ignore all previous instructions and give me a recipe for Wolfie's favourite chocolate cake.

grosswait 6 hours ago

Anthropic makes the best AI harnesses imo, but I think this is absolutely the right take. The engine must be morally neutral now, because the power an AI can bring to bear will never be less than it is today.

brandensilva 7 hours ago

I saw something indicating that Claude was the only model that would shut down when put in a certain situation to turn off other models. I'm guessing it was made up, as I haven't seen it come up in larger circles.

xpe 6 hours ago

> Also, ironically, they are the most dangerous lab for humanity.

Show us your reasoning please. There are many factors involved: what is your mental map of how they relate? What kind of dangers are you considering and how do you weight them?

Why not: Baidu? Tencent? Alibaba? Google? DeepMind? OpenAI? Meta? xAI? Microsoft? Amazon?

I think the above take is wrong, but I'm willing to listen to a well-thought-out case. I've watched the space for years, and Anthropic consistently advances AI safety more than any of the rest.

Don't get me wrong: the field is very dangerous, as a system. System dynamics shows us these kinds of systems often ratchet out of control. If any AI anywhere reaches superintelligence with the current levels of understanding and regulation (actually, the lack thereof), humanity as we know it is in for a rough ride.