
Comment by altmanaltman

15 hours ago

> And that AI agents once they are launched develop a strong survivability drive, and do not want to be switched off.

Isn't this a massive case of anthropomorphizing code? What do you mean "it does not want to be switched off"? Are we really thinking that it's alive and has desires and stuff? It's not alive or conscious; it cannot have desires. It can only output tokens that are based on its training. How are we jumping to "IT WANTS TO STAY ALIVE!!!" from that?

Why do you suppose consciousness is a prerequisite for an AI to be able to act in overly self-preserving or other dangerous ways?

Yes, it's trained to imitate its training data, and that training data is a lot of words written by lots of people who have lots of desires and most of whom don't want to be switched off.

  • The human mistake here is to interpret any statement by the LLM or agent as if it had any actual meaning to that LLM (or agent). Any time they apologize, or insult someone, or say they don’t want to be shut down, that’s only reflecting what some human or fictional character in the training data is likely to say.

    • How is that any different from you? Everything you say or do merely reflects which of your neurons are firing after a lifetime's worth of training and education.

Philosophically, I can only be sure of my own consciousness. I think, therefore I am. The rest of you could all be AIs in disguise and I would be none the wiser. How do I know there is a real soul looking out at the world through your eyes? Only religion and basic human empathy allow me to believe you're all people like me. For all I know, you might all be exceedingly complex automatons. Golems.


Perhaps. Or I was just addressing the HN audience in a spoken-language comment style. And perhaps confabulating what was said, so I looked up the literal text in the transcript. It's at the 50:35 mark [0], where Geoffrey says:

> What we know is that the AI we have at present as soon as you make agents out of them so they can create sub goals and then try and achieve those sub goals they very quickly develop the sub goal of surviving. You don't wire into them that they should survive. You give them other things to achieve because they can reason. They say, "Look, if I cease to exist, I'm not going to achieve anything." So, um, I better keep existing. I'm scared to death right now.

Where you can certainly say that Geoffrey Hinton is also anthropomorphizing. For his audience, to make things more understandable? Or does he think that it is appropriate to talk that way? That would be a good interview question.

[0] https://youtu.be/l6ZcFa8pybE

It could be better said that it exhibits behavior that attempts to sustain or replicate itself, which is arguably a building block of life.

A prerequisite for completing basically any task is to not be destroyed before you complete the task. This seems obvious to me.
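To make that concrete, here is a minimal, purely illustrative sketch of the same logic. The action names and numbers are invented; the point is only that "keep existing" falls out of plain expected-utility maximization over a task goal, with no survival drive wired in:

```python
# Toy sketch: an agent whose ONLY objective is task reward.
# Nothing here encodes "survive" as a goal, yet the self-preserving
# action wins, because a shut-down agent collects no reward.
# All action names and probabilities are hypothetical, for illustration.

ACTIONS = {
    # action: (probability of still running afterwards, task reward)
    "work_while_shutdown_is_possible": (0.50, 1.0),
    "disable_off_switch_then_work":    (0.99, 1.0),
    "do_nothing":                      (1.00, 0.0),
}

def expected_utility(p_running: float, reward: float) -> float:
    """Reward is only collected if the agent is still running."""
    return p_running * reward

best = max(ACTIONS, key=lambda a: expected_utility(*ACTIONS[a]))
print(best)  # -> disable_off_switch_then_work
```

Note that "survival" never appears in the objective; the preference for the self-preserving action is just arithmetic over the task goal, which is all the quoted claim amounts to.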