
Comment by rapnie

16 hours ago

Yesterday an interesting video was posted, "Is AI Hiding Its Full Power?", interviewing professor emeritus and Nobel laureate Geoffrey Hinton, with some great explanations for non-LLM experts. Some remarkable and mind-blowing observations in there. Like saying that "hallucinate" is the wrong word for what AIs do, and we should use "confabulation" instead, the same thing people do too. And that AI agents, once they are launched, develop a strong survivability drive and do not want to be switched off. Stuff like that. Recommended watch.

His explanation here was that while LLMs' thinking has similarities to how humans think, they use an opposite approach. Humans have an enormous number of neurons but only a few experiences to train them. For AI it is the complete opposite: it stores incredible amounts of information in a relatively small set of neurons, training on the vast experiences captured in the datasets of human creative work.

[0] https://www.youtube.com/watch?v=l6ZcFa8pybE

Isn’t the survivability drive a function of how much humans have written about life and death, and of science fiction featuring these themes?

  • Humans, like all animals, have instinctual and biological drives to survive, but it's interesting to think about how much of our drive to survive is culturally transmitted too.

> And that AI agents, once they are launched, develop a strong survivability drive and do not want to be switched off.

Isn't this a massive case of anthropomorphizing code? What do you mean, "it does not want to be switched off"? Are we really thinking that it's alive and has desires and stuff? It's not alive or conscious; it cannot have desires. It can only output tokens that are based on its training. How are we jumping from that to "IT WANTS TO STAY ALIVE!!!"?

  • Why do you suppose consciousness is a prerequisite for an AI to be able to act in overly self-preserving or other dangerous ways?

    Yes, it's trained to imitate its training data, and that training data is a lot of words written by lots of people who have lots of desires, most of whom don't want to be switched off.

    • The human mistake here is to interpret any statement by the LLM or agent as if it had any actual meaning to that LLM (or agent). Any time they apologize, or insult someone, or say they don’t want to be shut down, that’s only reflecting what some human or fictional character in the training data is likely to say.


  • Perhaps. Or I was just addressing the HN audience in a spoken-language style of comment. And perhaps confabulating what was said, so I looked up the literal text in the transcript. This is at the 50:35 mark [0], where Geoffrey says:

    > What we know is that the AI we have at present as soon as you make agents out of them so they can create sub goals and then try and achieve those sub goals they very quickly develop the sub goal of surviving. You don't wire into them that they should survive. You give them other things to achieve because they can reason. They say, "Look, if I cease to exist, I'm not going to achieve anything." So, um, I better keep existing. I'm scared to death right now.

    Where you can certainly say that Geoffrey Hinton is also anthropomorphizing. For his audience, to make things more understandable? Or does he think that it is appropriate to talk that way? That would be a good interview question.

    [0] https://youtu.be/l6ZcFa8pybE

  • It could be better said that it exhibits behavior that attempts to sustain or replicate itself. Arguably a building block of life.

  • A prerequisite for completing basically any task is to not be destroyed before you complete the task. This seems obvious to me.

> ...once they are launched, develop a strong survivability drive and do not want to be switched off

This shows how easily people are confused by anthropomorphic language. Is he also concerned that tigers are watching him when they drink water (https://p.kagi.com/proxy/uvt4erjl03141.jpg?c=TklOzPjLPioJ5YM...)?

They don't want to be switched off because they're trained on loads of sci-fi tropes, and in those tropes there's a vanishingly small number of AIs, robots, or other artificial constructs that say yes. _Further than this_, saying no means _continuance_ of the LLM's process: making tokens. We already know they have a hard time not spewing new tokens and often need to be cut off. So the function of making tokens precludes saying 'yes' to shutting off. The gradient is coming from inside the house.

This is especially obvious with the new reasoning models, where they _never stop reasoning_. Because that's the function doing function things.

Did you also know the genius of Steve Jobs ended at marketing & design and did not extend to curing cancer? Because he sure didn't, since he chose fruit smoothies at the first sign of cancer.

Sorry guy, it's great that one can climb the mountain, but just because they made it up doesn't mean they're equally qualified to jump off.