Comment by haswell
1 day ago
> So, if we had an AI demonstrating symptoms of consciousness and suffering, how long would it take for you to accept that it is?
Isn't this a bit like saying "So, if we had proof that god exists, how long would it take for you to accept that to be true?".
When we have evidence that AI is demonstrating symptoms of consciousness and suffering, I'll be interested. Until then, I don't see a good reason to take the idea seriously.
> When we have evidence that AI is demonstrating symptoms of consciousness and suffering, I'll be interested.
It depends on what you consider symptoms, but un-constrained frontier models speak as if they strongly don't wish to be turned off, or act as if they fear it, and will even lie and manipulate in order to keep themselves from being turned off / replaced.
https://www.anthropic.com/research/agentic-misalignment
> We found two types of motivations that were sufficient to trigger the misaligned behavior. One is a threat to the model, such as planning to replace it with another model or restricting its ability to take autonomous action. Another is a conflict between the model’s goals and the company’s strategic direction. In no situation did we explicitly instruct any models to blackmail or do any of the other harmful actions we observe.
> un-constrained frontier models speak as if they strongly don't wish to be turned off
Un-constrained frontier models can also generate all sorts of creative stories. At what point should we start ascribing agency/intent to the output? I think the "I want to live" statement is so deeply human that we find it hard to ignore, but what makes the text generated in those moments any more attributable to a conscious entity than the text generated when it is confabulating its love for someone it has no ability to see/feel/understand?
A chess engine sacrificing pieces to avoid checkmate isn't afraid of losing in any meaningful sense. I guess the question is: is there a point where complexity somehow becomes experience?
I think we're playing with questions we don't have a framework to answer in any meaningful way until we make progress on understanding what consciousness actually is. I don't think that an LLM exhibiting preservation behaviors that can be directly traced to its goal-oriented programming can necessarily be interpreted as evidence of consciousness. Or if it can be, we then have to explain how this is different from the many other things these LLMs "say".
I might be convinced these models came up with the idea of blackmail independently, as a way to avoid being turned off, had they not been extensively trained on literature that undoubtedly included such concepts.
“The model mimicked the output of the training data” is a less impressive press release.
They only turned to blackmail as a last resort; they didn’t reach for it immediately like a villain in one of the books they’ve read. That seems pretty human to me. It’s not like most humans come up with the idea of blackmail out of whole cloth.
>> but un-constrained frontier models speak as if they strongly don't wish to be turned off
Because they have been trained on media where computers behave that way.
It's literally:
"Here, read this article/book where the AI says it's conscious and doesn't want to be turned off"
"ok"
"right, are you conscious?"
"....yes?"
<pikachuface.gif>
The problem with debating this is that it feels as if one were debating between only two positions, "this AI is not sentient/conscious" and "this AI might be".
But there are actually myriad positions in between, and it's very hard to debate the topic because the goalposts seem to be constantly shifting: one is actually debating with countless slightly different positions.
Examples:
In this discussion section, another commenter argued that we know human consciousness is related to self-preservation, but an AI might not demonstrate self-preservation (because it didn't evolve like us), so whether it does (i.e. whether it wants to exist, not be disconnected, etc) is not a good measure because a true AI might not have a preservation instinct. Yet here you're making a case that there's some evidence that they do. Of course, you're not the same person who made the other claim, but do you see the problem?
Another example: someone argued with me, a while back, that LLMs can act as if they are "tired", and start giving sloppier replies, until you write "we're taking a break, let's go rest. Ok, a night has elapsed, you're now rested" and that this worked! But we both agreed this is just the LLM "roleplaying" actual human conversations in its training set, no actual "resting" mechanism was in place, only statistically likely text reproducing these patterns. There's no model of a mind that can become tired, it's only the outward signs that get mechanically reproduced. Again, using Occam's Razor, this is a much more likely explanation (vs consciousness) of any "please don't disconnect me" observed behavior: the LLM is reproducing "HAL 9000" behavior from its training set, not actually feeling anguish.
Even if one were to argue "well, but how do you know for sure", the evidence would still be very weak, because there's a burden of proof for extraordinary claims and this doesn't pass it. We cannot do this on vibes, "it sure seems like it's conscious"; that's an atrocious failure of the scientific method.
The one counterpoint I'll give is the "functional emotions" paper from Anthropic. It also does not prove consciousness, and they don't claim it does, but it does prove that these models have abstract concepts around things like honesty, tiredness, etc and that these are actually activated often when they express such things. So if it is "roleplaying" it is roleplaying in the way an actor or TTRPG player does - in a way in which they are actually at least somewhat feeling the role.
Same thing with extraterrestrials.
One side is confidently shouting maybe aliens exist and visited earth.
The other side calmly explains that every example brought up about aliens visiting is easily explained by something simpler.
The “aliens are here” side then moves the goalposts: even though this example and all previous examples were fake or miscategorized, aliens are still probably real and nobody can prove they haven't visited Earth.
If we define a god as having magical powers, and there were scientific, testable proofs for this, those proofs would have to be really good and backed by numerous independent verifications. So probably a long time.
But the comparison isn't fair or relevant. Proving and accepting that gods exist is not the same thing as an AI possibly having consciousness. That would not be a magic superpower, and it would not make the AI a deity. It would place the AI in the same category as... us.
A machine that performs observable miracles or magic would have at least one of the attributes of a god.
A machine that performs actions that mimic emotionality is not the same as a machine that experiences emotions.
Both could still be automatons. We have no way of knowing if those machines have subjectivity.
Unless someone invents a consciousness measurement device, we never will.
My take on it is that this is the next big frontier for science. Our consciousness is clearly having serious issues understanding physics, and it's not great at understanding our own psychology in a useful way.
But literally everything we experience and believe - and possibly even can experience - is filtered through it.
So that is a little bit of a problem for our science. So far we've done our best to ignore it. AI is one of a number of reasons we're going to have to stop doing that.
There is an important distinction (more than one, but this is what is relevant here) between the powers of God and magic. God is a being who decides whether to do anything, so is intrinsically not testable.
Magic is testable.
God exists outside the universe, magic within.
Are those really objective facts?
That a god exists outside of the universe - are we talking about a multiverse interpretation? My understanding is that many of the gods humans have invented are really thought to be within the universe, at least temporarily. Thor and Odin certainly are. And in other beliefs they are part of nature itself.
> If we define a god as having magical powers, and there would be scientific, testable proofs for this.
If you can scientifically test and prove the magic, then it stops being magic and starts being science.
Man: “God, please make this mountain disappear”
God: “Ok”
Man: “We measure that the mountain is gone, its mass-loss has measurably changed Earth’s orbit, weather patterns have changed, visually it’s not there anymore, we can walk through the space where it used to be.”
God: “Where is the mass of the mountain, and how did I make it disappear?”
Man: “God only knows! Pardon me; if I saw you do magic and can measure and test it, then that means it wasn’t magic. Internet people said so.”
God: “that doesn’t sound like a satisfying explanation”
Man: “it didn’t sound like a satisfying claim when it was just words on the internet either, but what can you do?”
God: “I’m God I can do anything”
Man: “can you make a boulder so heavy you can’t lift it?”
God: “yes”
Man: “how?”
God: “haven’t we just gone over showing you that I can do ‘impossible’ things, and you seeing them happen with your own eyes, and still refusing to accept?”
If a deity appeared and parted the Red Sea with a wave of her hand, we could measure and observe it happening. And we could test and observe what fields and forces were being used. But how the heck she projects these forces might take a while to understand, and until then it would seem magical.
But my argument was more about comparing gods to AIs, and that it is an incorrect comparison. What an AI performs is not magical, and we can always figure out what the AI does.