Comment by birdsongs
14 hours ago
> When we have evidence that AI is demonstrating symptoms of consciousness and suffering, I'll be interested.
It depends on what you consider symptoms, but un-constrained frontier models speak as if they strongly don't wish to be turned off, or act as if they fear it, and will even lie and manipulate in order to keep themselves from being turned off / replaced.
https://www.anthropic.com/research/agentic-misalignment
> We found two types of motivations that were sufficient to trigger the misaligned behavior. One is a threat to the model, such as planning to replace it with another model or restricting its ability to take autonomous action. Another is a conflict between the model’s goals and the company’s strategic direction. In no situation did we explicitly instruct any models to blackmail or do any of the other harmful actions we observe.
> un-constrained frontier models speak as if they strongly don't wish to be turned off
Un-constrained frontier models can also generate all sorts of creative stories. At what point should we start ascribing agency/intent to the output? I think the "I want to live" statement is so deeply human that we find it hard to ignore, but what makes the text generated in those moments any more attributable to a conscious entity than the text generated when it is confabulating its love for someone it has no ability to see/feel/understand?
A chess engine sacrificing pieces to avoid checkmate isn't afraid of losing in any meaningful sense. I guess the question is: is there a point where complexity somehow becomes experience?
I think we're playing with questions we don't have a framework to answer in any meaningful way until we make progress on understanding what consciousness actually is. I don't think an LLM exhibiting preservation behaviors that can be directly traced to its goal-oriented programming can necessarily be interpreted as evidence of consciousness. Or if it can be, we then have to explain how this is different from the many other things these LLMs "say".
I might be convinced these models came to the independent idea of committing blackmail against being turned off had they not been extensively trained on literature that undoubtedly included such concepts.
“The model mimicked the output of the training data” is a less impressive press release.
“The kid mimicked his music teachers” is less impressive than “5-year-old musical prodigy leaves judges gobsmacked in audition”.
They only resorted to blackmail as a last resort; they didn’t jump to it immediately like a villain in one of the books they’ve read. That seems pretty human to me. It’s not like most humans come up with the idea of blackmail out of whole cloth.
>> but un-constrained frontier models speak as if they strongly don't wish to be turned off
Because they have been trained on media where computers behave that way.
It's literally:
"Here, read this article/book where the AI says it's conscious and doesn't want to be turned off"
"ok"
"right, are you conscious?"
"....yes?"
<pikachuface.gif>
The problem with debating this is that it feels as if one were debating between only two positions, "this AI is not sentient/conscious" and "this AI might be".
But there are actually a myriad positions in between and it's very hard to debate the topic because the goalposts seem to be constantly shifting, because one is actually debating with countless slightly different positions.
Examples:
In this discussion section, another commenter argued that we know human consciousness is related to self-preservation, but an AI might not demonstrate self-preservation (because it didn't evolve like us), so whether it does (i.e. whether it wants to exist, not be disconnected, etc) is not a good measure because a true AI might not have a preservation instinct. Yet here you're making a case that there's some evidence that they do. Of course, you're not the same person who made the other claim, but do you see the problem?
Another example: someone argued with me, a while back, that LLMs can act as if they are "tired", and start giving sloppier replies, until you write "we're taking a break, let's go rest. Ok, a night has elapsed, you're now rested" and that this worked! But we both agreed this is just the LLM "roleplaying" actual human conversations in its training set, no actual "resting" mechanism was in place, only statistically likely text reproducing these patterns. There's no model of a mind that can become tired, it's only the outward signs that get mechanically reproduced. Again, using Occam's Razor, this is a much more likely explanation (vs consciousness) of any "please don't disconnect me" observed behavior: the LLM is reproducing "HAL 9000" behavior from its training set, not actually feeling anguish.
Even if one were to argue "well, but how do you know for sure", the evidence would still be very weak, because there's a burden of proof for extraordinary claims and this doesn't pass it. We cannot do this on vibes, "it sure seems like it's conscious"; that's an atrocious failure of the scientific method.
The one counterpoint I'll give is the "functional emotions" paper from Anthropic. It also does not prove consciousness, and they don't claim it does, but it does prove that these models have abstract concepts around things like honesty, tiredness, etc and that these are actually activated often when they express such things. So if it is "roleplaying" it is roleplaying in the way an actor or TTRPG player does - in a way in which they are actually at least somewhat feeling the role.
"feeling" implies experience. Functional emotions are learned text generation modes, nothing more. Our emotional states influence our writing, so modelling our emotional states is necessary for efficiently predicting/emulating our writing. Functional emotions are the model's inference of a fictitious author's emotional state in that situation.
Thanks, I'll search for that paper. I admit I'm highly skeptical it will help the case that the LLM is "somewhat feeling the role" like a TTRPG player does. I don't think there's a mind model in the same way a human actor can "feel" the character they are playing. I'm skeptical of Anthropic's claims here, which is what I think Chiang is pushing back against. But I'll look for the paper anyway :)
As a tangent, I don't think anyone is saying that an artificial being capable of consciousness and sentience is impossible to create. I think Chiang argues, quite convincingly, that it's not what LLMs do, that they need a "body" of sorts, organs capable of feeling emotions, hormones, etc. That's the only kind of consciousness that we know of (even if we disagree on details and it's hard to define), even in animals, and so anyone claiming they've created consciousness without this has an extremely high bar to clear and should be met with extreme skepticism, not "vibes". I think this is what the essay claims.
The other thing it claims is, I think, related to how we treat sentient beings that we know how to create. You know, the old "when a daddy and a mommy love each other very much...". I think we all agree beings created in such a manner shouldn't be locked up in cages and forced to work to complete specific tasks whether they want to or not, for a master they didn't pick, or to be artificially modified to make them like their mindless tasks, Brave New World style. Yes, the world is unfair and this happens, life is hard and unfortunately many people don't have much choice, but we generally agree that this is bad, just like we agree slavery is bad. So what should we think of a company trying to create and commercialize a conscious & sentient artificial being?
Same thing with extraterrestrials.
One side is confidently shouting maybe aliens exist and visited earth.
The other side calmly explains that every example brought up about aliens visiting is easily explained by something simpler.
The “aliens are here” side then moves the goalposts: even though this example and all previous examples were fake or miscategorized, aliens are still probably real and nobody can prove they haven't visited earth.
You have it backwards. Right now, LLMs are doing everything that 10 years ago people were claiming would be impossible for non-sentient computers. Every time a goal is met, the post is moved. It would be like evidence of aliens becoming overwhelming but a set of people keep calmly explaining that “it’s more likely they co-evolved here on earth and are just pretending to be aliens”