← Back to context

Comment by haswell

5 hours ago

> un-constrained frontier models speak as if they strongly don't wish to be turned off

Un-constrained frontier models can also generate all sorts of creative stories. At what point should we start ascribing agency/intent to the output? I think the "I want to live" statement is so deeply human that we find it hard to ignore, but what makes the text generated in those moments any more attributable to a conscious entity than the text generated when it is confabulating its love for someone it has no ability to see/feel/understand?

A chess engine sacrificing pieces to avoid checkmate isn't afraid of losing in any meaningful sense. I guess the question is: is there a point where complexity somehow becomes experience?

I think we're playing with questions we don't have a framework to answer in any meaningful way until we make progress on understanding what consciousness actually is. I don't necessarily think that an LLM exhibiting preservation behaviors that can be directly traced to their goal-oriented programming can be interpreted as evidence of consciousness necessarily. Or if it can be, we then have to explain how this is different from the many other things these LLMs "say".