Comment by tkgally
3 days ago
I am unsure myself whether we should regard LLMs as mere token-predicting automatons or as some new kind of incipient intelligence. Despite their origins as statistical parrots, the interpretability research from Anthropic [1] suggests that structures corresponding to meaning do exist inside those bundles of numbers, along with signs of activity that seem analogous to thought.
That said, I was struck by a recent interview with Anthropic’s Amanda Askell [2]. When she talks, she anthropomorphizes LLMs constantly. A few examples:
“I don't have all the answers of how should models feel about past model deprecation, about their own identity, but I do want to try and help models figure that out and then to at least know that we care about it and are thinking about it.”
“If you go into the depths of the model and you find some deep-seated insecurity, then that's really valuable.”
“... that could lead to models almost feeling afraid that they're gonna do the wrong thing or are very self-critical or feeling like humans are going to behave negatively towards them.”
[1] https://www.anthropic.com/research/team/interpretability
Amanda Askell studied under David Chalmers at NYU: the philosopher who coined "the hard problem of consciousness" and is famous for taking phenomenal experience seriously rather than explaining it away. That context makes her choice to speak this way more striking: this isn't naive anthropomorphizing from someone unfamiliar with the debates. It's someone trained by one of the most rigorous philosophers of consciousness, who knows all the arguments for dismissing mental states in non-biological systems, and is still choosing to speak carefully about models potentially having something like feelings or insecurities.
A person can study fashion extensively under the best designers; they can understand tailoring and fit and have a phenomenal eye for color and texture.
Their vivid descriptions of what the Emperor could be wearing don't make said emperor any less nakey.
>research from Anthropic [1] suggests that structures corresponding to meaning exist inside those bundles of numbers and that there are signs of activity within those bundles of numbers that seem analogous to thought.
Can you give some concrete examples? The link you provided is kind of opaque
>Amanda Askell [2]. When she talks, she anthropomorphizes LLMs constantly.
She is a philosopher by trade and she describes her job (model alignment) as literally to ensure models "have good character traits." I imagine that explains a lot
Here are three of the Anthropic research reports I had in mind:
https://www.anthropic.com/news/golden-gate-claude
Excerpt: “We found that there’s a specific combination of neurons in Claude’s neural network that activates when it encounters a mention (or a picture) of this most famous San Francisco landmark.”
https://www.anthropic.com/research/tracing-thoughts-language...
Excerpt: “Recent research on smaller models has shown hints of shared grammatical mechanisms across languages. We investigate this by asking Claude for the ‘opposite of small’ across different languages, and find that the same core features for the concepts of smallness and oppositeness activate, and trigger a concept of largeness, which gets translated out into the language of the question.”
https://www.anthropic.com/research/introspection
Excerpt: “Our new research provides evidence for some degree of introspective awareness in our current Claude models, as well as a degree of control over their own internal states.”
It’s important to note that these “research papers” Anthropic releases are not peer-reviewed and have not been accepted by any scientific journal or institution. Anthropic has a history of exaggerating its research, and it has an obvious monetary incentive to continue to do so.
Well, she's describing the system's behavior.
My fridge happily reads inputs without consciousness, has goals and takes decisions without "thinking", and consistently takes action to achieve those goals. (And it's not even a smart fridge! It's the one with a copper coil or whatever.)
I guess the cybernetic language might be less triggering here (talking about systems and measurements and control), but it's basically the same underlying principles. One is just "human flavored" and therefore more prone to invite unhelpful lines of thinking?
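The fridge point can be made concrete with a minimal sketch of a bang-bang (thermostat) controller. All names and the setpoint/hysteresis values here are made up for illustration; the point is just that the same goal-directed vocabulary ("senses", "decides", "acts") applies to a system nobody would call conscious:

```python
def thermostat_step(temp_c, cooling_on, setpoint=4.0, hysteresis=1.0):
    """One control step of a dumb fridge thermostat.

    It 'senses' the cabinet temperature, 'decides' whether the
    compressor should run, and 'acts' to keep the temperature near
    the setpoint -- all without anything resembling thought.
    """
    if temp_c > setpoint + hysteresis:
        return True   # too warm: start cooling
    if temp_c < setpoint - hysteresis:
        return False  # cold enough: stop cooling
    return cooling_on  # inside the dead band: keep doing what it was doing

# The 'goal-seeking' behavior: warm -> cools, cold -> stops.
print(thermostat_step(7.0, cooling_on=False))  # True
print(thermostat_step(2.0, cooling_on=True))   # False
```

The hysteresis dead band is what makes the behavior look "consistent" over time, which is exactly the kind of pattern people are tempted to narrate in intentional terms.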
Except that the "fridge" in this case is specifically and explicitly designed to emulate human behavior so... you would indeed expect to find structures corresponding to the patterns it's been designed to simulate.
Wondering if it's internalized any other human-like tendencies — having been explicitly trained to simulate the mechanisms that produced all human text — doesn't seem too unreasonable to me.
> the interpretability research from Anthropic [1] suggests that structures corresponding to meaning do exist inside those bundles of numbers and that there are signs of activity within those bundles of numbers that seem analogous to thought
I did a simple experiment - took a photo of my kid in the park, showed it to Gemini and asked for a "detailed description". Then I took that description and put it into a generative model (Z-Image-Turbo, a new one). The output image was almost identical.
So one model converted the image to text, and the other reversed the process. The photo was completely new, personal, never put online, so it was not in any training set. How did these two models do it if not by actually using language like a thinking agent?
https://pbs.twimg.com/media/G7gTuf8WkAAGxRr?format=jpg&name=...
> How did these 2 models do it if not actually using language like a thinking agent?
By having a gazillion of other, almost identical pictures of kids in parks in their training data.
Not pictures with this composition, the same jacket, etc. Yes, there are similar images, but they are all different; this one fits the original like a key in a lock.
I use LLMs heavily for work, and have done so for about six months. I see almost zero "thought" going on and a LOT of pattern matching. You can use that knowledge to your advantage. If you're relying on the model to "think", disaster will ensue. At least, that's been my experience.
I've completely given up on using LLMs for anything more than a typing assistant / translator and maybe an encyclopedia when I don't care about correctness.
Wow those quotes are extremely disturbing.
the anthropomorphization (say that 3 times quickly) is kinda weird, but also makes for a much more pleasant conversation imo. it's kinda tedious being pedantic all the time.
It also leads to fundamentally wrong conclusions: a related issue I have with this is the use of anthropomorphic shorthand when discussing international politics. You've heard a phrase like "the US thinks...", "China wants...", "Europe believes..." so much you don't even notice it.
All useful shorthands, all of which lead people to display fundamental misunderstandings of what they're talking about — e.g. expressing surprise that a nation of millions doesn't display consistent behavior on human-lifetime scales, even though the mechanisms of government are obviously churning their makeup constantly and, depending on context, may involve entirely different people.
It seems obvious to me that entities have emergent needs and plans and so on, independent of any of the humans inside.
For example, if you've worked at a large company, one of the little tragedies is when someone everyone likes gets laid off. There were probably no people who actively wanted Bob to lose his job. Even the CEO/Board who pulled the trigger probably had nothing against Bob. Heck, they might be the next ones out the door. The company is faceless, yet it wanted Bob to go, because that apparently contributed to the company's objective function. Had the company consisted entirely of different people, plus Bob, Bob might have been laid off anyway.
There is a strong will to do ... things that emerges from large structures of people and technology. It's funny like that.
A country: a collective of people with a dedicated structure to represent the interests and enforce the strategies of that collective as a whole.
It obfuscates far more than it clarifies.
This argument would carry a lot more weight if it were published in a peer-reviewed journal by a party that does not have a stake in the AI market.