
Comment by barrkel

6 months ago

The problem with viewing LLMs as just sequence generators, and malbehaviour as bad sequences, is that it simplifies too much. LLMs have hidden state not necessarily directly reflected in the tokens being produced and it is possible for LLMs to output tokens in opposition to this hidden state to achieve longer term outcomes (or predictions, if you prefer).

Is it too anthropomorphic to say that this is a lie? To say that the hidden state and its long term predictions amount to a kind of goal? Maybe it is. But we then need a bunch of new words which have almost 1:1 correspondence to concepts from human agency and behavior to describe the processes that LLMs simulate to minimize prediction loss.

Reasoning by analogy is always shaky. It probably wouldn't be so bad to do so. But it would also amount to impenetrable jargon. It would be an uphill struggle to promulgate.

Instead, we use the anthropomorphic terminology, and then find ways to classify LLM behavior in human concept space. They are very defective humans, so it's still a bit misleading, but at least jargon is reduced.

IMHO, anthropomorphization of LLMs is happening because it's perceived as good marketing by big corporate vendors.

People are excited about the technology and it's easy to use the terminology the vendor is using. At that point I think it gets kind of self-fulfilling. Kind of like the meme about how to pronounce GIF.

  • I think anthropomorphizing LLMs is useful, not just a marketing tactic. A lot of intuitions about how humans think map pretty well to LLMs, and it is much easier to build intuitions about how LLMs work by building upon our intuitions about how humans think than by trying to build your intuitions from scratch.

    Would this question be clear for a human? If so, it is probably clear for an LLM. Did I provide enough context for a human to diagnose the problem? Then an LLM will probably have a better chance of diagnosing the problem. Would a human find the structure of this document confusing? An LLM would likely perform poorly when reading it as well.

    Re-applying human intuitions to LLMs is a good starting point to gaining intuition about how to work with LLMs. Conversely, understanding sequences of tokens and probability spaces doesn't give you much intuition about how you should phrase questions to get good responses from LLMs. The technical reality doesn't explain the emergent behaviour very well.

    I don't think this is mutually exclusive with what the author is talking about either. There are some ways that people think about LLMs where I think the anthropomorphization really breaks down. I think the author says it nicely:

    > The moment that people ascribe properties such as "consciousness" or "ethics" or "values" or "morals" to these learnt mappings is where I tend to get lost.

    • Take a look at the judge’s ruling in this Anthropic case:

      https://news.ycombinator.com/item?id=44488331

      Here’s a quote from the ruling:

      “First, Authors argue that using works to train Claude’s underlying LLMs was like using works to train any person to read and write, so Authors should be able to exclude Anthropic from this use (Opp. 16). But Authors cannot rightly exclude anyone from using their works for training or learning as such. Everyone reads texts, too, then writes new texts. They may need to pay for getting their hands on a text in the first instance. But to make anyone pay specifically for the use of a book each time they read it, each time they recall it from memory, each time they later draw upon it when writing new things in new ways would be unthinkable. For centuries, we have read and re-read books. We have admired, memorized, and internalized their sweeping themes, their substantive points, and their stylistic solutions to recurring writing problems.”

      They literally compare an LLM learning to a person learning and conflate the two. Anthropic will likely win this case because of this anthropomorphisation.


  • IMHO it happens for the same reason we see shapes in clouds. The human mind through millions of years has evolved to equate and conflate the ability to generate cogent verbal or written output with intelligence. It's an instinct to equate the two, and an extraordinarily difficult instinct to break. LLMs are optimised for the one job that will make us confuse them for being intelligent.

  • Anthropomorphisation happens because humans are absolutely terrible at evaluating systems that give conversational text output.

    ELIZA fooled many people into thinking it was conscious, and it wasn't even trying to do that.

  • > because it's perceived as good marketing

    We are making user interfaces. Good user interfaces are intuitive and purport to be things that users are familiar with, such as people. Any alternative explanation of such a versatile interface will be met with blank stares. Users with no technical expertise would come to their own conclusions, helped in no way by telling the user not to treat the chat bot as a chat bot.

  • Nobody cares about what’s perceived as good marketing. People care about what resonates with the target market.

    But yes, anthropomorphising LLMs is inevitable because they feel like an entity. People treat stuffed animals like creatures with feelings and personality; LLMs are far closer than that.

    • The chat interface was a choice, though a natural one. Before they'd RLHFed it into chatting, when it was just GPT-3 offering completions, 1) not very many people used it and 2) it was harder to anthropomorphize.

    • > People treat stuffed animals like creatures with feelings and personality; LLMs are far closer than that.

      Children do, sometimes, but it's a huge sign of immaturity when adults, let alone tech workers, do it.

      I had a professor at University that would yell at us if/when we personified/anthropomorphized the tech, and I have that same urge when people ask me "What does <insert LLM name here> think?".

  • True, but also researchers want to believe they are studying intelligence, not just some approximation of it.

Do they? An LLM embeds the token sequence N^L into R^{L×D}, we apply attention, and the output is also R^{L×D}; then we apply a projection to the vocabulary and get R^{L×V}, i.e. for each token a likelihood over the vocabulary. The attention can be multi-head (or whatever version is fancy: GQA, MLA) and therefore give multiple representations, but each is always tied to a token. I would argue that there is no hidden state independent of a token.

Whereas LSTMs, or structured state space models for example, have a state that is updated and not tied to a specific item in the sequence.

I would argue that the author's text is easily understandable except for the function notation; explaining that you can compute a probability based on the previous words is understandable by everyone, without having to resort to anthropomorphic terminology.
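The shape bookkeeping in this comment can be sketched in a few lines of numpy. All dimensions and weights below are toy, made-up values, and the attention is deliberately stripped to a single head with identity projections:

```python
import numpy as np

# Toy dimensions (made-up values): sequence length L, model dim D, vocab V.
L, D, V = 4, 8, 100
rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

tokens = rng.integers(0, V, size=L)            # token ids, an element of N^L
embedding = rng.standard_normal((V, D))
x = embedding[tokens]                          # R^{L x D}

# Single-head self-attention with identity Q/K/V projections, for brevity.
scores = x @ x.T / np.sqrt(D)                  # R^{L x L}
attended = softmax(scores) @ x                 # R^{L x D}: still one row per token

# Projection to the vocabulary: a distribution over V per position.
proj = rng.standard_normal((D, V))
probs = softmax(attended @ proj)               # R^{L x V}
```

Every intermediate tensor keeps a leading L axis, which is the sense in which each representation stays tied to a token position.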

  • There is hidden state as plain as day merely in the fact that logits for token prediction exist. The selected token doesn't give you information about how probable other tokens were. That information, that state which is recalculated in autoregression, is hidden. It's not exposed. You can't see it in the text produced by the model.

    There is plenty of state not visible when an LLM starts a sentence that only becomes somewhat visible when it completes the sentence. The LLM has a plan, if you will, for how the sentence might end, and you don't get to see an instance of that plan unless you run autoregression far enough to get those tokens.

    Similarly, it has a plan for paragraphs, for whole responses, for interactive dialogues, plans that include likely responses by the user.
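The narrow point about logits can be made concrete: the emitted token is one draw from a distribution that is otherwise discarded. The vocabulary and logits here are toy, made-up values, not from any real model:

```python
import numpy as np

# Made-up next-token logits for a five-word toy vocabulary.
vocab = ["the", "a", "planet", "cat", "."]
logits = np.array([2.0, 1.9, 0.3, 0.2, -1.0])

# Softmax the logits into a probability distribution.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Sample one token from that distribution.
rng = np.random.default_rng(0)
choice = vocab[rng.choice(len(vocab), p=probs)]

# Whoever reads the output sees only `choice`. The near-tie between
# "the" and "a", and the rest of the distribution, is not exposed.
```

The full `probs` vector is exactly the kind of per-step information that never shows up in the generated text.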

    • The LLM does not "have" a plan.

      Arguably there's reason to believe it comes up with a plan when it is computing token probabilities, but it does not store that plan between tokens. I.e. it doesn't possess or "have" it. It simply comes up with a plan, emits a token, and entirely throws away all its intermediate thoughts (including any plan), starting again from scratch on the next token.
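The "comes up with a plan, emits a token, throws the plan away" claim is really a claim about the shape of the decoding loop. A schematic sketch, with a trivial made-up stand-in for the real forward pass:

```python
def forward(context):
    """Stand-in for a full transformer forward pass: returns the next
    token plus the intermediate activations (the 'plan') it computed."""
    plan = sum(context) % 7            # toy stand-in for hidden activations
    next_token = (plan * 3 + 1) % 5    # toy stand-in for argmax over logits
    return next_token, plan

context = [1, 2, 3]
for _ in range(4):
    token, plan = forward(context)  # the plan exists only inside this call
    context.append(token)           # only the chosen token is carried forward
# Nothing of `plan` persists between iterations; it is recomputed from
# the token sequence alone at every step.
```

Whether recomputing a plan from scratch each step counts as "having" one is exactly the disagreement in this subthread.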


    • This sounds like a fun research area. Do LLMs have plans about future tokens?

      How do we get 100 tokens of completion, and not just one output layer at a time?

      Are there papers you've read that you can share which support the hypothesis, versus the hypothesis that the LLM doesn't have ideas about future tokens when it's predicting the next one?


I think that the hidden state is really just at work improving the model's estimation of the joint probability over tokens. And the assumption here, which failed miserably in the early 20th century in the work of the logical positivists, is that if you can so expertly estimate the joint probability of language, then you will be able to understand "knowledge." But there's no well-grounded reason to believe that, and plenty of reasons (see: the downfall of logical positivism) to think that language is an imperfect representation of knowledge. In other words, what humans do when we think is more complicated than just learning semiotic patterns and regurgitating them. Philosophical skeptics like Hume thought so, but most epistemology writing after that had better answers for how we know things.

  • There are many theories that are true but not trivially true. That is, they take a statement that seems true and derive from it a very simple model, which is then often disproven. In those cases however, just because the trivial model was disproven doesn't mean the theory was, though it may lose some of its luster by requiring more complexity.

Maybe it's just because so much of my work for so long has focused on models with hidden states, but this is a fairly classical feature of some statistical models. One of the widely used LLM textbooks even started with latent variable models; LLMs are latent variable models, just on a totally different scale, in terms of both number of parameters and model complexity. The scale is apparently important, but seeing them as another type of latent variable model sort of dehumanizes them for me.

Latent variable or hidden state models have their own history of being seen as spooky or mysterious though; in some ways the way LLMs are anthropomorphized is an extension of that.

I guess I don't have a problem with anthropomorphizing LLMs at some level, because some features of them find natural analogies in cognitive science and other areas of psychology, and abstraction is useful or even necessary in communicating and modeling complex systems. However, I do think anthropomorphizing leads to a lot of hype and tends to implicitly shut down thinking of them mechanistically, as a mathematical object that can be probed and characterized — it can lead to a kind of "ghost in the machine" discourse and an exaggeration of their utility, even if it is impressive at times.

I'm not sure what you mean by "hidden state". If you set aside chain of thought, memories, system prompts, etc. and the interfaces that don't show them, there is no hidden state.

These LLMs are almost always, to my knowledge, autoregressive models, not recurrent models (Mamba is a notable exception).
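The autoregressive/recurrent distinction being drawn here can be sketched as two toy step functions. These are schematic made-up updates, nothing like real model internals:

```python
def autoregressive_step(context):
    # A pure function of the visible tokens; nothing else survives
    # between calls. This is the transformer-style setup.
    return (sum(context) + 1) % 10

def recurrent_step(h, token):
    # A hidden value h is carried across calls and updated in place,
    # independent of any single token, as in an LSTM or state-space model.
    h = (h * 2 + token) % 97
    return h % 10, h

# Autoregressive: rerun over the growing token list each time.
ctx = [3, 1]
ctx.append(autoregressive_step(ctx))

# Recurrent: the carried state h, not the token list, holds the history.
h, outputs = 0, []
for t in [3, 1]:
    y, h = recurrent_step(h, t)
    outputs.append(y)
```

In the first signature the only thing resembling state between steps is the token sequence itself, which is the sense in which "there is no hidden state" is meant here.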

  • If you don't know, that's not necessarily anyone's fault, but why are you wading into the conversation? The hidden state is a foundational part of a transformer's implementation. And because we're not allowed to use metaphors, since that is too anthropomorphic, you're just going to have to go learn the math.

    • The comment you are replying to is not claiming ignorance of how models work. It is saying that the author does know how they work, and they do not contain anything that can properly be described as "hidden state". The claimed confusion is over how the term "hidden state" is being used, on the basis that it is not being used correctly.

    • I don't think your response is very productive, and I find that my understanding of LLMs aligns with the person you're calling out. We could both be wrong, but I'm grateful that someone else spoke saying that it doesn't seem to match their mental model and we would all love to learn a more correct way of thinking about LLMs.

      Telling us to just go and learn the math is a little hurtful and doesn't really get me any closer to learning the math. It comes across as gatekeeping.

    • Do you appreciate a difference between an autoregressive model and a recurrent model?

      The "transformer" part isn't under question. It's the "hidden state" part.

  • Hidden state in the form of attention outputs, intermediate activations and so on. Logically, in autoregression these are recalculated every time you run the sequence to predict the next token. The point is, the entire NN state isn't output for each token. There is lots of hidden state that goes into selecting that token, and the token isn't a full representation of that information.

    • That's not what "state" means, typically. The "state of mind" you're in affects the words you say in response to something.

      Intermediate activations isn't "state". The tokens that have already been generated, along with the fixed weights, is the only data that affects the next tokens.


  • Do LLM models consider future tokens when making next-token predictions?

    E.g. pick 'the' as the next token because there's a strong probability of 'planet' as the token after?

    Is it only past state that influences the choice of 'the'? Or is the model predicting many tokens in advance and only returning the one in the output?

    If it does predict many, I'd consider that state hidden in the model weights.

Author of the original article here. What hidden state are you referring to? For most LLMs the context is the state, and there is no "hidden" state. Could you explain what you mean? (Apologies if I can't see it directly)

  • Yes, strictly speaking, the model itself is stateless, but there are 600B parameters of state machine for frontier models that define which token to pick next. And that state machine is both incomprehensibly large and also of a similar magnitude in size to a human brain. (Probably, I'll grant it's possible it's smaller, but it's still quite large.)

    I think my issue with the "don't anthropomorphize" is that it's unclear to me that the main difference between a human and an LLM isn't simply the inability for the LLM to rewrite its own model weights on the fly. (And I say "simply" but there's obviously nothing simple about it, and it might be possible already with current hardware, we just don't know how to do it.)

    Even if we decide it is clearly different, this is still an incredibly large and dynamic system. "Stateless" or not, there's an incredible amount of state that is not comprehensible to me.

    • FWIW the number of parameters in an LLM is in the same ballpark as the number of neurons in a human brain (roughly 80B), but neurons are not weights; each is a kind of neural net unto itself: stateful, adaptive, self-modifying, with a good variety of neurotransmitters (and their chemical analogs) at work aside from just voltage.

      It's fun to think about just how fantastic a brain is, and how much wattage and data-center scale we're throwing around trying to approximate its behavior. Mega-efficient and mega-dense. I'm bearish on AGI simply from an internetworking standpoint: the speed of light is hard to beat, and until you can fit 80 billion interconnected cores in half a cubic foot you're just not going to get close to the responsiveness of reacting to the world in real time the way biology manages to. But that's a whole other matter. I just wanted to point out that magnitude of parameters is not an altogether meaningful comparison :)

    • > it's unclear to me that the main difference between a human and an LLM isn't simply the inability for the LLM to rewrite its own model weights on the fly.

      This is "simply" an acknowledgement of extreme ignorance of how human brains work.

    • Fair, there is a lot that is incomprehensible to all of us. I wouldn't call it "state" as it's fixed, but that is a rather subtle point.

      That said, would you anthropomorphize a meteorological simulation just because it contains lots and lots of constants that you don't understand well?

      I'm pretty sure that recurrent dynamical systems pretty quickly become universal computers, but we are treating those that generate human language differently from others, and I don't quite see the difference.


  • Yes, the context (along with the model weights) is the source data from which the hidden state is calculated, in an analogous way that input and CPU ticks (along with program code) are what give variables in a deterministic program their values.

    There's loads of state in the LLM that doesn't come out in the tokens it selects. The tokens are just the very top layer, and even then, you get to see just one selection from the possible tokens.

    If you wish to anthropomorphize, that state - the set of activations, all the calculations that add up to the logits that determine the probability of the token to select, the whole lot of it - is what the model is "thinking". But all you get to see is one selected token.

    Then, during autoregression, we run the program again, but one more tick of the CPU clock. Variables get updated a bit more. The chosen token from the previous pass conditions the next token prediction - the hidden state evolves its thinking one more step.

    If you just look at the tokens being selected, you're missing this machinery. And the machinery is there. It's a program being ticked by generating tokens autoregressively. It has state which doesn't directly show up in tokens, it just informs which tokens to select. And the tokens it selects don't necessarily reflect the correspondences with perceived reality that the model is maintaining in that state. That's what I meant by talking about a lie.

    We need a vocabulary to talk about this machinery. The machinery is learned, and it runs programs, effectively, that help the LLM reduce loss when predicting tokens. Since the tokens it's predicting come from human minds, the programs it's running are (broken, lossy, not very good) simulations of processes that seem to run inside human minds.

    The simulations are pretty decent for producing grammatically correct text, for emulating tone and style, and so on. They're okay-ish for representing concepts. They're poor for representing very specific facts. But the overall point is they are simulations, and they have some analogous correspondence with human behavior, such that words we use to describe human behaviour are useful and practical.

    They're not true, I'm not claiming that. But they're useful for talking about these weird defective minds we call LLMs.

> Is it too anthropomorphic to say that this is a lie?

Yes. Current LLMs can only introspect from output tokens. You need hidden reasoning that is within the black box, self-knowing, intent, and motive to lie.

I rather think accusing an LLM of lying is like accusing a mousetrap of being a murderer.

When models have online learning, complex internal states, and reflection, I might consider one to have consciousness and to be capable of lying. It will need to manifest behaviors that can only emerge from the properties I listed.

I've seen similar arguments where people assert that LLMs cannot "grasp" what they are talking about. I strongly suspect a high degree of overlap between those willing to anthropomorphize error bars as lies while declining to award LLMs "grasping". Which is it? It can think or it cannot? (objectively, SoTA models today cannot yet.) The willingness to waffle and pivot around whichever perspective damns the machine completely belies the lack of honesty in such conversations.

  • > Current LLMs can only introspect from output tokens

    The only interpretation of this statement I can come up with is plain wrong. There's no reason an LLM shouldn't be able to introspect without any output tokens. As the GP correctly says, most of the processing in LLMs happens over hidden states. Output tokens are just an artefact for our convenience, which also happens to be the way the hidden state processing is trained.

    • > Output tokens are just an artefact for our convenience

      That's nonsense. The hidden layers are specifically constructed to increase the probability that the model picks the right next word. Without the output/token generation stage the hidden layers are meaningless. Just empty noise.

      It is fundamentally an algorithm for generating text. If you take the text away it's just a bunch of fmadds. A mute person can still think, an LLM without output tokens can do nothing.
