
Comment by libraryofbabel

6 days ago

This is the 2023 take on LLMs. It still gets repeated a lot. But it doesn’t really hold up anymore - it’s more complicated than that. Don’t let some factoid about how they are pretrained on autocomplete-like next token prediction fool you into thinking you understand what is going on in that trillion parameter neural network.

Sure, LLMs do not think like humans and they may not have human-level creativity. Sometimes they hallucinate. But they can absolutely solve new problems that aren’t in their training set, e.g. some rather difficult problems on the last Mathematical Olympiad. They don’t just regurgitate remixes of their training data. If you don’t believe this, you really need to spend more time with the latest SotA models like Opus 4.5 or Gemini 3.

Nontrivial emergent behavior is a thing. It will only get more impressive. That doesn’t make LLMs like humans (and we shouldn’t anthropomorphize them) but they are not “autocomplete on steroids” anymore either.

> Don’t let some factoid about how they are pretrained on autocomplete-like next token prediction fool you into thinking you understand what is going on in that trillion parameter neural network.

This is just an appeal to complexity, not a rebuttal to the critique of likening an LLM to a human brain.

> they are not “autocomplete on steroids” anymore either.

Yes, they are. The steroids are just even more powerful. By refining training data quality, increasing parameter size, and increasing context length we can squeeze more utility out of LLMs than ever before, but ultimately, Opus 4.5 is the same thing as GPT2, it's only that coherence lasts a few pages rather than a few sentences.
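
To make the "autocomplete" framing concrete: at inference time, generation really is a loop of next-token predictions, whatever the model size. A minimal sketch, assuming the Hugging Face transformers API, with GPT-2 only because it is small and public (greedy decoding for brevity; sampling, top-p, etc. omitted):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Any causal LM works here; generation is just repeated next-token prediction.
    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    ids = tok("The detective opened the door and saw", return_tensors="pt").input_ids
    for _ in range(20):
        logits = model(ids).logits                               # [batch, seq, vocab]
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy next token
        ids = torch.cat([ids, next_id], dim=-1)                  # append and repeat
    print(tok.decode(ids[0]))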

  • > ultimately, Opus 4.5 is the same thing as GPT2, it's only that coherence lasts a few pages rather than a few sentences.

    This tells me that you haven't really used Opus 4.5 at all.

  • First, this is completely ignoring text diffusion and nano banana.

    Second, to autocomplete the name of the killer in a detective novel that isn't in the training set requires following the plot and having at least some understanding of it.

  • This would be true if all training were based on sentence completion. But training involving RLHF and RLAIF is increasingly important, isn't it?

    • Reinforcement learning is a technique for adjusting weights, but it does not alter the architecture of the model. No matter how much RL you do, you still retain all the fundamental limitations of next-token prediction (e.g. context exhaustion, hallucinations, prompt injection vulnerability, etc.).
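
      A toy sketch of that point (names and sizes are made up; a real RLHF pipeline such as PPO is more involved, but the shape is the same): both the pretraining objective and the RL objective backpropagate into the same parameters of the same architecture, and only the training signal differs.

          import torch
          import torch.nn.functional as F

          # Stand-in "LM": an embedding plus a linear head instead of a full transformer.
          vocab, dim = 100, 32
          model = torch.nn.Sequential(torch.nn.Embedding(vocab, dim),
                                      torch.nn.Linear(dim, vocab))
          opt = torch.optim.SGD(model.parameters(), lr=1e-2)
          tokens = torch.randint(0, vocab, (1, 16))

          # Pretraining-style step: cross-entropy on the next token.
          logits = model(tokens[:, :-1])
          loss_ce = F.cross_entropy(logits.reshape(-1, vocab), tokens[:, 1:].reshape(-1))
          opt.zero_grad(); loss_ce.backward(); opt.step()

          # RL-style step (REINFORCE): reward-weighted log-prob of sampled tokens.
          logits = model(tokens[:, :-1])
          dist = torch.distributions.Categorical(logits=logits)
          sample = dist.sample()
          reward = 1.0                       # e.g. a score from a preference model
          loss_rl = -(reward * dist.log_prob(sample)).mean()
          opt.zero_grad(); loss_rl.backward(); opt.step()
          # Same weights, same architecture; only the objective changed.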


  • > This is just an appeal to complexity, not a rebuttal to the critique of likening an LLM to a human brain

    I wasn’t arguing that LLMs are like a human brain. Of course they aren’t. I said twice in my original post that they aren’t like humans. But “like a human brain” and “autocomplete on steroids” aren’t the only two choices here.

    As for appealing to complexity, well, let’s call it more like an appeal to humility in the face of complexity. My basic claim is this:

    1) It is a trap to reason from model architecture alone to make claims about what LLMs can and can’t do.

    2) The specific version of this in the GP that I was objecting to was: LLMs are just transformers that do next token prediction, therefore they cannot solve novel problems and just regurgitate their training data. That is a testable claim, if we agree on a reasonable definition of novel problems.

    The reason I believe this is that back in 2023 I (like many of us) used LLM architecture to argue that LLMs had all sorts of limitations around the kind of code they could write, the tasks they could do, the math problems they could solve. At the end of 2025, SotA LLMs have refuted most of these claims by being able to do the tasks I thought they’d never be able to do. That was a big surprise to a lot of us in the industry. It still surprises me every day. The facts changed, and I changed my opinion.

    So I would ask you: what kind of task do you think LLMs aren’t capable of doing, reasoning from their architecture?

    I was also going to mention RL, as I think that is the key differentiator that makes the “knowledge” in the SotA LLMs right now qualitatively different from GPT2. But other posters already made that point.

    This topic arouses strong reactions. I already had one poster (since apparently downvoted into oblivion) accuse me of “magical thinking” and “LLM-induced-psychosis”! And I thought I was just making the rather uncontroversial point that things may be more complicated than we all thought in 2023. For what it’s worth, I do believe LLMs probably have limitations (like they’re not going to lead to AGI and are never going to do mathematics like Terence Tao) and I also think we’re in a huge bubble and a lot of people are going to lose their shirts. But I think we all owe it to ourselves to take LLMs seriously as well. Saying “Opus 4.5 is the same thing as GPT2” isn’t really a pathway to do that; it’s just a convenient way to avoid grappling with the hard questions.

  • This ignores that reinforcement learning radically changes the training objective.

  • First: a selection mechanism is just a selection mechanism, and it shouldn't cloud the observation of emergent, tangential capabilities.

    You probably believe that humans have something called intelligence, but the pressure that produced it - the likelihood that specific genetic material gets replicated - is much more tangential to intelligence than next-token prediction is.

    I doubt many alien civilizations would look at us and say "not intelligent - they're just genetic information replication on steroids".

    Second: modern models also undergo a ton of post-training now (RLHF, mechanized fine-tuning on specific use cases, etc.). It's just not correct that the token-prediction loss function is "the whole thing".

    • > First: a selection mechanism is just a selection mechanism, and it shouldn't cloud the observation of emergent, tangential capabilities.

      Invoking terms like "selection mechanism" is begging the question because it implicitly likens next-token-prediction training to natural selection, but in reality the two are so fundamentally different that the analogy only has metaphorical meaning. Even at a conceptual level, gradient descent gradually homing in on a known target is comically trivial compared to the blind filter of natural selection sorting out the chaos of chemical biology. It's like comparing Legos to DNA.

      > Second: modern models also undergo a ton of post-training now (RLHF, mechanized fine-tuning on specific use cases, etc.). It's just not correct that the token-prediction loss function is "the whole thing".

      RL is still token prediction; it's just a technique for adjusting the weights to align with outputs that you can't write a loss function for in pre-training. When RL rewards good output, it's increasing the statistical strength of the model for an arbitrary purpose, but ultimately what is achieved is still a brute-force quadratic lookup for every token in the context.
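
      As for the "quadratic" part, this is the core of (single-head, causal) self-attention; the n x n score matrix is what grows quadratically with context length. Toy shapes only, not a full transformer layer:

          import math
          import torch

          n, d = 1024, 64                          # context length, head dimension
          q, k, v = (torch.randn(n, d) for _ in range(3))

          scores = q @ k.T / math.sqrt(d)          # [n, n]: a score for every token pair
          mask = torch.tril(torch.ones(n, n, dtype=torch.bool))
          scores = scores.masked_fill(~mask, float("-inf"))   # causal: no looking ahead
          out = torch.softmax(scores, dim=-1) @ v  # each position mixes earlier values
          print(scores.shape)                      # torch.Size([1024, 1024])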

I use an enterprise LLM provided by work, on a very proprietary codebase in a semi-esoteric language. My impression is that it is still a very big autocompletion machine.

You still need to hand-hold it all the way, as it is only capable of regurgitating the small number of code patterns it has seen in public code, as opposed to, say, a Python project.

  • What model is your “enterprise LLM”?

    But regardless, I don’t think anyone is claiming that LLMs can magically do things that aren’t in their training data or context window. Obviously not: they can’t learn on the job and the permanent knowledge they have is frozen in during training.

As someone who still might have a '2023 take on LLMs', even though I use them often at work, where would you recommend I look to learn more about what a '2025 LLM' is, and how they operate differently?

  • Papers on mechanistic interpretability and representation engineering, e.g. from Anthropic, would be a good start.

  • Don't bother. This bubble will pop in two years, you don't want to look back on your old comments in shame in three.

> it’s more complicated than that.

No it isn't.

> ...fool you into thinking you understand what is going on in that trillion parameter neural network.

It's just matrix multiplication and logistic regression, nothing more.
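
The "logistic regression" part is literal: the last step of every forward pass is a linear map of the hidden state followed by a softmax over the vocabulary, i.e. multinomial logistic regression over next tokens. A sketch with toy tensors (roughly GPT-2-sized, chosen only for illustration):

    import torch

    hidden_dim, vocab = 768, 50257          # roughly GPT-2 sized, for illustration
    h = torch.randn(hidden_dim)             # final hidden state at one position
    W = torch.randn(vocab, hidden_dim)      # "unembedding" / output projection
    probs = torch.softmax(W @ h, dim=-1)    # distribution over the next token
    print(probs.sum())                      # ~1.0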

  • LLMs are a general-purpose computing paradigm. LLMs are circuit builders: the converged parameters define pathways through the architecture that pick out specific programs. Or as Karpathy puts it, LLMs are a differentiable computer[1]. Training LLMs discovers programs that reproduce the input sequence well. Roughly the same architecture can generate passable images, music, or even video.

    The sequence of matrix multiplications is the high-level constraint on the space of discoverable programs. But the specific parameters discovered are what determine the specifics of information flow through the network and hence what program is defined. The complexity of the trained network is emergent, meaning the internal complexity far surpasses that of the coarse-grained description of the high-level matmul sequence. LLMs are not just matmuls and logits.

    [1] https://x.com/karpathy/status/1582807367988654081

>> Sometimes they hallucinate.

For someone speaking as if you knew everything, you appear to know very little. Every LLM completion is a "hallucination"; some of them just happen to be factually correct.