Comment by bheadmaster

9 days ago

> here’s how LLMs actually work

But how is that useful in any way?

For all we know, LLMs are black boxes. We really have no idea how the ability to have a conversation emerged from predicting the next token.

> We really have no idea how the ability to have a conversation emerged from predicting the next token.

Maybe you don't. To be clear, this is benefiting massively from hindsight (just as, if I didn't know how combustion engines worked, I probably wouldn't have dreamed up how to make one), but the emergent conversational capabilities of LLMs are pretty obvious. In a massive dataset of human writing, the answer to a question is by far the most common thing to follow a question. A normal conversational reply is the most common thing to follow a conversation opener. While impressive, these things aren't magic.

  • >In a massive dataset of human writing, the answer to a question is by far the most common thing to follow a question.

    No it isn't. Type a question into a base model, one that hasn't been finetuned into being a chatbot, and the predicted continuation will be all sorts of crap, but very often another question, or a framing that positions the original question as rhetorical in order to make a point. Untuned raw language models have an incredible flair for suddenly and unexpectedly shifting context: one might output an answer to your question, then suddenly decide that the entire thing is part of some internet flamewar and generate a completely contradictory answer, complete with insults to the first poster. It's less like talking with an AI and more like opening random pages in Borges' infinite library.

    To get a base language model to behave reliably like a chatbot, you have to explicitly feed it "a transcript of a dialogue between a human and an AI chatbot", and allow the language model to imagine what a helpful chatbot would say (and take control during the human parts). The fact that this works - that a mere statistical predictive language model bootstraps into a whole persona merely because you declared that it should, in natural English - well, I still see that as a pretty "magic" trick.
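
    For concreteness, here's a rough sketch of that trick with an off-the-shelf base model via Hugging Face transformers (the model choice, prompt wording, and decoding settings are just illustrative):

    ```python
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Any base (non-instruction-tuned) model will do; gpt2 is just small and public.
    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Frame the text as a transcript and let the model imagine the chatbot's turn.
    prompt = (
        "The following is a transcript of a dialogue between a human "
        "and a helpful AI chatbot.\n"
        "Human: Why does ice float on water?\n"
        "AI:"
    )
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=60, do_sample=True, top_p=0.9)
    # In a real chat loop you'd cut the output at the next "Human:" line,
    # append the user's actual reply, and repeat.
    print(tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True))
    ```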

    • >No it isn't. Type a question into a base model, one that hasn't been finetuned into being a chatbot, and the predicted continuation will be all sorts of crap, but very often another question, or a framing that positions the original question as rhetorical in order to make a point...

      To be fair, only if you pose the question on its own with no preceding context. If you want the raw LLM to answer your question(s) reliably, you can prepend the context with other question-answer pairs and it works fine. A raw LLM is already capable of being a chatbot, or anything else, with the right preceding context.
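
      As a rough sketch of that few-shot setup (the example pairs are made up; the resulting prompt goes to any base-model completion call, e.g. the generate() snippet in the sibling comment):

      ```python
      # A few question-answer pairs prepended as context nudge a raw
      # completion model into continuing in the same Q/A format.
      few_shot = (
          "Q: What is the capital of France?\n"
          "A: Paris.\n\n"
          "Q: How many legs does a spider have?\n"
          "A: Eight.\n\n"
      )
      prompt = few_shot + "Q: Why is the sky blue?\nA:"
      print(prompt)  # hand this string to the base model as-is
      ```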

      1 reply →

  • If such a simplistic explanation were true, LLMs would only be able to answer things that had been asked before, and where at least a 'fuzzy' textual question/answer match was available. This is clearly not the case. In practice you can prompt the LLM with such a large number of constraints that the combinatorial explosion ensures no one has asked that exact thing before, and you will still get a relevant answer combining all of them.

    Think of combinations of features in a software request, including making some module that fits into your existing system (for which you have provided source) along with a list of requested features. Or questions you form based on a set of life experiences and interests that, combined, are unique to you. You can switch programming language, human language, writing style, or level of detail as you wish, and discuss it in super-esoteric languages or Morse code.

    So are we to believe these answers appear just because there happened to be similar questions in the training data where a suitable answer followed? Even if for the sake of argument we accept this explanation by "proximity of question/answer", it is immediately clear that it would have to rely on extreme levels of abstraction and mixing and matching going on inside the LLM. And it is then this process that we need to explain, whereas the textual proximity you invoke relies on it rather than explaining it.

    • I think you're confusing OP with the people who claim that there is zero functional difference between an LLM and a search engine that just parrots stuff already in it. But they never made such a claim.

      Here, let me try: the simplest explanation for how next-token estimation leads to a model that often produces true answers is that for most inputs, the most likely next token is true. Given their size and the way they're trained, LLMs obviously don't just ingest training data like a big archive; they contain something like an abstract representation of tokens and concepts. While not exactly like human knowledge, the network is large and deep enough that LLMs are capable of predicting true statements based on preceding text. This also lets them answer questions not in their training dataset, although accuracy obviously suffers the further you deviate from known topics. The most likely next token after any question is the true answer, so they essentially ended up being trained to estimate truth.

      I'm not saying this is bad or underwhelming, by the way. It's incredible how far people were able to push machine learning with just the knowledge we have now, and how they're still making progress. I'm just saying it's not magic. It's not something like an unsolved problem in mathematics.

      1 reply →

  • > Maybe you don't.

    My best friend, who has literally written a doctorate on artificial intelligence, doesn't. If you do, please write a paper on it and email it to me. My friend would be thrilled to read it.

    • Yeah, I sort of cringed when I read his comment, to be honest.

      The whole point of the area of work that, as far as I know, is called interpretability is precisely to try and figure out exactly how these things work.

      So I thought your comment was a good way of putting this.

    • I don't know much about this space beyond being a user of Claude and having an Electrical Engineering background...

      However, reading some Stanford study summaries (not the whole thing), and just generally looking at where AI research is now, it's clear that researchers can't deterministically say exactly how the black box works.

      So yet again, HN armchair scientists are no better here than on any other topic. I love reading comments here, but so many people have opinions on things that aren't well founded.

  • >In a massive dataset of human writing, the answer to a question is by far the most common thing to follow a question. A normal conversational reply is the most common thing to follow a conversation opener. While impressive, these things aren't magic.

    Obviously, that's the objective, but who's to say you'll reach a goal just because you set it? And more importantly, who's to say you have any idea how the goal has actually been achieved?

    You don't need to think LLMs are magic to understand we have very little idea of what is going on inside the box.

    • We know exactly what is going on inside the box. The problem isn't knowing what is going on inside the box; the problem is that it's all binary arithmetic, and no human being evolved to make sense of binary arithmetic, so it seems like magic to you when in reality it's nothing more than a circuit w/ billions of logic gates.

      18 replies →

> We really have no idea how the ability to have a conversation emerged from predicting the next token.

Uh yes, we do. It works in precisely the same way that you can walk from "here" to "there" by taking a step towards "there", and then repeating. The cognitive dissonance comes when we conflate this way of "having a conversation" with the way two people converse, and assume that because they produce similar outputs they must be "doing the same thing", at which point it's hard to see how an LLM could be doing it.

Sometimes things seem unbelievable simply because they aren't true.
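
To make the step-by-step point concrete, here's a minimal greedy-decoding sketch (Hugging Face transformers with gpt2 as a stand-in; sampling and stopping details omitted):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("The capital of France is", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(10):                          # one "step" per token
        logits = model(ids).logits[:, -1, :]     # scores for the next token only
        next_id = logits.argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=-1)  # take the step, then repeat
print(tok.decode(ids[0]))
```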

  • > It works in precisely the same way that you can walk from "here" to "there" by taking a step towards "there", and then repeating.

    It's funny how, in order to explain one complex phenomenon, you took an even more complex phenomenon as if it somehow simplifies it.

    • Sorry, can't tell if that's sarcasm or not.

      I wasn't referring to the biomechanical process of walking; I was referring to the process of gradient descent, which is well understood and, yes, quite simple.
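
      If it helps, the whole idea fits in a few lines (a toy one-dimensional sketch, not tied to any framework):

      ```python
      # Toy gradient descent: walk from "here" (x = 0) toward the minimum
      # of f(x) = (x - 3)**2 by repeatedly taking a small step downhill.
      x, lr = 0.0, 0.1
      for _ in range(100):
          grad = 2 * (x - 3)   # derivative of (x - 3)**2
          x -= lr * grad       # one step towards "there"
      print(x)                 # converges to ~3.0
      ```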

      2 replies →