Comment by ramraj07

20 days ago

The fundamental idea that modern LLMs can only ever remix, even if it's technically true (doubt), only says to me that all knowledge is only ever a remix, perhaps even mathematically so. Anyone who still keeps implying these are statistical parrots or whatever is just going to regret these decisions in the future.

Why doubt? Transformers are a form of kernel smoothing [1]. It's literally interpolation [2]. That doesn't mean it can only echo the exact items in its training data - generating new data items is the entire point of interpolation - but it does mean it's "remixing" (literally forming a weighted sum of) those items and we would expect it to lose fidelity when moving outside the area covered by those points - i.e. where it attempts to extrapolate. And indeed we do see that, and for some reason we call it "hallucinating".

The subsequent argument that "LLMs only remix" => "all knowledge is a remix" seems absurd, and I'm surprised to have seen it now more than once here. Humanity didn't get from discovering fire to launching the JWST solely by remixing existing knowledge.

[1] http://bactra.org/notebooks/nn-attention-and-transformers.ht...

[2] Well, smoothing/estimation but the difference doesn't matter for my point.
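
To make the "weighted sum" point concrete, here is a minimal sketch of a kernel smoother written as a softmax-weighted average. It is a toy, not a transformer: it uses a Gaussian kernel on scalar distances rather than scaled dot products over learned projections, and the names `kernel_weights` and `smooth`, the bandwidth, and the sample data are all made up for illustration.

```python
import math

def kernel_weights(query, keys, bandwidth=1.0):
    """Softmax over negative squared distances: a normalized Gaussian kernel."""
    scores = [-((query - k) ** 2) / (2.0 * bandwidth ** 2) for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def smooth(query, keys, values, bandwidth=1.0):
    """Nadaraya-Watson estimate: a weighted average of the stored values."""
    weights = kernel_weights(query, keys, bandwidth)
    return sum(w * v for w, v in zip(weights, values))

# Toy "training data": samples of y = 2x on [0, 4] (hypothetical, for illustration).
keys = [0.0, 1.0, 2.0, 3.0, 4.0]
values = [2.0 * k for k in keys]

inside = smooth(2.0, keys, values)    # query inside the sampled range
outside = smooth(10.0, keys, values)  # query far outside it; the line gives y = 20

print(inside, outside)
```

The weights are non-negative and sum to 1, so the output is always a convex combination of the stored values. Inside the sampled range the estimate tracks the line; at x = 10 it cannot exceed max(values) = 8, however far the true function has moved. That shows the loss of fidelity under extrapolation for this toy smoother only; whether it explains LLM "hallucination" is a separate question.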

  • It's not clear to me that LLMs hallucinate because they are extrapolating beyond their training data. Is that proven? Or are you extrapolating?

    Even acknowledging it is interpolation, models can extrapolate slightly without making things up, within the range where the model still applies. Who's to say what this range is for an LLM operating in thousand-dimensional space? As far as I can tell the main limiters to LLM creativity are guardrails we put in place for safety and usefulness.

    And what exactly is your proof that human ingenuity is not just pattern matching? I'm sure a hypothesis can be put forward that fire was discovered by just adding up all the facts the people of those times knew and stumbling on something that put it all together. Sounds like knowledge remix + slight extrapolation to me.

    • > It's not clear to me that LLMs hallucinate because they are extrapolating beyond their training data. Is that proven? Or are you extrapolating?

      It's a hypothesis at this stage, but I'm going to have a go at making it more quantitative. It seems the obvious explanation for "hallucinations", and it seems like it should also be rather straightforward to attribute particular inference results to the training data that influenced them. I'm expecting to encounter difficulties, though, since the idea seems so obvious it's vanishingly unlikely it hasn't been tried.

      > And what exactly is your proof that human ingenuity is not just pattern matching?

      Firstly, I'm not the one making a strong claim that needs to be "proved". Secondly, "pattern matching" is ill-defined, and I'm not claiming human intelligence isn't that. I'm saying human intelligence isn't a kernel smoothing algorithm run over a corpus of text. This seems rather obvious. What's your proof that it is that?

> Anyone who still keeps implying these are statistical parrots or whatever is just going to regret these decisions in the future.

You know this is a false dichotomy, right? You can treat LLMs as statistical parrots and at the same time take advantage of them.

  • Yes, but the immediate equivalent scenario to me is how people treated other people as slaves, merely using them like machines. Sure, you got use out of them, but was that the best use?

There are musicians who "remix" (sample) other artists' music and make massive hits themselves.

Not every solution needs to be unique; in many cases "remixing" existing solutions in a unique way is better and faster.

But all of my great ideas are purely from my own original inspiration, and not learning or pattern matching. Nothing derivative or remixed. /sarcasm

Yeah, Yann LeCun is just some luddite lol

  • I don't think he's a luddite at all. He's brilliant in what he does, but he can also be wrong in his predictions (as are all humans from time to time). He did have 3 main predictions around 2023-24 that turned out to be wrong in hindsight. Debatable why they were wrong, but yeah.

    In a stage interview (a bit after the "sparks of agi in gpt4" paper came out) he made 3 statements:

    a) llms can't do math. They can trick us with poems and subjective prose, but at objective math they fail.

    b) they can't plan

    c) by the nature of their autoregressive architecture, errors compound. so a wrong token will make their output irreversibly wrong, and spiral out of control.

    I think we can safely say that all of these turned out to be wrong. It's very possible that he meant something more abstract, and technical at its core, but in real life all of these things were overcome. So, not a luddite, but also not a seer.
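
    For reference, point (c) is usually stated as a simple probability argument. A minimal sketch, under two assumptions that are mine and not taken from the interview: each token is wrong independently with the same probability, and a single wrong token can never be recovered from. The function name `p_no_error` and the 1% error rate are hypothetical.

```python
def p_no_error(per_token_error, n_tokens):
    """Probability that all n tokens are correct, assuming independent errors."""
    return (1.0 - per_token_error) ** n_tokens

# A hypothetical 1% per-token error rate over increasingly long outputs.
for n in (10, 100, 1000):
    print(n, p_no_error(0.01, n))
```

    Under those assumptions the chance of a fully correct output decays exponentially with length. Saying the prediction was wrong amounts to saying one or both assumptions fail in practice.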

    • Have these shortcomings of LLMs been addressed by better models or by better integration with other tools? Like, are they better at coding because the models are truly better or because the agentic loops are better designed?

  • You don't understand Yann's argument. It's similar to Richard Sutton's, in that these things aren't thinking, they're emulating thinking, and the weak implicit world models that get built in the weights are insufficient for true "AGI."

    This is orthogonal to the issue of whether all ideas are essentially "remixes." For the record I agree that they are.