Because he has a core belief and based on that core belief he made some statements that turned out to be incorrect. But he kept the core belief and adjusted the statements.
So it's not so much about his incorrect predictions, but that these predictions were based on a core belief. And when the predictions turned out to be false, he didn't adjust his core beliefs, but just his predictions.
So it's natural to ask: if none of the predictions you derived from your core belief come true, maybe the core belief itself isn't true.
I have not followed all of LeCun's past statements, but -
if the "core belief" is that the LLM architecture cannot be the way to AGI, that is more of an "educated bet", which does not get falsified when LLMs improve but still suggest their initial faults. If seeing that LLMs seem constrained in the "reactive system" as opposed to a sought "deliberative system" (or others would say "intuitive" vs "procedural" etc.) was an implicit part of the original "core belief", then it still stands in spite of other improvements.
If you say LLMs are a dead end and give a few examples of things they will never be able to do, and a few months later they do them, you just respond that sure, they can do that, but they're still a dead end and won't be able to do this other thing.
Rinse and repeat.
After a while you question whether LLMs are actually a dead end.
If you need basically rock-solid evidence of X before you stop saying "this thing cannot do X", then you shouldn't be running a forward-looking lab. There are only so many directions you can take, only so many resources at your disposal. Your intuition has to be really freakishly good to be running such a lab.
He's done a lot of amazing work, but his stance on LLMs seems continuously off the mark.
The list of great minds who thought "this newfangled thing is nonsense" and later turned out to be horribly wrong is quite long and distinguished:
> Heavier-than-air flying machines are impossible. -Lord Kelvin, 1895
> I think there is a world market for maybe five computers. -Thomas Watson, IBM, 1943
> On talking films: “They’ll never last.” -Charlie Chaplin
> This ‘telephone’ has too many shortcomings… -William Orton, Western Union, 1876
> Television won’t be able to hold any market. -Darryl Zanuck, 20th Century Fox, 1946
> Louis Pasteur’s theory of germs is ridiculous fiction. -Pierre Pachet, French physiologist
> Airplanes are interesting toys but of no military value. -Marshal Ferdinand Foch, 1911
> There’s no chance the iPhone is going to get any significant market share. -Steve Ballmer, Microsoft CEO, 2007
> Stocks have reached a permanently high plateau. -Irving Fisher, economist, 1929
> Who the hell wants to hear actors talk? -Harry Warner, Warner Bros., 1927
> By 2005, it will become clear that the Internet’s impact on the economy has been no greater than the fax machine’s. -Paul Krugman, economist, 1998
*formerly great minds.
In many cases the folks in question were waaaaay past their best days.
I doubt that list is as long as the great minds that glommed onto a new tech that turned out to be a dud, but I could be wrong. It's an interesting question, but each tech needs to be evaluated separately.
I'm going to wear the tinfoil hat: a firm is able to produce a sought-after behavior a few months later, and it throws people off. Is it more likely that the firm (worth billions at this point) is engineering these solutions into the model, or that it's emergent neural-network architectural magic?
I'm not saying that they are being bad actors, just saying this is more probable in my mind than an LLM breakthrough.
It depends on what you mean by "engineering these solutions into the model". Using better data leads to better models given the same architecture and training. Nothing wrong with it, it's hard work, and it might be done with a specific goal in mind. LLM "breakthroughs" aren't really a thing at this point. It's just one little thing after another.
Because there was plenty of evidence, at the time the statements were made, that they were either not correct or not based on enough information. And being wrong because of personal biases, and then not clearly stating you were wrong when new evidence appeared, is not the trait of a good scientist. For instance: the strong summarization abilities were already something that, alone, without any further information, was enough to cast serious doubt on the stochastic parrot mental model.
I don't see the contradiction between "stochastic parrot" and "strong summarisation abilities".
Where I'm skeptical of LLM skepticism is that people use the term "stochastic parrot" disparagingly, as if they're not impressed. LLMs are stochastic parrots in the sense that they probabilistically guess sequences of things, but isn't it interesting how far that takes you already? I'd never have guessed. Fundamentally I question the intellectual honesty of anyone who pretends they're not surprised by this.
LLMs learn from examples where the targets are not probabilities but the way a given sentence actually continues (only one token is set to 1). So they don't learn probabilities; they learn how to continue the sentence with a given token. We apply softmax to the logits for mathematical reasons, and it is natural/simpler to think in terms of probabilities, but that's not what happens, nor are the neural networks they are composed of only able to approximate probabilistic functions. This "next token probability" framing is the source of a lot of misunderstanding. It's much better to imagine the logits as "To continue my reply I could say this word, more than the others, or maybe that one, a bit less, ..." and so forth. There is now evidence, too, that in the activations producing a given token the LLM already has an idea of how most of the sentence is going to continue.
Of course, early in training, the first functions they model to lower the error will approximate the probabilities of the next tokens, since that is the simplest function that reduces the loss. Then the gradients pull in other directions, and the function the LLM eventually learns is no longer about probabilities but about the meaning of the sentence and what it makes sense to say next.
It's not by chance that the logits often carry a huge signal in just two or three tokens, even if, probabilistically speaking, the sentence could continue in many more ways.
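To make the one-hot-target point concrete, here is a minimal sketch of a single next-token training step. The vocabulary, logits, and target are toy values made up for illustration; the point is only that softmax produces a distribution because cross-entropy needs one, while the training signal itself is a single token set to 1.

```python
import numpy as np

# Hypothetical toy vocabulary and logits; a real model has ~100k tokens.
vocab = ["cat", "dog", "sat", "mat", "ran"]
logits = np.array([1.2, 0.3, 4.5, 0.1, 3.8])   # model output at one position

# Softmax turns logits into a distribution (needed for the loss).
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# The target is simply the token that actually continued the sentence:
# a one-hot vector, not a probability distribution.
target_index = vocab.index("sat")
one_hot = np.eye(len(vocab))[target_index]

# Cross-entropy against a one-hot target reduces to -log p(correct token).
loss = -np.log(probs[target_index])

print(dict(zip(vocab, probs.round(3))), "loss:", round(float(loss), 3))
```

Even in this toy example the mass ends up concentrated on two or three tokens ("sat" and "ran"), which is the peaked-logits pattern described above.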
There are some who would describe LLMs as next-word predictors, akin to having a bag of magnetic words where you put your hand in, rummage around, pick a next word, and put it on the fridge, eventually forming sentences. It's "just" predicting the next word, so as an analogy for how they work, that seems reasonable. The thing is, when that bag consists of a dozen bags-in-bags, like Russian nesting dolls, and the "bag" has a hundred million words in it, the analogy stops being a useful description. It's like describing humans as multicellular organisms. It's an accurate description of what a human is, but somewhere between a simple hydra with 100,000 cells and a human with tens of trillions of cells, intelligence arises. Describing humans as merely multicellular organisms and using the hydra as your point of reference isn't going to get you very far.
Here's a fun example of that kind of "I've updated my statements but not assessed any of my underlying lack of understanding" - it's a bad look for any kind of scientist.
https://x.com/AukeHoekstra/status/1507047932226375688
Who are you referring to?
> strong summarization abilities
Which LLMs have shown you "strong summarization abilities"?
This is all true, and I'd also add that LeCun has the classic pundit problem of making his opposition to another group too much of his identity, to the detriment of his thinking. So much of his persona and ego is tied up in being a foil to both Silicon Valley hype-peddlers and AI doomers that he's more interested in dunking on them than being correct. Not that those two groups are always right either, but when you're more interested in getting owns on Twitter than having correct thinking, your predictions will always suffer for it.
That's why I'm not too impressed even when he has changed his mind: he has admitted to individual mistakes, but not to the systemic issues which produced them, which makes for a safe bet that there will be more mistakes in the future.
“Changing your mind” doesn’t really look like what LeCun is doing.
If your model of reality makes good predictions and mine makes bad ones, and I want a more accurate model of reality, then I really shouldn’t just make small provisional and incremental concessions gerrymandered around whatever the latest piece of evidence is. After a few repeated instances, I should probably just say “oops, looks like my model is wrong” and adopt yours.
This seems to be a chronic problem with AI skeptics of various sorts. They clearly tell us that their grand model indicates that such-and-such a quality is absolutely required for AI to achieve some particular thing. Then LLMs achieve that thing without having that quality. Then they say something vague about how maybe LLMs have that quality after all, somehow. (They are always shockingly incurious about explaining this part. You would think this would be important to them to understand, as they tend to call themselves “scientists”.)
They never take the step of admitting that maybe they’re completely wrong about intelligence, or that they’re completely wrong about LLMs.
Here’s one way of looking at it: if they had really changed their mind, then they would stop being consistently wrong.
He hasn't fundamentally changed his mind. What he's doing is taking what he fundamentally believes and finding more and more elaborate ways of justifying it.
When you limit yourself to one framing, "changing one's mind", it helps to point that out and acknowledge that other framings are possible; otherwise it risks seeming (not necessarily being) manipulative, and at the very least you are overlooking a large part of the domain. The Harvard Decision group identified "frame blindness" and poor "frame choice" as two of the most insidious drivers of poor decisions. Give more than one frame a chance.