Comment by cosmic_quanta

2 years ago

Yann LeCun said it best in an interview with Lex Fridman.

LLMs don't consume more energy when answering more complex questions. That means there's no inherent understanding of questions.

(which you could infer from their structure: LLMs predict the next word one at a time, feeding the words they just predicted back in as input, and so on).
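
For anyone who hasn't seen that loop spelled out, here is a toy sketch of autoregressive generation; next_token is a made-up stand-in for a real LLM's forward pass, not any particular library:

    # Toy autoregressive loop: each step feeds the growing sequence back in.
    # next_token is a made-up stand-in for a real LLM's next-token predictor.
    def next_token(tokens):
        vocab = ["the", "cat", "sat", "on", "the", "mat", "."]
        return vocab[len(tokens) % len(vocab)]

    def generate(prompt, max_new_tokens=6):
        tokens = list(prompt)
        for _ in range(max_new_tokens):
            tokens.append(next_token(tokens))    # next word depends on the words just predicted
        return " ".join(tokens)

    print(generate(["once", "upon"]))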

LLMs don't consume more energy when answering more complex questions.

They can. With speculative decoding (https://medium.com/ai-science/speculative-decoding-make-llm-...) a small, fast model drafts the next token(s), and a larger, slower model checks those drafts, keeping them where it agrees and substituting its own prediction where it doesn't. So a "simple" prompt for which the small and large models give the same output will run faster and consume less energy than a "complex" prompt for which they often disagree.
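
To make the accept/reject loop concrete, here is a toy sketch of greedy speculative decoding. The draft_model and target_model below are made-up stand-ins (simple counters, not real LLMs), and a real implementation verifies the whole draft in one batched forward pass rather than token by token:

    # Made-up stand-in "models": simple deterministic rules, not real LLMs.
    def draft_model(tokens):
        return len(tokens) % 7                       # toy next-token "prediction"

    def target_model(tokens):
        # Same rule as the draft, except it disagrees at some positions.
        return (len(tokens) + 1) % 7 if len(tokens) % 5 == 0 else len(tokens) % 7

    def speculative_generate(prompt, k=4, max_new=20):
        tokens = list(prompt)
        target_passes = 0
        while len(tokens) - len(prompt) < max_new:
            # Small model cheaply drafts k candidate tokens.
            draft = []
            for _ in range(k):
                draft.append(draft_model(tokens + draft))
            # Large model checks the drafts (in practice: one batched forward pass).
            context = list(tokens)
            target_passes += 1
            for i, tok in enumerate(draft):
                expected = target_model(context + draft[:i])
                if expected == tok:
                    tokens.append(tok)               # agreement: keep the cheap token
                else:
                    tokens.append(expected)          # disagreement: take the large model's token
                    break
            else:
                # Whole draft accepted: the large model also supplies one bonus token.
                tokens.append(target_model(context + draft))
        return tokens, target_passes

    out, passes = speculative_generate([0, 1, 2])
    print(len(out) - 3, "tokens generated in", passes, "large-model passes")

When the two models mostly agree, each large-model pass yields several tokens, so fewer sequential large-model passes are spent per token of output.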

  • I don't think speculative decoding proves that they consume less/more energy per question.

    Regardless of whether the question/prompt is simple (for any definition of simple), if the target output is T tokens the larger model still has to evaluate at least T token positions, and if the small and large models disagree it will be called on to evaluate more than T. The observed speedup is because you can verify K+1 tokens in parallel based on the drafts of the smaller model instead of having to generate them sequentially, but I would argue that the "important" computation is still done, as the rough count below illustrates. (Also, the smaller model is called the same number of times regardless of the difficulty of the question, which brings us back to the same problem: LLMs won't vary their energy consumption dynamically as a function of question complexity.)

    Also, the rate of disagreement does not necessarily track how complex the question is; the two models may simply have learned different things and could disagree on a "simple" question.
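
    As a very rough proxy for large-model work (greedy acceptance, a made-up per-pass yield of acceptance_rate * k + 1 tokens, and ignoring that attention cost grows with sequence length), you can count the positions whose logits the large model has to produce or use:

      # Very rough proxy: large-model logit positions used per generated output,
      # assuming a draft window of k tokens and an average acceptance_rate in [0, 1].
      # The per-pass yield formula is a crude simplification for illustration only.
      def large_model_positions(T, k, acceptance_rate):
          tokens_per_pass = acceptance_rate * k + 1    # accepted drafts + the model's own token
          passes = T / tokens_per_pass                 # sequential large-model passes needed
          return passes * (k + 1)                      # logits at k draft positions + 1 more per pass

      T, k = 100, 4
      for a in (1.0, 0.5, 0.0):
          print(f"acceptance {a:.0%}: ~{large_model_positions(T, k, a):.0f} positions for {T} output tokens")

    In this crude model the large model never handles fewer than T positions; higher agreement mostly reduces the number of sequential passes (latency), not the minimum amount of large-model computation.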

Or alternatively a lot of energy is wasted answering simple questions.

The whole point of the transformer is to take words and iteratively, layer by layer, use the context to refine their meaning. The vector you get out is a better representation of the true meaning of the token. I’d argue that’s loosely akin to ‘understanding’.
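
Very loosely, that refinement looks like this; the mix_layer below is a made-up averaging step standing in for attention plus the MLP, not a real transformer layer:

    # Schematic only: each "layer" nudges every token's vector toward information
    # gathered from the other tokens (attention + MLP do this properly in a real model).
    def mix_layer(vectors):
        dim = len(vectors[0])
        mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
        return [[x + 0.1 * m for x, m in zip(vec, mean)] for vec in vectors]

    def refine(token_vectors, num_layers=4):
        vectors = token_vectors
        for _ in range(num_layers):
            vectors = mix_layer(vectors)         # meaning refined layer by layer, in context
        return vectors                           # contextualized representations out

    print(refine([[1.0, 0.0], [0.0, 1.0]]))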

The fact that the transformer architecture can memorize text is far more surprising to me than the idea that it might understand tokens.

LLMs do consume more energy for complex questions. That's the original chain-of-thought (CoT) insight: if you give them the space to "think out loud", their performance improves, and every extra token of "thinking" is extra computation.

The current mainstream models don't really incorporate that insight into the core neural architecture as far as anyone knows, but there are papers that explore things like pause tokens, which let the model do more computation without emitting words. This doesn't seem like a fundamental limitation, let alone something that should be core to the definition of intelligence.
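
The energy scaling with CoT is just token count: every emitted "thinking" token is another forward pass. A rough illustration, using the common rule of thumb of roughly 2 x the parameter count in FLOPs per generated token; the parameter count and token counts below are made-up:

    # Rough comparison of inference compute: a terse answer vs. a chain-of-thought answer.
    # ~2 * N_params FLOPs per generated token is a standard rule of thumb; numbers are illustrative.
    PARAMS = 7e9                          # hypothetical 7B-parameter model
    FLOPS_PER_TOKEN = 2 * PARAMS

    answers = {
        "direct answer": 10,              # e.g. "The answer is 42."
        "chain of thought": 300,          # step-by-step working before the final answer
    }
    for name, n_tokens in answers.items():
        print(f"{name}: {n_tokens} tokens -> ~{n_tokens * FLOPS_PER_TOKEN:.1e} FLOPs")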

After all, to my eternal sadness humans don't seem to use more energy to answer complex questions either. You can't lose weight by thinking about hard stuff a lot, even though it'd be intuitive that you can. Quite the opposite. People who sit around thinking all day tend to put on weight.