Comment by mike_hearn
2 years ago
We're really not that far. I'd argue superintelligence has already been achieved, and it's perfectly and knowably safe.
Consider, GPT-4o or Claude are:
• Way faster thinkers, readers, writers and computer operators than humans are
• Way better educated
• Way better at drawing/painting
... and yet, appear to be perfectly safe because they lack agency. There's just no evidence at all that they're dangerous.
Why isn't this an example of safe superintelligence? Why do people insist on defining intelligence along only one rather vague dimension (the ability to make cunning plans)?
Yann LeCun said it best in an interview with Lex Fridman.
LLMs don't consume more energy when answering more complex questions, which means there's no inherent understanding of the question (something you could also infer from their structure: LLMs recursively predict the next word, feeding the words they just predicted back in as context, and so on).
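For concreteness, here is a minimal sketch of the loop that parenthetical describes: each output token costs one forward pass of the same fixed-size network, and the prediction is appended to the context for the next call. `next_token_fn` is a hypothetical stand-in for the model, not any real LLM API.

```python
def generate(next_token_fn, prompt_tokens, max_new_tokens, stop_token="<eos>"):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        nxt = next_token_fn(tokens)   # one fixed-cost forward pass per output token
        tokens.append(nxt)            # the model's own prediction is fed back in
        if nxt == stop_token:
            break
    return tokens

# Toy stand-in "model": it ignores the question entirely, just to show the control flow.
print(generate(lambda ctx: "<eos>" if len(ctx) > 5 else "word",
               ["what", "is", "2+2", "?"], max_new_tokens=10))
# ['what', 'is', '2+2', '?', 'word', 'word', '<eos>']
```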
> LLMs don't consume more energy when answering more complex questions.
They can. With speculative decoding (https://medium.com/ai-science/speculative-decoding-make-llm-...) there's a small, fast model that makes the initial prediction for the next token, and a larger, slower model that evaluates that prediction, accepting it if it agrees and regenerating it if not. So a "simple" prompt for which the small and large models give the same output will run faster and consume less energy than a "complex" prompt for which the models often disagree.
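Roughly, the scheme looks like the sketch below. This is the greedy-agreement variant; production implementations use a rejection-sampling rule, and `draft_next` / `target_next` are hypothetical stand-in callables, not any library's actual API.

```python
def speculative_step(draft_next, target_next, tokens, k=4):
    # 1. The small, fast model drafts k tokens sequentially.
    ctx = list(tokens)
    drafts = []
    for _ in range(k):
        tok = draft_next(ctx)
        drafts.append(tok)
        ctx.append(tok)

    # 2. The large model produces its own prediction for each drafted position.
    #    In a real system this is a single batched forward pass, which is where
    #    the latency win comes from.
    verified = [target_next(tokens + drafts[:i]) for i in range(k + 1)]

    # 3. Accept drafts up to the first disagreement, then substitute the large
    #    model's choice at that position.
    out = list(tokens)
    for i in range(k):
        if drafts[i] == verified[i]:
            out.append(drafts[i])
        else:
            out.append(verified[i])
            return out
    out.append(verified[k])  # every draft accepted: keep the large model's bonus token
    return out
```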
I don't think speculative decoding proves that they consume less/more energy per question.
Regardless of whether the question/prompt is simple or not (for any definition of simple), if the target output is T tokens, the large model still has to generate or verify at least T tokens, and if the small and large models disagree it ends up being called on more than T tokens. The observed speedup comes from being able to check K+1 tokens in parallel based on the drafts of the smaller model instead of having to produce them sequentially, but the "important" computation is still done. (Also, the smaller model is called the same number of times regardless of the difficulty of the question, which brings us back to the same problem: LLMs won't vary their energy consumption dynamically as a function of question complexity.)
Also, the rate of disagreement does not necessarily change when the question is more complex; the two models may simply have learned different things and could disagree even on a "simple" question.
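A back-of-envelope sketch of that counting argument, under the simplifying assumption that each verification pass accepts on average acceptance_rate * k drafted tokens plus the large model's own token (both names are illustrative parameters, not from any paper):

```python
def decoding_passes(t_tokens, k, acceptance_rate):
    """Rough expected number of large-model forward passes to emit t_tokens."""
    expected_new_tokens_per_pass = acceptance_rate * k + 1   # +1: the target model's own token
    sequential_passes = t_tokens                             # plain autoregressive decoding
    speculative_passes = t_tokens / expected_new_tokens_per_pass
    return sequential_passes, speculative_passes

print(decoding_passes(100, 4, acceptance_rate=0.8))  # (100, ~23.8 passes)
print(decoding_passes(100, 4, acceptance_rate=0.2))  # (100, ~55.6 passes)
```

Fewer passes means lower wall-clock latency, but every output position is still scored by the large model, and the draft model runs k times per pass either way.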
Or alternatively a lot of energy is wasted answering simple questions.
The whole point of the transformer is to take words and iteratively, layer by layer, use the context to refine their meaning. The vector you get out is a better representation of the true meaning of the token. I’d argue that’s loosely akin to ‘understanding’.
The fact that the transformer architecture can memorize text is far more surprising to me than the idea that it might understand tokens.
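Loosely, the "refine meaning layer by layer" picture amounts to something like this toy, dependency-free sketch: context mixing plus a residual update, with no multi-head attention, layer norm, or learned projections. It illustrates the idea rather than a faithful transformer.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_layer(vectors):
    """Nudge each token vector toward a context-weighted mixture of all the vectors."""
    refined = []
    for q in vectors:
        weights = softmax([dot(q, k) for k in vectors])
        mixed = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(len(q))]
        refined.append([qi + mi for qi, mi in zip(q, mixed)])  # residual update
    return refined

def refine(vectors, n_layers):
    for _ in range(n_layers):
        vectors = attention_layer(vectors)
    return vectors  # same tokens, progressively more context-dependent vectors

print(refine([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], n_layers=2))
```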
LLMs do consume more energy for complex questions. That's the original chain-of-thought (CoT) insight: if you give them the space to "think out loud", their performance improves.
The current mainstream models don't really incorporate that insight into their core neural architectures, as far as anyone knows, but there are papers exploring things like pause tokens, which let the model do more computation without emitting words. This doesn't seem like a fundamental limitation, let alone something that should be core to the definition of intelligence.
After all, to my eternal sadness humans don't seem to use more energy to answer complex questions either. You can't lose weight by thinking about hard stuff a lot, even though it'd be intuitive that you can. Quite the opposite. People who sit around thinking all day tend to put on weight.
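As a rough illustration of the CoT point above: decoding cost scales with the number of tokens emitted, so a prompt that elicits a long "think out loud" trace really does consume more compute than one answered in a word. The 2N FLOPs-per-generated-token figure below is the usual rule of thumb for an N-parameter model; the token counts are made-up examples.

```python
PARAMS = 8e9                   # toy 8-billion-parameter model
FLOPS_PER_TOKEN = 2 * PARAMS   # ~2N FLOPs per generated token (rule of thumb)

def decode_flops(n_generated_tokens):
    return n_generated_tokens * FLOPS_PER_TOKEN

print(f"{decode_flops(5):.1e}")    # terse answer, ~5 tokens:  8.0e+10 FLOPs
print(f"{decode_flops(400):.1e}")  # CoT trace, ~400 tokens:   6.4e+12 FLOPs, about 80x more
```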
> Way faster thinkers, readers, writers and computer operators than humans are
> Way better educated
> Way better at drawing/painting
I mean this nicely, but you have fallen for the anthropomorphizing of LLMs by marketing teams.
None of this is "intelligent", rather it's an incredibly sophisticated (and absolutely beyond human capabilities) lookup and classification of existing information.
And I am not arguing that this has no value, it has tremendous value, but it's not superintelligence in any sense.
LLMs do not "think".
Yeah well, sorry, but I have little patience anymore for philosophical word games. My views are especially not formed by marketing teams: ChatGPT hardly has one. My views are formed via direct experience and paper reading.
Imagine going back in time five years and saying "five years from now there will be a single machine that talks like a human, can imagine creative new artworks, write Supreme Court judgements, understand and display emotion, perform music, and can engage in sophisticated enough reasoning to write programs. Also, HN posters will claim it's not really intelligent". Everyone would have laughed. They'd think you were making a witticism about the way people reclassify things as not-really-AI the moment they actually start to work well, a well-known trope in the industry. They wouldn't have thought you were making a prediction of the future.
At some point, what matters is outcomes. We have blown well past the point of super-intelligent outcomes. I really do not care if GPT-4o "thinks" or does not "think". I can go to chatgpt.com right now and interact with something that is for all intents and purposes indistinguishable from super-intelligence ... and everything is fine.