Comment by shawntan
1 year ago
If we're trying to quantify what they can NEVER do, I think we'd have to resort to theoretical results rather than a list of empirical evidence of what they can't do now. The terminology to look for in the literature is "expressibility".
For a review of this topic, I'd suggest: https://nessie.ilab.sztaki.hu/~kornai/2023/Hopf/Resources/st...
The authors of this review have themselves written several articles on the topic, and there is also empirical evidence connected to these limitations.
This is also a good paper on the subject:
What Algorithms can Transformers Learn? A Study in Length Generalization https://arxiv.org/abs/2310.16028
Yes, this is a good empirical study of the types of tasks that have been shown to be impossible for transformers to generalise on.
With both empirical and theoretical support, I find it pretty clear that this is a real limitation.
We have to be a bit more honest about the things we can actually do ourselves. Most people I know would flunk most of the benchmarks we use to evaluate LLMs. Not just a little bit, but completely, utterly, and embarrassingly so. It's not even close, or fair. People are surprisingly alright at a narrow set of problems, particularly when it doesn't involve knowledge. Most people also suck at reasoning (unless they've had years of training), they suck at factual knowledge, they aren't half bad at visual and spatial reasoning, and they're fairly gullible otherwise.
Anyway, this list looks more like a "hold my beer" moment for AI researchers than any fundamental reason for AI to stop evolving any further. Sure there are weaknesses, and there are paths to address them. Anyone claiming this is the end of the road in terms of progress is in for a disappointing reality check, probably a lot sooner than is comfortable.
And of course, by narrowing it to just LLMs, the authors have a bit of an escape hatch: they conveniently exclude any further architectures, alternative strategies, or improvements that might otherwise overcome the identified weaknesses. But that's an artificial constraint with no real-world value, because of course AI researchers are already looking beyond the current state of the art. Why wouldn't they?
It's clear that what's missing is flexibility and agency. For anything that can be put into text or a short conversation, if I had to choose between access to ChatGPT or a random human, I know which I'd choose.
Agency is one of those things we probably want to think about quite a bit, especially given people's willingness to hook it up to things that interact with the real world.
Not sure what you got out of the paper, but for me it was more about spurring ideas on how to fix this in future architectures.
I don't think anyone worth their salt would look at this and think: oh well, that's that then.
Thank you for sharing this here. Rigorous work on the "expressibility" of current LLMs (i.e., which classes of problems can they tackle?) is surely more important, but I suspect it will go over the heads of most HN readers, many of whom have little to no formal training in topics relating to computational complexity.
Yes, but unfortunately that doesn't answer the question the title poses.
The OP is not trying to answer the question. Rather, the OP is asking the question and sharing some thoughts on the motivations for asking it.
This is very interesting, thanks Shawn. I did email William Merrill to ask his thoughts but haven't received a response yet.
Neural nets can approximate any function.
A large enough LLM with memory is Turing complete.
So, theoretically, I don't think there is anything they can never do.
> Neural nets can approximate any function.
Common misunderstanding of the universal approximation theorem.
Consider this: can an MLP approximate a sine wave?
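The theorem only guarantees approximation of a continuous function on a compact domain, so the interesting failure shows up outside the interval you fit on. A minimal sketch of that, assuming PyTorch (the architecture and hyperparameters are purely illustrative):

    import math
    import torch
    from torch import nn

    torch.manual_seed(0)

    # A small ReLU MLP. ReLU networks are piecewise-linear, so outside the
    # training interval they extrapolate as a single affine piece -- they
    # cannot keep oscillating the way sin(x) does.
    model = nn.Sequential(
        nn.Linear(1, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, 1),
    )

    # Fit sin(x) on the compact interval [-2*pi, 2*pi] only.
    x_train = torch.linspace(-2 * math.pi, 2 * math.pi, 2000).unsqueeze(1)
    y_train = torch.sin(x_train)

    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(3000):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x_train), y_train)
        loss.backward()
        opt.step()

    # In-range error is small; far outside the training interval it is not,
    # which is all the universal approximation theorem ever promised.
    x_far = torch.linspace(4 * math.pi, 6 * math.pi, 2000).unsqueeze(1)
    with torch.no_grad():
        print("train MSE:", nn.functional.mse_loss(model(x_train), y_train).item())
        print("extrapolation MSE:",
              nn.functional.mse_loss(model(x_far), torch.sin(x_far)).item())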
> A large enough LLM with memory is Turing complete.
With (a lot of) chain of thought it could be.
Read the paper, and its references.
Sort of moot anyway. If-statements can approximate any function, and most programming languages are effectively Turing complete. What's important about specific architectures like transformers is that they allow comparatively efficient determination of the set of weights that will approximate some narrower class of functions. It's finding the weights that's important, not the theoretical representation power.
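As a toy illustration of that point (plain Python; the helper name is made up): a precomputed lookup table, which is morally just a giant if/else ladder, "represents" sin perfectly well on a bounded interval, yet nothing about it helps you find a good approximation of an unknown function from data.

    import math

    def make_table_approximator(f, lo, hi, n):
        # Sample f at n grid points on [lo, hi]; the approximator is then a
        # nearest-sample lookup, i.e. a big if/else ladder in spirit.
        xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
        ys = [f(x) for x in xs]

        def approx(x):
            i = round((x - lo) / (hi - lo) * (n - 1))
            i = max(0, min(n - 1, i))  # clamp to the covered interval
            return ys[i]

        return approx

    # Representation is trivial here; the hard part in machine learning is
    # finding a good approximation of an *unknown* function from data.
    sin_hat = make_table_approximator(math.sin, -math.pi, math.pi, 10_000)
    print(sin_hat(1.0), math.sin(1.0))  # close on the covered interval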
"Consider this: can an mlp approximate a sine wave?"
Well, yes - we have neutral speech and music synthesis and compression algorithms which do this exceedingly well...
There are all sorts of things a neural net isn't doing without a body. Giving birth or free soloing El Capitan come to mind. It could approximate the functions for both in token-land, but who cares?