Comment by qnleigh
5 days ago
I might as well answer my own question, because I do think there are some coherent arguments for fundamental LLM limitations:
1. LLMs are trained on human-quality data, so they will naturally learn to mimic our limitations. Their capabilities should saturate at human or maybe above-average human performance.
2. LLMs do not learn from experience. They might perform as well as most humans on certain tasks, but a human who works in a certain field/code base etc. for long enough will internalize the relevant information more deeply than an LLM.
However I'm increasingly doubtful that these arguments are actually correct. Here are some counterarguments:
1. It may be more efficient to just learn correct logical reasoning than to mimic every human foible. I stopped believing this argument when LLMs got a gold medal at the Math Olympiad.
2. LLMs alone may suffer from this limitation, but RL could change the story. People may find ways to add memory. Finally, it can't be ruled out that a very large, well-trained LLM could internalize new information as deeply as a human can. Maybe this is what's happening here:
https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...
I studied philosophy, focusing on the analytic school and proto-computer science. LLMs are going to force many people to develop a better understanding of what "Knowledge" and "Truth" are, especially the distinction between deductive and inductive knowledge.
Math is a perfect field for machine learning to thrive in because, theoretically, all the information ever needed is tied up in the axioms. In the empirical world, however, knowledge only moves at the speed of experimentation, which is an entirely different framework and much, much slower, even if there is some catching up to do on previously recorded experimental outcomes.
Having a focus in philosophy of language is something I genuinely never thought would be useful. It’s really been helpful with LLMs, but probably not in the way most people think. I’d say that folks curious should all be reading Quine, Wittgenstein’s investigations, and probably Austin.
I think we may have similar perspectives. Regarding empirical knowledge, consider when the knowledge concerns chaotic systems. Characterize chaotic systems, at a minimum, as systems where slightly inaccurate observations of the past and present, while useful for prediction, see their errors grow very quickly as you try to predict a future state. Then indeed, prediction is difficult.
There's one domain of knowledge I think you have yet to mention: fundamentally, computationally hard problems. What comes to mind are problems of practical benefit like physics simulations, materials simulations, and fluid simulations, but there also exist problems that are more provably computationally difficult. With these systems, the chaotic nature means that even if you have one infinitely precise observation of a deterministic system, computing a future state is still difficult, even though once computed, memorizing it seems comparatively trivial.
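A toy illustration of the error-growth point above (my own sketch, not from the thread): the logistic map at r = 4.0 is a standard chaotic system, and two trajectories that start a tiny measurement error apart diverge roughly exponentially, so an almost-perfect observation is still useless for long-range prediction.

```python
def logistic(x, r=4.0):
    """One step of the logistic map; chaotic at r = 4.0."""
    return r * x * (1.0 - x)

def trajectory(x0, steps):
    xs = [x0]
    for _ in range(steps):
        xs.append(logistic(xs[-1]))
    return xs

a = trajectory(0.2, 100)
b = trajectory(0.2 + 1e-10, 100)   # same system, observation error of 1e-10

gaps = [abs(x - y) for x, y in zip(a, b)]
early_gap = gaps[5]                # still tiny a few steps in
max_gap = max(gaps)                # errors blow up to macroscopic size
```

A few steps in the two trajectories are still indistinguishable, but well within 100 steps the gap grows to the same scale as the signal itself, which is exactly the "errors grow very quickly" behavior described above.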
Where can I read about how LLMs have changed epistemology? Is there a field of philosophy that tries to define and understand 'intelligence'? That sounds very interesting.
There is already philosophy of mind, but it was pretty young when I was in grad school, which was really at the dawn of deep learning algorithms.
I’d say the two most important topics here are philosophy of language (understanding meaning) and philosophy of science (understanding knowledge).
I’ve already mentioned the language philosophers in an edit above, but in philosophy of science I’d add Popper as extremely important here. The concept of negative knowledge as the foundation of empirical understanding seems entirely lost on people. The Black Swan, by Nassim Taleb is a very good casual read on the subject.
Also, we can do thought experiments - simulations in our heads - that are often nearly as good as doing them for real; this has limitations and isn't perfect, but it does work often. Einstein reportedly used to purposely doze off in an awkward position so that something would hit his leg and nudge him half awake, letting him remember his half-dreaming state - which is where he discovered some things.
Any source on Einstein's behavior? I'd love to read more.
> Math is a perfect field for machine learning to thrive because theoretically, all the information ever needed is tied up in the axioms.
Not really; the normal way that math progresses, just like everything else, is that you get some interesting results, and then you develop the theoretical framework. We didn't receive the axioms; we developed them from the results that we use them to prove.
Axioms are, again, by definition, arbitrary. It is effectively irrelevant that we try to develop axioms so that the framework mirrors the real world. Everything falls out of the axioms, period.
If you want to change the axioms to better reflect some aspect about life, that's all well and good, but everything will still fall out of the new axioms.
> distinction between deductive and inductive knowledge
There's also intuitive knowledge btw.
Anyway, the recent developments in AI make a lot of very interesting things practically possible. For example, our society is going to want a way to reliably tell whether something is AI generated, and a failure to find one pretty much settles the empirical part of the Turing test question. Alternatively, if we actually find something in humans that AI can't reliably mimic, that would be a huge finding. With millions of people wondering whether posts on social media are AI generated, we have inadvertently conducted the largest-scale Turing test ever.
The fact that AI seems to be able to (digitally) do anything we ask for is also very interesting. If humans are not bogged down by the small details or cost of implementation concerns, and we can just say what we want and get what we wished for (digitally), what level of creativity can we reach?
Also once we get the robots to do things in the physical space...
I don't want to do the thing where we fight on the internet. I don't know your background, but I'll push back here just because this is the type of comment non-philosophers tend to present to me, and it misses a lot of the points I'm trying to make.
(1) "intuitive knowledge" - whether or not you want to take "intuitive knowledge" as a type of knowledge (I don't think I would) is basically immaterial. The deductive-inductive framework dynamic is for reasoning frameworks, not knowledge. The reasoning frameworks are pointed in opposite directions. The deductive framework is inherited from rationalist tradition, it's premises are by definition arbitrary and cannot be justified, and information is perfect (excepting when you get rare truth values, like something being undecidable). Inductive/empirical framework is quite the opposite. Its premises are observations and absolutely not arbitrary, the information is wholly imperfect (by necessity, thanks Popper), and there is always a kind of adjustable resolution to any research conducted. Newton vs Einsteinian physics, for example, shows how zooming in on the resolution of experimentation shows how a perfectly workable model can fail when instruments get precise enough. I'll also note here that abduction is another niche reasoning framework, but is effectively immaterial to my point here.
(2) The Turing Test is not, and has never been, a philosophically rigorous test. It's effectively a pointless exercise. The literature about "philosophical zombies" has covered this, but the most important work here is Searle's "Chinese Room."
>The fact that AI seems to be able to (digitally) do anything we ask for is also very interesting.
I don't even know how to respond to this. It's trivially, demonstrably false. Beyond that, my entire point is that philosophy of language presents such hard problems about what meaning actually is that they might end up imposing a kind of uncertainty principle on this line of thinking in the long run. Specifically, Quine's indeterminacy of translation.
There are ways to go beyond the human-quality-data limitation. AI can be trained on data of better than average human quality, because for many problems it is easy to verify solutions. For example, in theory, reinforcement learning with an automatic grader on competitive programming problems can lead to an LLM that is better than humans at it.
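A toy sketch of what "RL with an automatic grader" means, using a trivially verifiable task (sorting) and two hand-written stand-ins for sampled policies. Everything here is invented for illustration - it is the reward structure, not a real training loop: no human labels appear anywhere, only a verifier.

```python
import random

random.seed(0)  # make the toy run deterministic

def grader(candidate_fn, cases):
    """Automatic verifier: fraction of cases the candidate solves correctly."""
    return sum(candidate_fn(list(xs)) == sorted(xs) for xs in cases) / len(cases)

# Two stand-in "policies" an RL loop might sample: one buggy, one correct.
candidates = {
    "buggy": lambda xs: xs,            # returns the input unsorted
    "correct": lambda xs: sorted(xs),
}

cases = [[3, 1, 2], [5, 4, 9, 0], [7, 7, 1]]
weights = {name: 1.0 for name in candidates}

for _ in range(50):                    # crude policy-gradient-flavored loop
    name = random.choices(list(weights), weights=weights.values())[0]
    reward = grader(candidates[name], cases)
    weights[name] *= 1.0 + reward      # reinforce high-reward behavior

best = max(weights, key=weights.get)
```

The loop converges on the correct candidate purely because the grader rewards it, which is the sense in which verifiable domains sidestep the "capped at human quality" argument.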
It's also possible that there can be emergent capabilities. Perhaps a little obtuse, but you can say that humans are trained on human-quality data too and yet brilliant scientists and creative minds can rise above the rest of us.
> Their capabilities should saturate at human or maybe above-average human performance
LLMs do have superhuman reasoning speed and superhuman dedication. Speed is something you can scale, and at some point quantity can turn into quality. Much of the frontier work done by humans is just dedication, luck, and remixing other people's ideas ("standing on the shoulders of giants"), isn't it? All of this is exactly what you can scale by having restless hordes of fast-thinking agents, even if each of those agents is intellectually "just above average human".
> 1. LLMs are trained on human-quality data, so they will naturally learn to mimic our limitations. Their capabilities should saturate at human or maybe above-average human performance.
Why oh why is this such a commonly held belief. RL in verifiable domains being the way around this is the entire point. It’s the same idea behind a system like AlphaGo — human data is used only to get to a starting point for RL. RL will then take you to superhuman performance. I’m so confused why people miss this. The burden of proof is on people who claim that we will hit some sort of performance wall because I know of absolutely zero mechanisms for this to happen in verifiable domains.
The idea that they don’t learn from experience might be true in some limited sense, but ignores the reality of how LLMs are used. If you look at any advanced agentic coding system the instructions say to write down intermediate findings in files and refer to them. The LLM doesn’t have to learn. The harness around it allows it to. It’s like complaining that an internal combustion engine doesn’t have wheels to push it around.