Comment by notepad0x90
5 days ago
I don't get why you would say that. It's just auto-completing. It cannot reason. It won't solve an original problem for which it has no prior context to "complete" an approximated solution with. You can give it more context and more data, but you're just helping it complete better. It does not derive an original state machine or algorithm to solve problems for which there are no obvious solutions. It instead approximates a guess (hallucination).
Consciousness and self-awareness are a distraction.
Consider that for what is essentially the same prompt and instructions, small variations in wording or spelling change its output significantly. If it thought and reasoned, it would know to ignore those and focus on the variables and input at hand to produce deterministic and consistent output. However, it only computes in terms of tokens, so when a token changes, the probability of what a correct response would look like changes, and it adapts.
It does not actually add 1+2 when you ask it to do so. It does not distinguish 1 from 2 as discrete units in an addition operation. It instead uses descriptions of the operation to approximate a result, and even for something so simple, some phrasings and wordings might not yield 3.
> It won't solve an original problem for which it has no prior context to "complete" an approximated solution with.
Neither can humans. We also just brute force "autocompletion" with our learned knowledge and combine it into new parts, which we then add to our learned knowledge to deepen the process. We are just much, much better at this than AI, after some decades of training.
And I'm not saying that AI is fully there yet and has solved "thinking". IMHO it's more "pre-thinking" or proto-intelligence. The dots are there, but they are not merging yet to form the real picture.
> It does not actually add 1+2 when you ask it to do so. It does not distinguish 1 from 2 as discrete units in an addition operation.
Neither can a toddler nor an animal. The level of ability is irrelevant for evaluating its foundation.
> Neither can humans. We also just brute force "autocompletion"
I have to disagree here. When you are tasked with dividing 2 big numbers, you most certainly don't "autocomplete" (in the sense of finding the most probable next tokens, which is what an LLM does); rather, you go through a set of steps you have learned. Same with the strawberry example: you're not throwing guesses until something statistically likely to be correct sticks.
Humans first start by recognizing the problem, then search through their list of abilities to find the best skill for solving it, thus "autocompleting" their inner shell's command line before they start execution, to stay with that picture. Common AIs today are not much different from this, especially with reasoning modes.
> you're not throwing guesses until something statistically likely to be correct sticks.
What do you mean? That's exactly how many humans operate in unknown situations or topics. If you don't know, just throw punches and see what works. Of course, not everyone is ignorant enough to be vocal about this in every situation.
> I have to disagree here. When you are tasked with dividing 2 big numbers, you most certainly don't "autocomplete" (in the sense of finding the most probable next tokens, which is what an LLM does); rather, you go through a set of steps you have learned.
Why do you think that this is the part that requires intelligence, rather than a more intuitive process? After all, we have had machines that can do this mechanically for well over a hundred years.
There is a whole category of critiques of AI of this type: "Humans don't think this way, they mechanically follow an algorithm/logic", but computers have been able to mechanically follow algorithms and perform logic from the beginning! That isn't thinking!
Humans, and even animals, track different "variables" or "entities", distinct things with meaning and logical properties, and then apply some logical system to those properties to compute various outputs. LLMs see everything as one thing: in the case of chat-completion models, they're completing text; in the case of image generation, they're completing an image.
Look at it this way: two students get 100% on an exam. One learned which multiple-choice options are most likely to be correct based on how the question is worded; they have no understanding of the topics at hand, and they're not performing any topic-specific reasoning. They're just good at guessing the right option. The second student actually understood the topics, reasoned, and calculated, and that's how they aced the exam.
I recently read about a 3-4 year old who impressed their teacher by reading a story book perfectly, like an adult. It turns out their parent had read it to them so much that they could predict, based on page turns and timing, the exact words that needed to be spoken. The child didn't know what an alphabet, a word, etc. was; they just got so good at predicting the next sequence.
That's the difference here.
I'd say they are all doing the same thing, just in different domains and at different levels of quality. "Understanding the topic" only means they have specialized, deeper contextualized information. But in the end, that student also just autocompletes their memorized data, with the exception that some of that knowledge might trigger a program they execute to insert the result into their completion.
The actual work is in gaining the knowledge and programs, not in accessing and executing them. And how they operate, on which data, variables, objects, worldview or whatever you call it, might make a difference in quality and build-up speed, but not for the process in general.
>>> We also just brute force "autocompletion"
Wouldn't be an A.I. discussion without a bizarre, untrue claim that the human brain works identically.
There are no true and untrue claims about how the brain works, because we have no idea how it works.
The reason people give for humans not being autocomplete is "Obviously I am not an autocomplete".
Meanwhile, people are just a black-box process that outputs words into their head, which they then take credit for and call cognition. We have no idea how that black box that serves up a word when I say "Think of a car brand" works.
Similar, not identical. A bicycle and a car are both vehicles with tires, but they are still not identical machines.
> We also just brute force "autocompletion" with our learned knowledge and combine it into new parts, which we then add to our learned knowledge to deepen the process
You know this because you're a cognitive scientist, right? Or because this is the consensus in the field?
>Neither can a toddler nor an animal. The level of ability is irrelevant for evaluating its foundation.
Its foundation of rational logical thought that can't process basic math? Even a toddler understands 2 is more than 1.
An LLM by itself is not thinking, just remembering and autocompleting. But if you add a feedback loop where it can use tools, investigate external files or processes, and then autocomplete on the results, you get to see something that is (close to) thinking. I've seen Claude Code debug things by adding print statements to the source and reasoning about the output, then determining next steps. This feedback loop is what sets AI tools apart: they can all use the same LLM, but the quality of the feedback loop makes the difference.
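For what it's worth, the loop itself is simple to picture. A minimal sketch in Python follows; `llm_complete` and `run_tool` are hypothetical placeholders for whatever model API and tool runner an agent actually uses, not any specific product's interface:

    # Minimal sketch of the tool-use feedback loop described above.
    # The helpers and the step format are hypothetical, for illustration only.

    def llm_complete(transcript):
        """Ask the model for its next step, given everything seen so far."""
        raise NotImplementedError  # plug in your model call of choice

    def run_tool(step):
        """Execute a proposed action: run tests, grep a file, add a print, etc."""
        raise NotImplementedError  # plug in your tool runner of choice

    def agent_loop(task, max_steps=20):
        transcript = [{"role": "user", "content": task}]
        for _ in range(max_steps):
            step = llm_complete(transcript)    # model "autocompletes" the next action
            if step.get("type") == "answer":   # model believes it is done
                return step["content"]
            observation = run_tool(step)       # actually run the tool
            transcript.append(step)
            transcript.append({"role": "tool", "content": observation})  # feed results back in
        return None  # gave up after max_steps

The "autocomplete on the results" part is the two appends at the end: each tool observation becomes new context for the next completion.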
>But if you add a feedback loop where it can use tools, investigate external files or processes, and then autocomplete on the results, you get to see something that is (close to) thinking
It's still just information retrieval. You're just dividing it into internal information (the compressed representation of the training data) and external information (web search, API calls to systems, etc). There is a lot of hidden knowledge embedded in language, and LLMs do a good job of teasing it out in a way that resembles reasoning/thinking but really isn't.
No, it's more than information retrieval. The LLM is deciding what information needs to be retrieved to make progress and how to retrieve this information. It is making a plan and executing on it. Plan, Do, Check, Act. No human in the loop if it has the required tools and permissions.
> LLMs do a good job of teasing it out in a way that resembles reasoning/thinking but really isn't.
Given that "thinking" still hasn't been defined rigorously, I don't understand how people are so confident in claiming they don't think.
>>you get to see something that is (close to) thinking.
Isn't that still "not thinking"?
Depends on who you ask and what their definition of thinking is.
Just ask it how many r's are in strawberry and you will realize there isn't a lot of reasoning going on here, it's just trickery on top of token generators.
This is akin to "Show a human an optical illusion that exploits their physiology".
LLMs be like "The dumb humans can't even see the dots"[1]
[1]https://compote.slate.com/images/bdbaa19e-2c8f-435e-95ca-a93...
Ask an illiterate person the same thing and they will fail badly too. Is it impossible to have intelligence without literacy? (Bonus: if so, how was writing invented?)
Furthermore, regarding reasoning: just ask any LLM how many r's are in "strawberry" - repeat maybe 3 times just to get a feeling for how much variance in answers you can get. And this "quirk", the inability to get the right answer, is still an issue after 2 years of people making fun of LLMs on various forums. The models aren't getting smarter, and they definitely aren't thinking; they are still token generators with a few tricks on top to make them seem more intelligent than their predecessors.
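For contrast, the deterministic version of this task is trivial once you operate on characters instead of tokens; a minimal illustrative sketch (plain Python, nothing model-specific):

    # Counting letters is a deterministic, character-level algorithm.
    # An LLM never sees the characters individually; it sees tokens
    # (roughly chunks like "straw" + "berry", depending on the tokenizer),
    # which is part of why it ends up guessing.
    def count_letter(word: str, letter: str) -> int:
        return sum(1 for ch in word if ch == letter)

    print(count_letter("strawberry", "r"))  # always 3, every single run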
Sorry to say, Claude Sonnet 4.5 (with and without extended thinking) has this one solved now. So, not “any” LLM, but maybe some still need work.
I presume they finally got around to training the internal reasoning loop to predict when it needs to make its answer predictions character by character.
Transcripts from Claude Sonnet 4.5, initially with extended thinking:
> How many a’s in the phrase “artichokes and aardvarks”? Be sure to check your answer.
> Let me count the letter ‘a’ in “artichokes and aardvarks”: • artichokes: 1 ‘a’ (at the beginning) • and: 1 ‘a’ (at the beginning) • aardvarks: 3 ‘a’s (two at the beginning: “aa”, and one in the middle: “ardvarks”) Total: 1 + 1 + 3 = 5 ‘a’s Let me verify by marking them: artichokes and aardvarks Yes, there are 5 letter ‘a’s in the phrase “artichokes and aardvarks”.
> Are you sure that’s correct?
> Let me recount more carefully: “artichokes and aardvarks” Going through each word letter by letter: 1. artichokes - has 1 ‘a’ (first letter) 2. and - has 1 ‘a’ (first letter) 3. aardvarks - has 3 ‘a’s (first letter, second letter, and one more in the middle) Total: 1 + 1 + 3 = 5 ‘a’s Yes, I’m confident the answer is 5. Is there something specific you’d like me to recheck?
> How many r’s in strawberry?
> Let me count the letter ‘r’ in “strawberry”: s-t-r-a-w-b-e-r-r-y Going through each letter: • s: not an r • t: not an r • r: 1st r • a: not an r • w: not an r • b: not an r • e: not an r • r: 2nd r • r: 3rd r • y: not an r There are 3 r’s in “strawberry”.
<Without extended thinking> > How many u’s in universalulis?
> Looking at “universalulis” letter by letter: u-n-i-v-e-r-s-a-l-u-l-i-s There are 2 u’s in “universalulis”.
For what it's worth, when I ask ChatGPT 5, it gets the correct answer every time. The response varies, but the answer is always three.
Now try a different language. My take is that this is hard RL tuning to fix these "gotchas", since the underlying model can't do it on its own.
OpenAI is working on ChatGPT the application and its ecosystem. They have transitioned from model building to software engineering, with RL tuning and integration of various services to solve the problems the model can't handle on its own. Make it feel smart rather than be smart.
This means that as soon as you find a problem where you step out of the guided experience, you get the raw model again, which fails when it encounters these "gotchas".
Edit - Here's an example where we see a heavily tuned RL experience in English, where a whole load of context is added on how to solve the problem, while the Swedish prompt for the same word fails.
https://imgur.com/a/SlD84Ih
> It's just auto-completing. It cannot reason
Auto completion just means predicting the next thing in a sequence. This does not preclude reasoning.
> I don't get why you would say that.
Because I see them solve real debugging problems all the time, talking through the impact of code changes or lines to find non-obvious errors with ordering and timing conditions in code they've never seen before.
> This does not preclude reasoning.
It does not imply it either. To claim reasoning you need evidence. It needs to reliably NOT hallucinate results for simple conversations, for example (if it has basic reasoning).
> Because I see them solve real debugging problems all the time, talking through the impact of code changes or lines to find non-obvious errors with ordering and timing conditions in code they've never seen before.
Programming languages and how programs work are extensively and abundantly documented; solutions to problems and how to approach them, etc., have been documented on the internet extensively. It takes all of that data and completes the right text by taking the most correct pathway based on your input. It does not actually take your code and debug it. It is the sheer volume of data it uses and the computational resources behind it that make it hard to wrap your head around the difference between guessing and understanding. You too can look at enough Stack Overflow and (poorly) guess answers for questions without understanding anything about the topic, and if you guess enough you'll get some right. LLMs are just optimized to get the number of correct responses to be high.
> It does not imply it either.
Right, it's irrelevant to the question of whether they can reason.
> to claim reasoning you need evidence
Frankly I have no idea what most people are talking about when they use the term and say these models can't do it. It seems to be a similarly hand-wavey exercise as when people talk about thinking or understanding.
> It needs to reliably NOT hallucinate results for simple conversations, for example (if it has basic reasoning).
That's not something I commonly see in frontier models.
Again this doesn't seem related to reasoning. What we call hallucinations would be seen in something that could reason but had a fallible memory. I remember things incorrectly and I can reason.
> It does not actually take your code and debug it
It talks through the code (which it has not seen) and the process step by step, can choose to add logging, run it, go through the logs, change what it thinks is happening, and repeat. It can do this until it explains what is happening, creates test cases to show the problem and what triggers it, fixes it, and shows the tests pass.
If that's not debugging the code I really don't know what to call it.
>I don't get why you would say that. It's just auto-completing. It cannot reason. It won't solve an original problem for which it has no prior context to "complete" an approximated solution with. You can give it more context and more data, but you're just helping it complete better. It does not derive an original state machine or algorithm to solve problems for which there are no obvious solutions. It instead approximates a guess (hallucination).
I bet you can't give an example of such a written problem that a human can easily solve but no LLM can.
> I don't get why you would say that.
Because it's hard to imagine the sheer volume of data it's been trained on.
And because ALL the marketing AND UX around LLMs is precisely trying to imply that they are thinking. It's not just the challenge of grasping the ridiculous amount of resources poured in, which does include the training sets; it's that actual people are PAID to convince everybody those tools are actually thinking. The prompt is a chatbox, the "..." is there like in a chat with a human, the word "thinking" is used, the word "reasoning" is used, "hallucination" is used, etc.
All marketing.
You're right. Unfortunately, it seems that not many are willing to admit this and be (rightly) impressed by how remarkably effective LLMs can be, at least for manipulating language.
The vast majority of human “thinking” is autocompletion.
Any thinking that happens with words is fundamentally no different from what LLMs do, and everything you say applies to human lexical reasoning.
One plus one equals two. Do you have a concept of one-ness, or two-ness, beyond symbolic assignment? Does a cashier possess number theory? Or are these just syntactical stochastic rules?
I think the problem here is the definition of “thinking”.
You can point to non-verbal models, like vision models - but again, these aren’t hugely different from how we parse non-lexical information.
> Any thinking that happens with words is fundamentally no different from what LLMs do.
This is such a wildly simplified and naive claim. "Thinking with words" happens inside a brain, not inside a silicon circuit with artificial neurons bolted in place. The brain is plastic, it is never the same from one moment to the next. It does not require structured input, labeled data, or predefined objectives in order to learn "thinking with words." The brain performs continuous, unsupervised learning from chaotic sensory input to do what it does. Its complexity and efficiency are orders of magnitude beyond that of LLM inference. Current models barely scratch the surface of that level of complexity and efficiency.
> Do you have a concept of one-ness, or two-ness, beyond symbolic assignment?
Obviously we do. The human brain's idea of "one-ness" or "two-ness" is grounded in sensory experience — seeing one object, then two, and abstracting the difference. That grounding gives meaning to the symbol, something LLMs don't have.
The instantiation of models in humans is not unsupervised, and language, for instance, absolutely requires labelled data and structured input. The predefined objective is “expand”.
See also: feral children.
LLMs are increasingly trained on images for multi-modal learning, so they too would have seen one object, then two.
We do a lot of autocompletion, and LLMs overlap with that for sure. I don't know about the "vast majority", though; even basic operations like making sure we're breathing or have the right hormones released are not guesses but deterministic algorithmic ops. Things like object recognition and speech might qualify as autocompletion. But let's say you need to set up health monitoring for an application. That's not an autocomplete operation: you must evaluate various options, have opinions on them, weigh priorities, etc. In other words, we do autocompletion, but even then the autocompletion is a basic building block or tool we use in constructing more complex decision logic.
If you train an animal to type the right keys on a keyboard to generate a hello-world program, you didn't just teach it how to code; it just memorized the right keys that lead to its reward. A human programmer understands the components of the code, the intent and expectations behind it, and can reason about how changes would affect outcomes. The animal just knows how the reward can be obtained most reliably.
Sure. But neither do you. So are you really thinking or are you just autocompleting?
When was the last time you sat down and solved an original problem for which you had no prior context to "complete" an approximated solution with? When has that ever happened in human history? All the great invention-moment stories that come to mind seem to have exactly that going on in the background: prior context being auto-completed in a Eureka! moment.
I think (hah) you're underestimating what goes on when living things (even small animals) think. We use auto-completion for some tasks, but it is only one component of what we do.
Let's say your visual system auto-completes some pattern and detects a snake while you're walking; that part is auto-completion. You will probably react by freezing or panicking; that part is not auto-completion, it is a deterministic algorithm. But then you process the detected object, auto-completing again to identify it as just a long cucumber. Again, the classification part is auto-completion. What will you do next? "Hmm, free cucumber, I can cook a meal with it," and you pick it up. Auto-completion is all over that simple decision, but you're using the results of auto-completion to derive an association (food), check your hunger level (not auto-completion), determine that the food is desirable and safe to eat (some auto-completion), evaluate what other options you have for food (evaluating auto-complete outputs), and then instruct your nervous system to pick it up.
We use auto-completion all the time as an input; we don't reason using auto-completion, in other words. You can argue that if all your input comes from auto-completion (it doesn't), then it makes no difference. But we have deterministic, logical reasoning systems that evaluate auto-completion outputs. If your cucumber detection identifies it as a rotten cucumber, the decision that it is not safe to eat is not made by auto-completion but by reasoning logic that uses the auto-completion output. You can approximate the level of rot, but once you recognize it as rotten, you make a decision based on that information. You're not approximating a decision, you're evaluating simple logic: if(safe()){eat();}.
Now amp that up to solving very complex problems: what experiments will you run, what theories will you develop, what R&D is required for a solution, etc. These too are not auto-completions. An LLM would auto-complete these and might arrive at the same conclusion most of the time, but our brains are following algorithms we developed and learned over time, where an LLM is just expanding on auto-completion with a lot more data. In contrast, our brains are not trained on all the knowledge available on the public internet; we retain a minuscule fraction of that. We can arrive at similar conclusions as the LLM because we are reasoning and following algorithms matured and perfected over time.
The big takeaway should be that, as powerful as LLMs are now, if they could reason like we do, they'd dominate us and become unstoppable. Their auto-completion is many magnitudes better than ours; if they could also write new and original code based on an understanding of problem-solving algorithms, that would be general AI.
We can not just add 1 + 1 but prove that the addition operation is correct mathematically, and understand that when you add one more object to a set, the addition operation always increments. We don't approximate that; we always, every single time, increment, because we are following an algorithm instead of choosing the most likely correct answer.
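That "always increments" claim is, incidentally, exactly the kind of thing a proof assistant can state and check. A minimal sketch in Lean 4, using only the core library's Nat (just an illustration of the point, not something an LLM needs to do):

    -- Adding one is, by definition, taking the successor of a natural number:
    theorem add_one_is_succ (n : Nat) : n + 1 = Nat.succ n := rfl

    -- And the successor is always strictly larger, so adding one always increments:
    theorem add_one_increments (n : Nat) : n < n + 1 := Nat.lt_succ_self n

    -- A concrete instance: 1 + 2 really is 3, by computation, not by approximation.
    example : 1 + 2 = 3 := rfl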
> I don't get why you would say that. It's just auto-completing.
https://en.wikipedia.org/wiki/Predictive_coding
> If it thought and reasoned, it would know to ignore those and focus on the variables and input at hand to produce deterministic and consistent output
You only do this because you were trained to do this, e.g. to see symmetries and translations.
You wrote your comment one word at a time, with the next word depending on the previous words written.
You did not plan the entire thing, every word, ahead of time.
LLMs do the same thing, so... how is your intelligence any different?
A long time ago I noticed that I sometimes already had a complete thought before my inner monologue turned it into words. A few times I tried skipping the inner monologue because I'd clearly already thought the thought. Turns out the bit of my brain that creates the inner monologue from the thought can generate a sense of annoyance that the rest of my brain can feel.
Not that it matters: there's evidence that while LLMs output one word at a time, they've got forward planning going on, having an idea of the end of a sentence before they get there.
Indeed, and it seems like they would really struggle to output coherent text at all if there were not some kind of pre-planning involved (see how even humans struggle with it in games where you have to construct a sentence by having each person shout out one word at a time). Even GPT-2 likely had at least some kind of planning for the next few words in order to be as coherent as it was.
Tell that to German speakers, where the verb comes last and the order of things in sentences is nothing like English, therefore requiring you to think of the entire sentence before you just spit it out. Even the numbers are backwards (twenty-two is two-and-twenty), which requires thinking.
Furthermore, when you ask an LLM to count how many r's are in the word strawberry, it will give you a random answer, "think" about it, and give you another random answer. And I guarantee you that out of 3 attempts, including reasoning, it will flip-flop between right and wrong, and it will be random, unlike a human, who, when asked "how many r's are in the word strawberry", will be able to tell you the correct answer every. fucking. time.
edit: formatting
It seems models are pre-planning though:
> How does Claude write rhyming poetry? Consider this ditty:
> He saw a carrot and had to grab it,
> His hunger was like a starving rabbit
> To write the second line, the model had to satisfy two constraints at the same time: the need to rhyme (with "grab it"), and the need to make sense (why did he grab the carrot?). Our guess was that Claude was writing word-by-word without much forethought until the end of the line, where it would make sure to pick a word that rhymes. We therefore expected to see a circuit with parallel paths, one for ensuring the final word made sense, and one for ensuring it rhymes.
> Instead, we found that Claude plans ahead. Before starting the second line, it began "thinking" of potential on-topic words that would rhyme with "grab it". Then, with these plans in mind, it writes a line to end with the planned word.
[https://www.anthropic.com/research/tracing-thoughts-language...]
The part about strawberry is just not right. That problem was solved. And I do think it's a stretch to say German speakers think of the entire sentence before speaking it.