Comment by madaxe_again

5 days ago

The vast majority of human “thinking” is autocompletion.

Any thinking that happens with words is fundamentally no different to what LLMs do, and everything you say applies to human lexical reasoning.

One plus one equals two. Do you have a concept of one-ness, or two-ness, beyond symbolic assignment? Does a cashier possess number theory? Or are these just syntactical stochastic rules?

I think the problem here is the definition of “thinking”.

You can point to non-verbal models, like vision models - but again, these aren’t hugely different from how we parse non-lexical information.

10 comments

madaxe_again

gloosx 4 days ago

> Any thinking that happens with words is fundamentally no different from what LLMs do.

This is such a wildly simplified and naive claim. "Thinking with words" happens inside a brain, not inside a silicon circuit with artificial neurons bolted in place. The brain is plastic, it is never the same from one moment to the next. It does not require structured input, labeled data, or predefined objectives in order to learn "thinking with words." The brain performs continuous, unsupervised learning from chaotic sensory input to do what it does. Its complexity and efficiency are orders of magnitude beyond that of LLM inference. Current models barely scratch the surface of that level of complexity and efficiency.

> Do you have a concept of one-ness, or two-ness, beyond symbolic assignment?

Obviously we do. The human brain's idea of "one-ness" or "two-ness" is grounded in sensory experience — seeing one object, then two, and abstracting the difference. That grounding gives meaning to the symbol, something LLMs don't have.

madaxe_again 4 days ago
The instantiation of models in humans is not unsupervised, and language, for instance, absolutely requires labelled data and structured input. The predefined objective is “expand”.
See also: feral children.
- gloosx 3 days ago
  
  Children are not shown pairs like
  "dog": [object of class Canine]
  They infer meaning from noisy, ambiguous sensory streams. The labels are not explicit, they are discovered through correlation, context, and feedback.
  So although caregivers sometimes point and name things, that is a tiny fraction of linguistic input, and it is inconsistent. Children generalize far beyond that.
  Real linguistic input to a child is incomplete, fragmented, error-filled, and dependens on context. It is full of interruptions, mispronunciations, and slang. The brain extracts structure from that chaos. Calling that "structured input" confuses the output - inherent structure of language - with the raw input, noisy speech and gestures.
  The brain has drives: social bonding, curiosity, pattern-seeking. But it doesn't have a single optimisation target like "expand." Objectives are not hardcoded loss functions, they are emergent and changing.
  You're right that lack of linguistic input prevents full language development, but that is not evidence of supervised learning. It just shows that exposure to any language stream is needed to trigger the innate capacity.
  Both complexity and efficiency of the human learning is just on another level. Transformers are child's play compared to that level. They are not going to gain consciousness, and no AGI will happen in the foreseeable future, it is all just marketing crap, and it's becoming more and more obvious as the dust settles.
gkbrk 4 days ago
LLMs are increasingly trained on images for multi-modal learning, so they too would have seen one object, then two.
- gloosx 4 days ago
  
  They never saw any kind of object, they only saw labeled groups of pixels – basic units of a digital image, representing a single point of color on a screen or in a digital file. Object is a material thing that can be seen and touched. Pixels are not objects.
  
  4 replies →

notepad0x90 4 days ago

We do a lot of autocompletion and LLMs overlap with that for sure. I don't know about the "vast majority" even basic operations like making sure we're breathing or have the right hormones prompted are not guesses but deterministic algorithmic ops. Things like object recognition and speech might qualify as autocompletion. But let's say you need to setup health-monitoring for an application. that's not an autocomplete operation. you must evaluate various options, have opinions on it, weigh priorities,etc.. in other words, we do autocompletion but even then the autocompletion is a basic building block or tool we use in constructing more complex decision logic.

If you train an animal to type the right keys on a keyboard that generates a hello world program, you didn't just teach them how to code. they just memorized the right keys that lead to their reward. a human programmer understands the components of the code, the intent and expectations behind it, and can reason about how changes would affect outcomes. the animal just knows how the reward can be obtained most reliably.