Comment by gloosx

4 days ago

> Any thinking that happens with words is fundamentally no different from what LLMs do.

This is a wildly oversimplified and naive claim. "Thinking with words" happens inside a brain, not inside a silicon circuit with artificial neurons bolted in place. The brain is plastic; it is never the same from one moment to the next. It does not require structured input, labeled data, or predefined objectives in order to learn "thinking with words." The brain performs continuous, unsupervised learning from chaotic sensory input to do what it does. Its complexity and efficiency are orders of magnitude beyond those of LLM inference; current models barely scratch the surface.

> Do you have a concept of one-ness, or two-ness, beyond symbolic assignment?

Obviously we do. The human brain's idea of "one-ness" or "two-ness" is grounded in sensory experience — seeing one object, then two, and abstracting the difference. That grounding gives meaning to the symbol, something LLMs don't have.

The instantiation of models in humans is not unsupervised, and language, for instance, absolutely requires labelled data and structured input. The predefined objective is “expand”.

See also: feral children.

  • Children are not shown pairs like

    "dog": [object of class Canine]

    They infer meaning from noisy, ambiguous sensory streams. The labels are not explicit, they are discovered through correlation, context, and feedback.

    So although caregivers sometimes point and name things, that is a tiny fraction of linguistic input, and it is inconsistent. Children generalize far beyond that.

    Real linguistic input to a child is incomplete, fragmented, error-filled, and dependent on context. It is full of interruptions, mispronunciations, and slang. The brain extracts structure from that chaos. Calling that "structured input" confuses the output (the inherent structure of language) with the raw input (noisy speech and gestures).

    The brain has drives: social bonding, curiosity, pattern-seeking. But it doesn't have a single optimisation target like "expand." Objectives are not hardcoded loss functions, they are emergent and changing.

    You're right that lack of linguistic input prevents full language development, but that is not evidence of supervised learning. It just shows that exposure to any language stream is needed to trigger the innate capacity.

    Both the complexity and the efficiency of human learning are on another level. Transformers are child's play by comparison. They are not going to gain consciousness, and no AGI will happen in the foreseeable future. It is all just marketing crap, and that is becoming more and more obvious as the dust settles.

LLMs are increasingly trained on images for multi-modal learning, so they too would have seen one object, then two.

  • They never saw any kind of object. They only saw labeled groups of pixels: basic units of a digital image, each representing a single point of color on a screen or in a digital file. An object is a material thing that can be seen and touched. Pixels are not objects.

    • My friend, you are blundering into metaphysics here - ceci n’est pas une pipe, the map is not the territory, and all that.

      We are no more in touch with physical reality than an LLM, unless you are in the habit of pressing your brain against things. Everything is interpreted through a symbolic map.


    • Okay, the goalpost has instantly moved from "seeing" to "seeing and touching". Once you feed in touch sensor data, where are you going to move the goalpost next?

      Models see when photons hit camera sensors, you see when photons hit your retina. Both of them are some kind of sight.
