Comment by D-Machine
18 days ago
> It has little to do with the reality of that language or the correctness of your model of it, but rather with the need to train realtime circuits to do some work.
To the contrary, this is purely speculative and almost certainly wrong, riding a bike is co-ordinating the realtime circuits in the right way, and language and a linguistic model fundamentally cannot get you there.
There are plenty of other domains like this, where semantic reasoning (e.g. unquantified syllogistic reasoning) just doesn't get you anywhere useful. I gave an example from cooking later in this thread.
You are falling IMO into exactly the trap of the linguistic reductionist, thinking that language is the be-all and end-all of cognition. Talk to e.g. actual mathematicians, and they will generally tell you they may broadly recruit visualization, imagined tactile and proprioceptive senses, and hard-to-vocalize "intuition". One has to claim this is all epiphenomenal, or that e.g. all unconscious thought is secretly using language, to think that all modeling is fundamentally linguistic (or more broadly, token manipulation). This is not a particularly credible or plausible claim given the ubiquity of cognition across animals or from direct human experiences, so the linguistic boundedness of LLMs is very important and relevant.
Funny, because riding a bicycle or speaking a language is exactly something people don't have a world model of. Ask someone to explain how riding a bicycle works, or an uneducated native speaker to explain the grammar of their language. They have no clue. "Making the right movement at the right time within a narrow boundary of conditions" is a world model, or is it just predicting the next move?
> You are falling IMO into exactly the trap of the linguistic reductionist, thinking that language is the be-all and end-all of cognition.
I'm not saying that at all. I am saying that any (sufficiently long, varied) coherent speech needs a world model, so if something produces coherent speech, there must be a world model behind. We can agree that the model is lacking as much as the language productions are incoherent: which is very little, these days.
> Funny, because riding a bicycle or speaking a language is exactly something people don't have a world model of. Ask someone to explain how riding a bicycle works, or an uneducated native speaker to explain the grammar of their language. They have no clue
This is circular, because you are assuming their world-model of biking can be expressed in language. It can't!
EDIT: There are plenty of skilled experts, artists and etc. that clearly and obviously have complex world models that let them produce best-in-the-world outputs, but who can't express very precisely how they do this. I would never claim such people have no world model or understanding of what they do. Perhaps we have a semantic / definitional issue here?
> This is circular, because you are assuming their world-model of biking can be expressed in language. It can't!
Ok. So I think I get it. For me, producing coherent discourse about things requires a world model, because you can't just make up coherent relationships between objects and actions long enough if you don't understand what their properties are and how they relate to each other.
You, on the other hand, claim that there are infinite firsthand sensory experiences (maybe we can call them qualia?) that fall in between the cracks of language and are rarely communicated (though we use for that a wealth of metaphors and synesthesia) and can only be understood by those who have experienced them firsthand.
I can agree with that if that's what you mean, but at the same time I'm not sure they constitute such a big part of our thought and communication. For example, we are discussing about reality in this thread and yet there are no necessary references to first hand experiences. Any time we talk about history, physics, space, maths, philosophy, we're basically juggling concepts in our heads with zero direct experience of them.
1 reply →
> because you are assuming their world-model of biking can be expressed in language. It can't!
So you can't build an AI model that simulates riding a bike? I'm not stating a LLM model, I'm just saying the kind of AI simulation we've been building virtual worlds with for decades.
So, now that you agree that we can build AI models of simulations, what are those AI models doing. Are they using a binary language that can be summarized?
3 replies →
> Ask someone to explain how riding a bicycle works, or an uneducated native speaker to explain the grammar of their language. They have no clue.
This works against your argument. Someone who can ride a bike clearly knows how to ride a bike, that they cannot express it in tokenized form speaks to the limited scope ofof written word in representing embodiment.
Yes and no. Riding a bicycle is a skill: your brain is trained to do the right thing and there's some basic feedback loop that keeps you in balance. You could call that a world model if you want, but it's entirely self contained, limited to a very few basic sensory signals (acceleration and balance), and it's outside your conscious knowledge. Plenty of people lack this particular "world model" and can talk about cyclists and bicycles and traffic, and whatnot.
1 reply →
Um, yea, I call complete bullshit on this. I don't think anyone here on HN is watching what is happening in robotics right now.
Nvidia is out building LLM driven models that work with robot models that simulate robot actions. World simulation was a huge part of AI before LLMs became a thing. With a tight coupling between LLMs and robot models we've seen an explosion in robot capabilities in the last few years.
You know what robots communicate with their actuators and sensors with. Oh yes, binary data. We quite commonly call that words. When you have a set of actions that simulate riding a bicycle in virtual space that can be summarized and described. Who knows if humans can actually read/understand what the model spits out, but that doesn't mean it's invalid.
2 replies →
I don't know how you got this so wrong. In control theory you have to build a dynamical system of your plant (machine, factory, etc). If you have a humanoid robot, you not only need to model the robot itself, which is the easy part actually, you have to model everything the robot is interacting with.
Once you understand that, you realize that the human brain has an internal model of almost everything it is interacting with and replicating human level performance requires the entire human brain, not just isolated parts of it. The reason for this is that since we take our brains for granted, we use even the complicated and hard to replicate parts of the brain for tasks that appear seemingly trivial.
When I take out the trash, organic waste needs to be thrown into the trash bin without the plastic bag. I need to untie the trash bag, pinch it from the other side and then shake it until the bag is empty. You might say big deal, but when you have tea bags or potato peels inside, they get caught on the bag handles and get stuck. You now need to shake the bag in very particular ways to dislodge the waste. Doing this with a humanoid robot is basically impossible, because you would need to model every scrap of waste inside the plastic bag. The much smarter way is to make the situation robot friendly by having the robot carry the organic waste inside a portable plastic bin without handles.