
Comment by levocardia

2 days ago

Surprised that there isn't any explicit discussion of why dexterity is so hard, beyond sensory perception. One of the root causes (IMHO the biggest one) is that modeling contact, i.e. the static and dynamic friction between two objects, is extremely complicated. There are various modeling strategies, but their results are highly sensitive to tuning parameters, which makes contact very hard to learn in simulation. From what I remember, the OpenAI Rubik's Cube solver basically trained across a giant set of simulated worlds, each with different tuning parameters for the contact models, and generalized okay to the real world in various situations.

It seems most likely that this sort of boring domain randomization will be what works, or works well enough, for solving contact in this generation of robotics, but it would be much more exciting if someone figures out a better way to learn contact models (or a latent representation of them) in real time.
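For anyone who hasn't seen it, the "giant set of worlds" trick is roughly this: every training episode gets its own randomly perturbed contact model, so the policy has to work across all of them. A minimal sketch, with made-up parameter names and ranges (real pipelines randomize far more than this, and hook the sampled values into an actual physics engine):

    import random

    # Hypothetical ranges, expressed as multipliers on nominal values; real
    # setups also randomize damping, restitution, masses, latencies, etc.
    CONTACT_PARAM_RANGES = {
        "static_friction": (0.5, 1.5),
        "dynamic_friction": (0.5, 1.5),
        "contact_stiffness": (0.8, 1.2),
    }

    def sample_contact_params(rng):
        """Draw one randomized 'world' of contact parameters for an episode."""
        return {name: rng.uniform(lo, hi)
                for name, (lo, hi) in CONTACT_PARAM_RANGES.items()}

    rng = random.Random(0)
    for episode in range(3):
        params = sample_contact_params(rng)
        # sim.reset(contact_params=params)  # apply to the physics engine
        # rollout_and_update(policy, sim)   # train across the randomized worlds
        print(episode, params)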

This rings super plausible to me. I dabbled a bit in hobby electronics making DIY walkers, and the more time you spend on junior stuff like that (trying to model a good response to servo load feedback that works in every situation, etc.), the more it dawns on you that what humans and other animals do with the sensor feedback they get from their limbs is so rich in "magic" and intelligence.

Figuring out physical interaction with the environment and traversal is truly one of the most stunning early achievements of life.

On a freestanding humanoid robot, you have an inverse kinematic chain running all the way from the touch point to the ground, with many actuators in between, each of which to some degree squares the complexity of the problem. The parent article mentions a Fanuc or Kuka bot, which let's say is 6-axis: they are incredibly stiff and strong, in many cases orders of magnitude stronger than they really need to be for the job they are tasked with; they do not move; and modeling things like clashing with the environment or with themselves is much simpler because they are placed in 100% controlled environments. Remove all of those qualifiers (a weak robot because it needs to be light, a dynamic environment, and count the DOF between the robot's fingers and its ankles) and you get a clearer picture than the article offers of why all this stuff is difficult. You can't take much of a divide-and-conquer approach like you can in other domains.

  • Inverse kinematics are piss easy.

    When grasping an object you need to know the normal force on the contact point of the object and check that you're still in the friction cone.

    This is hard, because you need to know the friction coefficient of the object-and-fingertip combination, you need to know the exact coordinates on the object where you're putting that finger plus the orientation and graspable surfaces of the object, and you have an imperfect model of the robot dynamics that doesn't account for friction or for the dynamics of the manipulated objects. (A small sketch of the friction-cone check follows this thread.)

    Basically nothing is easy. You don't know anything about what you're manipulating.

    • >When grasping an object you need to know the normal force on the contact point of the object and check that you're still in the friction cone.

      You don't. Certainly I do not need such information.

      But that is what makes robotics hard: there is no easy answer as to how a human knows how to properly grasp an object.
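
A small sketch of the friction-cone check mentioned above, for readers who want it concrete. This is a toy under a point-contact-with-friction assumption, not anyone's actual planner; the vectors and the coefficient in the example are invented.

    import math

    def in_friction_cone(contact_force, contact_normal, mu):
        """Check that a contact force stays inside the Coulomb friction cone.

        contact_force and contact_normal are 3-vectors (tuples); mu is the
        friction coefficient. The check is |f_t| <= mu * f_n, with f_n > 0.
        """
        # Normal component of the force (contact_normal assumed unit length).
        f_n = sum(f * n for f, n in zip(contact_force, contact_normal))
        if f_n <= 0:
            return False  # pulling away from the surface: no valid grasp force
        # Tangential component = total force minus its normal part.
        tangential = [f - f_n * n for f, n in zip(contact_force, contact_normal)]
        f_t = math.sqrt(sum(t * t for t in tangential))
        return f_t <= mu * f_n

    # Example: 5 N squeeze with 1 N of sideways load, guessed mu of 0.6.
    print(in_friction_cone((1.0, 0.0, 5.0), (0.0, 0.0, 1.0), mu=0.6))  # True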

But humans can't really model friction either. Humans use their perception of the interaction to optimize for the desired behavior; for that, it is enough to know that more pressure means higher friction. Even subconsciously there is no explicit model of this; it is fine-tuned on the fly in response to external circumstances.

I think the answer to your question is that robots don't actually need complex friction models. If your planning needs to know beforehand exactly how the materials interact, you've already lost.
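A toy version of that "adjust on the fly" loop, just to make the contrast with model-based planning concrete. The sensor interface, gains, and units are invented for the example; the only idea it encodes is "squeeze harder on slip, relax otherwise":

    def update_grip_force(force, slip_detected,
                          squeeze_step=0.5, relax_rate=0.02,
                          min_force=0.5, max_force=20.0):
        """Reactive grip control with no friction model: just respond to slip."""
        if slip_detected:
            force += squeeze_step   # more pressure -> more friction margin
        else:
            force -= relax_rate     # drift back down so we don't crush the object
        return max(min_force, min(max_force, force))

    # Toy run: a couple of slip events early on, then a stable hold.
    force = 1.0
    for step, slip in enumerate([True, True, False, True, False, False]):
        force = update_grip_force(force, slip)
        print(step, round(force, 2))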

Dexterity is also hard because, at least in humans, it relies on knowing something of the nature of an object _before_ manipulating it. Is it light or heavy? Soft or rigid? Is it a bag of popcorn, popcorn kernels, a bag of powder, or a pillow? How tightly is it packed in the bag? Fabric or cardboard? Attached to other objects or not? Is the USB plug the right type and oriented correctly? (Even humans have trouble with this one.) Does it have a slippery surface or a grippy surface? To be immediately successful in manipulation, pre-knowledge based on sensing and identification is usually required. Possibly it would be ok if a robot took several tries to figure this out based on some general principles, but it would seem clumsy and be slower. It seems there is an ontology problem here, which requires understanding a lot about the world in order to be able to successfully manipulate it.

More generally, continuous learning in real time is something current models don't do well. Retraining an entire LLM every time something new is encountered is not scalable. Temporary learning does not easily transfer to long-term knowledge. Continuous learning still seems to be in its infancy.

  • Also, when we don't know the properties of an object we are about to manipulate, we'll approach it cautiously and learn them before we apply too much force. This tends to happen transparently and quickly for adults, but with infants you can watch it play out more slowly.

    • My guess is that it helps a lot that we have flexible cushioned fingertips that are highly sensitive to pressure. That's a hardware feature that robots mostly lack.


And perhaps it's because people expect robotics powered by ML to be perfect, never fall over, never crush the egg with a gripper.

Yet we do that stuff all the time, so it's not really a reasonable expectation, given that ML is based on biology. It still seems many general models do certain things better than we can, though.

  • "Based on" is a bit strong. Parts are at least notionally inspired by biology.