Comment by ai-x
3 days ago
I thought about this and I think it boils down to how the model is trained.
Tesla trains its models on actual drivers, purely from (input) vision to (output) actuators: brake, steering, accelerator.
Human output is based on what the driver sees, which is also what the camera sees. So, it's a 1:1 match.
If Waymo were to do that, it would muddle the training set. The Lidar input may override the camera input.
I always struggled when Musk said Lidar would make things ambiguous. It didn't make any sense to me why having a secondary fallback sensor would mess things up. But if you put it in the context of training data, it absolutely makes sense.
This is an interesting viewpoint, but isn't it also solvable?
Just because the human in the scenario only took vision as input, why does that matter to the training data and the model? The actions are the same.
To put it another way, what about all the cultural context the human had, or the sounds, smells, past experiences at the same intersection, etc? Even Tesla can't record this, but I'm not sure that matters.
E.g., if the driver brakes because they saw a pothole, and Lidar captures someone biking 200m away on their own path, the model may mistakenly attribute the braking to the object 200m away (because it's a large moving object) rather than to the pothole.
I'm exaggerating, but I hope you get the point. It isn't even conflicting sensor signals about the pothole, but conflicting information about the causation. With vision only, there is no conflict in the training data. This was my aha moment. Multiple sensors are absolutely important for fallback and extra safety, but they screw up training that is based on human drivers.
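A toy sketch of the pothole example, in Python. It is purely illustrative and not how Tesla or Waymo actually trains anything: the log, the feature names `pothole_cam` and `cyclist_lidar`, and the helper `candidate_causes` are all made up, and the "learner" is just a check of which features line up with the braking on every row.

```python
# Toy driving log: each row is one moment of human driving.
# All feature names and values are hypothetical, chosen so that the
# distant cyclist happens to show up exactly when the pothole does.
LOG = [
    {"pothole_cam": 1, "cyclist_lidar": 1, "brake": 1},
    {"pothole_cam": 0, "cyclist_lidar": 0, "brake": 0},
    {"pothole_cam": 1, "cyclist_lidar": 1, "brake": 1},
    {"pothole_cam": 0, "cyclist_lidar": 0, "brake": 0},
]


def candidate_causes(log, features):
    """Return the features that match the brake label on every row."""
    return [f for f in features if all(row[f] == row["brake"] for row in log)]


# Vision only: the pothole is the only explanation on offer.
vision_only = candidate_causes(LOG, ["pothole_cam"])

# Vision plus lidar: the cyclist explains the braking equally well.
vision_plus_lidar = candidate_causes(LOG, ["pothole_cam", "cyclist_lidar"])

print(vision_only)        # ['pothole_cam']
print(vision_plus_lidar)  # ['pothole_cam', 'cyclist_lidar']
```

With the lidar column included, both features fit the braking equally well in this log, so nothing in the data says which one the driver reacted to. A single logged moment with a cyclist and no braking would break the tie in this toy, so how much this matters depends on how varied the training data is.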
I think Elon himself doesn't understand this and hence can't articulate it, while just repeating whatever his ML engineer has said.
The biggest issue with using both camera and lidar is how to properly resolve conflicting returns from different sensor types.
That is vastly preferable to slamming into the back of an emergency vehicle because the cameras are dazzled by the strobes, or slamming into tractor trailers because the cameras were blinded by sunlight. Or slamming on the brakes because the car thinks a shadow in the road is a physical object...