Comment by afavour
2 days ago
> If all the effort has been spent in trying to replicate the human brain then I am comfortable saying that is a mistake.
> We have a tool that can tell with great accuracy how far away an object is. The suggestion that we should ignore it and rely on cameras that have to guess it because “that’s how humans work” is absurd, frankly.
Before you can learn how far away an object is, you must decide: which laser return corresponds to which object? In fact, what counts as an object? Where does a tree stop and a fallen tree branch begin? Is that object moving towards me? Does the apparent velocity of this point mean the object is moving, or rotating, or flexing, or dividing, or all four? Is that object moving towards me, but that's OK because it's a car that's going to stay in its lane? What's a lane? What's my laser return for where the lane is? Should I stop at this intersection? What's my laser return for whether the light is red? Am I in the blind spot of the car in front of me? Is he about to shift into my lane because he doesn't see me? What laser return tells me whether his indicator is on?
The problem of understanding what is happening in front of you while driving is preposterously more complicated than a point cloud of distances. That is 0.01% of the problem. To solve the remaining 99.99%, you need to interpret photons and sound waves into a semantic understanding that gives you the predictive power to guess how the physical world will evolve and to avoid breaking the rules of the road. Show me a mechanized way of understanding how the physical structure of the world is about to evolve, and why, and I'll show you something that is imitating a human brain, however poorly. The cameras give you _plenty_ of data to determine 3D structure, at a higher resolution than the laser, without being emissive, and for cheaper. It's a completely reasonable approach to focus your limited computational hardware on interpreting the data you have instead of adding more modalities, each with their own limitations, that (according to nature) are demonstrably unnecessary.
The world is more complicated than slogans and pitchforks and Elon Bad.
People don't get into accidents because they can't tell with great accuracy how far away an object is.
They get into accidents because they make bad decisions and get distracted.
If AI makes better decisions and doesn't get distracted, the number of accidents will already be greatly reduced compared to human drivers.
Having lidar in addition to cameras is of marginal benefit (but a benefit, to be sure) once you realize what actually matters: properly modeling the environment. Cameras provide the data for that better than lidar does, so you will still want cameras anyway.
The focus on lidar is really a red herring. You merely spend the computational budget you have on understanding a point cloud instead of images. You're back to square one: "how can I properly model the environment given this sensory modality?" That is the part you're missing, and it essentially requires a human-level understanding of the world.
As the other commenter says, you deeply misunderstand the problem.