Comment by espadrine
9 years ago
Our eyes are not LIDAR, though, and we can drive pretty well.
In fact, the reason we have crashes is NOT that our eyes lack distance detection through laser return timing; two eyes are enough for depth perception. We have crashes because of lapses in attention instead.
At this point, there is no reason to believe that a machine can't match and then outperform a human on a driving task given the same inputs. Sure, human eyes have 5 million cone cells and 1080p feeds only have 2 million pixels, but 4K has around 9 million, and more importantly, that level of precision is unnecessary for regular driving.
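A quick back-of-the-envelope check on those numbers (note the "9 million" figure matches DCI 4K; consumer UHD 4K is closer to 8.3 million):

```python
# Pixel counts for common video resolutions vs. the ~5 million
# cone cells per human retina cited in the comment above.
resolutions = {
    "1080p": (1920, 1080),
    "UHD 4K": (3840, 2160),
    "DCI 4K": (4096, 2160),
}

CONE_CELLS = 5_000_000  # rough figure for one human eye

for name, (w, h) in resolutions.items():
    pixels = w * h
    print(f"{name}: {pixels:,} pixels ({pixels / CONE_CELLS:.2f}x cone count)")
```

So a single 1080p feed sits well below the eye's cone count, while either 4K variant exceeds it.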
And Tesla doesn’t even bet just on the visible spectrum; it also relies on radar.
The trick is in our wetware. What the brain does with visual input is not just trivial object recognition. It relies on a complex internal model of the world to both augment the object recognition and to sometimes discard the visual data as invalid.
So sure, theoretically cameras would be enough. But the software isn't there yet; we can't use the camera input well enough. So if you can side-step the need for not-yet-invented ML methods simply by adding a LIDAR to the sensor suite, that's the obvious way to go.
Compare with powered flight: we didn't get very far by trying to copy the way birds do it. The trick is in the super-light materials birds are made of, and the energy efficiency of their organisms. We only succeeded at powered flight when we brute-forced it by strapping a gasoline engine onto a bunch of wooden planks.
> It relies on a complex internal model of the world to both augment the object recognition and to sometimes discard the visual data as invalid.
That in particular is what makes the hiring fascinating. This problem is Andrej Karpathy's expertise[0]. His CNN/RNN designs have achieved impressive results, in particular showcasing the ability to identify elements of a source image and the relationships between different parts of it.
The speed at which those techniques improve is also stunning. I didn’t expect CNNs to solve Go and image captioning so fast, but here we are!
I think the principles are already there; a few tweaks and careful design are all it takes to beat the average driver.
[0]: http://cs.stanford.edu/people/karpathy/main.pdf
I think LIDAR will have a place, if they can get the per-sensor cost down to something reasonable (under $500 per sensor for a 3D, 180-degree unit with at least 32 vertical beams, or the equivalent).
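For a sense of what that spec delivers, here is a rough point-rate estimate; the horizontal angular resolution and sweep rate below are assumptions for illustration, not part of the spec above:

```python
# Rough point-rate estimate for a hypothetical 32-beam,
# 180-degree LIDAR. Horizontal resolution and scan rate
# are assumed values, chosen for illustration only.
BEAMS = 32
FOV_DEG = 180
HORIZ_RES_DEG = 0.2    # assumed angular step between firings
SCANS_PER_SEC = 10     # assumed sweep rate (10 Hz)

points_per_scan = BEAMS * (FOV_DEG / HORIZ_RES_DEG)
points_per_sec = points_per_scan * SCANS_PER_SEC
print(f"{points_per_scan:,.0f} points/scan, {points_per_sec:,.0f} points/s")
```

Even at these modest assumed rates, that's hundreds of thousands of range measurements per second to fuse with camera and radar data.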
But I think first we'll see cars utilizing tech as described in this paper:
https://arxiv.org/abs/1604.07316
...and variations of it to handle other modeling and vision tasks.
Self-driving vehicle systems are amazingly complex; it won't ultimately be any single sensor, system, piece of software, or algorithm that solves the problem - it's going to be a complex mesh of all of them working in concert.
And even then, there will be mistakes, injuries, and deaths unfortunately.
Image captioning is not solved (yet), even though a lot of progress has been made in recent years.