Comment by tsimionescu
2 days ago
Human/animal vision uses way more than parallax to judge distances and bearings - it uses a world model that evolved over millions of years to model the environment. That's why we can get excellent 3D images from a 2D screen, and also why our depth perception can be easily tricked with objects of unexpected size. Put a human or animal in an abstract environment with no shadows and no familiar objects, and you'll see that depth perception based solely on parallax is actually very bad.
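To make the parallax point concrete, here is a minimal sketch of depth from binocular disparity under a small-angle model. It assumes a baseline of about 6.5 cm (roughly human eye separation) and a hypothetical smallest resolvable disparity of 20 arcseconds; neither number comes from the comment. Depth is inversely proportional to disparity, so a fixed disparity error produces a depth error that grows with the square of the distance. This is one geometric reason why parallax alone gets unreliable beyond a few meters.

```python
import math

BASELINE_M = 0.065  # assumed eye/camera separation (~6.5 cm)
MIN_DISPARITY_RAD = math.radians(20 / 3600)  # hypothetical: 20 arcsec resolvable disparity


def depth_from_disparity(disparity_rad: float, baseline_m: float = BASELINE_M) -> float:
    """Small-angle stereo: Z ≈ B / θ, where θ is the angular disparity."""
    return baseline_m / disparity_rad


def depth_error(depth_m: float,
                disparity_error_rad: float = MIN_DISPARITY_RAD,
                baseline_m: float = BASELINE_M) -> float:
    """First-order error: |dZ/dθ| · δθ = (Z² / B) · δθ, so the error grows with Z²."""
    return depth_m ** 2 / baseline_m * disparity_error_rad


if __name__ == "__main__":
    for z in (0.5, 2.0, 10.0, 50.0):
        print(f"{z:5.1f} m -> ±{depth_error(z) * 100:.2f} cm")
```

This models only the geometric limit of stereo. It ignores the learned cues (shading, familiar size, occlusion) that the comment says human vision relies on.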
> it uses a world model that evolved over millions of years to model the environment. That's why we can get excellent 3D images from a 2D screen
That doesn’t require millions of years of evolution. We can ‘evolve’ it way faster on computers.
For an example, see https://depth-anything.github.io/.
I also think we don’t need good depth estimation to avoid collisions while walking around. The problem is scale-invariant, except that stopping distance grows superlinearly with speed (doubling your speed more than doubles your stopping distance). At walking speed, though, that effect isn’t very large.
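One way to illustrate the scale-invariance argument is the classic time-to-contact ("tau") estimate. For an object approaching at constant speed, its angular size divided by its rate of angular expansion approximates the time until contact, without knowing the object's absolute size or distance. The sketch below assumes a small-angle approximation and a constant closing speed. The sizes, distances, and speeds are illustrative values only.

```python
def angular_size(size_m: float, distance_m: float) -> float:
    """Small-angle approximation: θ ≈ S / Z (radians)."""
    return size_m / distance_m


def time_to_contact(theta_now: float, theta_prev: float, dt_s: float) -> float:
    """Tau estimate: θ / (dθ/dt), using a backward finite difference.

    Only image-plane quantities are used; absolute size and distance cancel out.
    """
    rate = (theta_now - theta_prev) / dt_s
    if rate <= 0:
        return float("inf")  # not expanding: no approach detected
    return theta_now / rate


def simulate(size_m: float, distance_m: float, speed_mps: float, dt_s: float = 0.01) -> float:
    """Observe an approaching object at two instants and estimate time to contact."""
    theta_prev = angular_size(size_m, distance_m)
    theta_now = angular_size(size_m, distance_m - speed_mps * dt_s)
    return time_to_contact(theta_now, theta_prev, dt_s)


if __name__ == "__main__":
    # A small, near, slow object and a large, far, fast one: both 2 s from contact.
    print(simulate(size_m=0.3, distance_m=3.0, speed_mps=1.5))
    print(simulate(size_m=3.0, distance_m=30.0, speed_mps=15.0))
```

Both scenarios produce the same sequence of angular sizes, so both calls return the same estimate (about 2 s). The camera cannot tell them apart, and for collision avoidance it does not need to.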
Decent depth estimation is needed for judging foot placement, but that’s at relatively close range.
At driving speed, that changes, but I think you can still get away with rough estimates.
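The stopping-distance point can be checked with constant-deceleration kinematics. Braking distance is v²/(2a), so braking alone quadruples when speed doubles. Adding a reaction-time term (distance covered before braking starts, linear in speed) gives a total that more than doubles but less than quadruples. The deceleration and reaction-time values below are assumptions chosen for illustration, not measured figures.

```python
def stopping_distance(speed_mps: float, decel_mps2: float, reaction_s: float) -> float:
    """Reaction distance (v·t) plus braking distance (v² / 2a), constant deceleration."""
    return speed_mps * reaction_s + speed_mps ** 2 / (2 * decel_mps2)


# Illustrative assumptions, not measured values (27 m/s is roughly 97 km/h).
WALK = dict(speed_mps=1.4, decel_mps2=3.0, reaction_s=0.3)
DRIVE = dict(speed_mps=27.0, decel_mps2=7.0, reaction_s=1.0)

if __name__ == "__main__":
    for name, p in (("walking", WALK), ("driving", DRIVE)):
        d1 = stopping_distance(**p)
        d2 = stopping_distance(**{**p, "speed_mps": 2 * p["speed_mps"]})
        print(f"{name}: {d1:.2f} m, at double speed {d2:.2f} m (x{d2 / d1:.2f})")
```

At walking speed the absolute distances stay under a few meters, so rough depth is enough. At driving speed they reach tens to hundreds of meters, which is where estimation errors start to matter.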
(I’m not saying one shouldn’t use LiDAR, just arguing that we don’t know whether “LiDAR is necessary” is true. Yes, cameras cannot reproduce all aspects of human vision yet, but they can also surpass human vision in many respects, such as resolution and field of view.)