Comment by ActorNightly

15 hours ago

The purpose of lidar is to provide error correction when you need it most, i.e. in the situations where camera accuracy degrades.

Humans do this, just in the sense of depth perception with both eyes.

Human depth perception uses stereo out to only about 2 or 3 meters, after which the distance between your eyes is not a useful baseline. Beyond 3m we use context clues and depth from motion when available.
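
To put rough numbers on why that baseline stops helping: with simple triangulation (Z ≈ B/θ for a small disparity angle θ), a fixed angular discrimination threshold turns into a depth error that grows with the square of the distance. A minimal sketch, assuming a ~6.5 cm interocular baseline and an illustrative ~30 arcsecond stereoacuity threshold (both numbers are assumptions for illustration, not measurements from this thread):

```python
import math

BASELINE_M = 0.065                          # typical human interocular distance (~6.5 cm)
STEREOACUITY_RAD = math.radians(30 / 3600)  # assumed ~30 arcsec disparity threshold (illustrative)

def stereo_depth_error(distance_m: float) -> float:
    """Depth uncertainty from binocular disparity alone.

    Small-angle triangulation gives Z ~= B / theta, so a fixed angular
    threshold d_theta becomes dZ ~= Z**2 * d_theta / B: the error grows
    quadratically with distance.
    """
    return distance_m ** 2 * STEREOACUITY_RAD / BASELINE_M

for z in (0.5, 1, 3, 10, 30, 100):
    print(f"{z:>5} m -> +/- {stereo_depth_error(z) * 100:7.1f} cm")
```

The exact cutoff depends on the threshold you assume, but the quadratic growth is why the few centimeters between the eyes stop being a useful baseline so quickly.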

  • Thanks, saved some work.

    And I'll add that in practice it is not even that much unless you're doing some serious training, like a professional athlete. For most tasks, accurate stereoscopic depth perception fades beyond roughly arm's length.

(Always worth noting, human depth perception is not based just on stereoscopic vision, but also on focal distance, which is why so many people get simulator sickness from stereoscopic 3D VR.)

  • > Always worth noting, human depth perception is not just based on stereoscopic vision, but also with focal distance

    Also subtle head and eye movements, which is something a lot of people like to ignore when discussing camera-based autonomy. Your eyes are always moving around which changes the perspective and gives a much better view of depth as we observe parallax effects. If you need a better view in a given direction you can turn or move your head. Fixed cameras mounted to a car's windshield can't do either of those things, so you need many more of them at higher resolutions to even come close to the amount of data the human eye can gather.
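
    A rough way to quantify that parallax: a sideways translation acts like a stereo baseline, so the pinhole relation (image shift = f * B / Z) says how much a ~20 cm head sway buys you over the eyes alone, and how little a camera that never moves gets at range. A minimal sketch with illustrative camera numbers (the focal length and pixel pitch are assumptions, not any particular vehicle's hardware):

    ```python
    # Image shift (in pixels) produced by a lateral translation of the viewpoint.
    # All camera parameters here are illustrative assumptions.
    FOCAL_MM = 6.0        # assumed lens focal length
    PIXEL_PITCH_UM = 3.0  # assumed sensor pixel pitch
    FOCAL_PX = FOCAL_MM * 1000 / PIXEL_PITCH_UM   # focal length in pixels (2000)

    def parallax_px(baseline_m: float, distance_m: float) -> float:
        """Pixel shift of a point at distance_m when the viewpoint moves
        sideways by baseline_m (pinhole model: shift = f * B / Z)."""
        return FOCAL_PX * baseline_m / distance_m

    for dist_m in (5, 20, 50, 150):
        eyes = parallax_px(0.065, dist_m)   # interocular baseline
        sway = parallax_px(0.20, dist_m)    # ~20 cm head/torso sway
        print(f"{dist_m:>4} m: eyes {eyes:6.2f} px, head sway {sway:6.2f} px")
    ```

    Once the shift falls toward a pixel or less it disappears into noise, which is the geometric version of "a fixed rig needs many more, higher-resolution cameras to come close."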

    • Easiest example I always give of this is pulling out of the alley behind my house: there is a large bush that occludes my view left to oncoming traffic, badly. I do what every human does:

      1. Crane my neck forward, see if I can see around it.

      2. Inch forward a bit more, keep craning my neck.

      3. Recognize, no, I'm still occluded.

      4. Count on the heuristic analysis of the light filtering through the bush and determine if the change in light is likely movement associated with an oncoming car.

      My Tesla's perpendicular camera is... mounted behind my head on the B-pillar... fixed... and sure as hell can't read the tea leaves, so to speak, to determine if that slight shadow change increases the likelihood that a car is about to hit us.

      I honestly don't trust it to pull out of the alley. I don't know how I can. I'd basically have to be nose-into-right-lane for it to be far enough ahead to see conclusively.

      Waymo can beam the LIDAR above and around the bush, owing to its mounting height and sensing range, and its camera coverage to the perpendicular is far better. Vision only misses so many weird edge cases, and I hate that Elon just keeps saying "well, humans have only TWO cameras and THEY drive fine every day! h'yuck!"

  • In fact there are even more depth perception cues. Maybe the most obvious is size (retinal versus assumed real-world size). Further examples include motion parallax, linear perspective, occlusion, shadows, and light gradients.

    Here is a study on how these effects rank when it comes to (hand) reaching tasks in VR: https://pubmed.ncbi.nlm.nih.gov/29293512/

  • Actually the reason people get simulator sickness in VR is not focal depth but vection-related cue conflict: the dissonance between what their eyes are telling them and what their inner ear and tactile senses are telling them.

    It's possible they get headaches from the focal length issues but that's different.

  • I keep wondering about the focal depth problem. It feels potentially solvable, but I have no idea how; maybe it could be as simple as a Magic Eye autostereogram sort of thing, but I don't think that's it.

    There have been a few attempts at solving this, but I assume that for some optical reason actual lenses need to be adjusted and it can't just be a change in the image? Meta had "Varifocal HMDs" being shown off for a bit, which I think literally moved the screen back and forth. There were a couple of "Multifocal" attempts with multiple stacked displays, but that seemed crazy. Computer Generated Holography sounded very promising, but I don't know if a good one has ever been built. A startup called Creal claimed to be able to use "digital light fields", which basically project stuff right onto the retina, which sounds kinda hogwashy to me but maybe it works?

  • My understanding is that contextual cues are a big part of it too. We see the pitcher wind up and throw a baseball at us more than we stereoscopically track its progress from the mound to the plate.

    More subtly, a lot of depth information comes from how big we expect things to be, since everyday life is full of things we intuitively know the sizes of: frames of reference in the form of people, vehicles, furniture, etc. This is why the forced perspective of theme park castles is so effective: our brains want to see those upper windows as full sized, so we see the thing as 2-3x bigger than it actually is. And in the other direction, a lot of buildings in Las Vegas are further away than they look, because hotels like the Bellagio have large black boxes on them that each group a 2x2 block of the actual room windows.
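
    The familiar-size effect is easy to put numbers on: angular size is roughly size/distance, so the distance you perceive is (the size you assume) divided by (the angle you actually see). A minimal sketch of both directions of the trick, with all the specific sizes and distances being illustrative assumptions rather than any park's or hotel's real dimensions:

    ```python
    def perceived_distance(actual_size_m, actual_distance_m, assumed_size_m):
        """Distance a viewer infers for an object of unknown range if they
        assume it is assumed_size_m tall (small-angle: theta ~= size / distance)."""
        theta = actual_size_m / actual_distance_m     # true angular size (radians)
        return assumed_size_m / theta

    # Castle direction: upper windows built at half scale (the scale factor is an
    # illustrative assumption). A viewer assuming full-size 1.5 m windows reads
    # them as twice as far away, so the tower feels far taller than it is.
    print(perceived_distance(0.75, 60.0, 1.5))    # 120.0 -> reads ~2x farther/taller

    # Vegas direction: a 2x2 block of real windows dressed as one "window" means
    # the assumed unit (a normal 1.5 m window) is half the real 3 m box, so a
    # hotel 1 km away reads as only ~500 m: farther away than it looks.
    print(perceived_distance(3.0, 1000.0, 1.5))   # 500.0
    ```

    In both cases the only thing being manipulated is the size your brain assumes for a familiar object.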

How expensive is their lidar system?

  • Hesai has driven the cost into the $200 to $400 range now. That said, I don't know what the ones needed for driving cost. Either way, we've gone from thousands or tens of thousands of dollars into the hundreds.

  • Waymo does their LiDAR in-house, so unfortunately we don’t know the specs or the cost

    • Otto and Uber and the CEO of https://pronto.ai do though (tongue-in-cheek)

      > Then, in December 2016, Waymo received evidence suggesting that Otto and Uber were actually using Waymo’s trade secrets and patented LiDAR designs. On December 13, Waymo received an email from one of its LiDAR-component vendors. The email, which a Waymo employee was copied on, was titled OTTO FILES and its recipients included an email alias indicating that the thread was a discussion among members of the vendor’s “Uber” team. Attached to the email was a machine drawing of what purported to be an Otto circuit board (the “Replicated Board”) that bore a striking resemblance to – and shared several unique characteristics with – Waymo’s highly confidential current-generation LiDAR circuit board, the design of which had been downloaded by Mr. Levandowski before his resignation.

      The presiding judge, Alsup, said, "this is the biggest trade secret crime I have ever seen. This was not small. This was massive in scale."

      (Pronto connection: Levandowski got pardoned by Trump and is CEO of Pronto autonomous vehicles.)

      https://arstechnica.com/tech-policy/2017/02/waymo-googles-se...

  • Cheaper every year.

    • Exactly.

      Tesla told us their strategy was vertical integration and scale to drive down all input costs in manufacturing these vehicles...

      ...oh, except lidar, that's going to be expensive forever, for some reason?

> Humans do this, just in the sense of depth perception with both eyes.

Humans do this with vibes and instincts, not just depth perception. When I can't see the lines on the road because there's too much snow, I can still infer where they would be based on my familiarity with the roads and my implicit knowledge of how roads work. We do similar things in heavy rain or fog, although sometimes those situations truly necessitate pulling over, or slowing down and turning on your 4-ways - lidar might genuinely give an advantage there.

  • That’s the purpose of the neural networks

    • Yes and no - vibes and instincts aren't just thought, they're real senses. Humans have a lot of senses, dozens of them, including balance, pain, the sense of the passage of time, and body orientation. Not all of these are represented in autonomous vehicles, and it's not really clear how the brain mashes all these senses together to make decisions.
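
      For what it's worth, the usual engineering stand-in for that "mashing together" is explicit sensor fusion: weight each estimate by how much you trust it and combine. A minimal inverse-variance sketch with made-up numbers (real stacks use Kalman filters or learned fusion, and nobody is claiming this is how brains do it):

      ```python
      def fuse(estimates):
          """Inverse-variance weighted fusion of independent measurements.

          Each estimate is (value, variance); noisier sensors get less weight.
          Returns the fused value and its (smaller) variance.
          """
          weights = [1.0 / var for _, var in estimates]
          fused = sum(w * v for w, (v, _) in zip(weights, estimates)) / sum(weights)
          return fused, 1.0 / sum(weights)

      # Illustrative numbers only: a camera range guess that is noisy at distance,
      # and a lidar return that is tight but sparse.
      camera = (43.0, 9.0)    # meters, variance (stereo error grows with distance^2)
      lidar = (41.2, 0.04)    # meters, centimeter-level ranging
      value, var = fuse([camera, lidar])
      print(f"fused range: {value:.2f} m (+/- {var ** 0.5:.2f} m)")
      ```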