Comment by ra7

2 days ago

The novel aspect here seems to be 3D LiDAR output from 2D video using post-training. As far as I'm aware, no other video world models can do this.

IMO, access to DeepMind and Google infra is a hugely understated advantage Waymo has that no other competitor can replicate.

5 comments


anupamchugh 2 days ago

This is the real story buried under the simulation angle. If you can generate reliable 3D LiDAR from 2D video, every dashcam on earth becomes training data. Every YouTube driving video, every GoPro clip, every security camera feed.

Waymo's fleet is ~700 cars. The internet has millions of hours of driving footage. This technique turns the entire internet into a sensor suite. That's a bigger deal than the simulation itself.

codexb 2 days ago

3D reconstruction from moving 2D images has been a thing for decades.

ra7 2 days ago

This is 3D LiDAR output (multimodal) from 2D images.
promiseofbeans 2 days ago

LiDAR is the technology used to do spatial capture. The output is just point clouds of surfaces. So they’re generating surface point clouds from video.

moffkalast 2 days ago

It's not unheard of. There are a handful [0] of metric monodepth methods whose output is not unlike a really inaccurate 3D lidar scan, though theirs certainly looks SOTA.

[0] https://github.com/YvanYin/Metric3D
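
To make the "monodepth output looks like an inaccurate lidar" point concrete, here is a rough sketch of how a per-pixel metric depth map can be turned into a point cloud. It assumes a plain pinhole camera model; the function name `depth_to_points` and the intrinsics (`fx`, `fy`, `cx`, `cy`) are made up for illustration and are not taken from Metric3D or from anything Waymo has described.

```python
def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a metric depth map (list of rows, in meters) to 3D points.

    Assumes a pinhole camera: pixel (u, v) with depth z maps to
    x = (u - cx) * z / fx and y = (v - cy) * z / fy.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # treat non-positive depth as missing and skip it
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


# Toy 2x2 depth map in meters; the intrinsics are hypothetical values.
depth = [[2.0, 2.0],
         [0.0, 4.0]]
cloud = depth_to_points(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

The result is one 3D point per valid pixel, which is the same kind of data a lidar produces, just with depth error coming from the network instead of from a time-of-flight measurement.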