Comment by ra7

2 days ago

The novel aspect here seems to be 3D LiDAR output from 2D video using post-training. As far as I'm aware, no other video world models can do this.

IMO, access to DeepMind and Google infra is a hugely understated advantage Waymo has that no other competitor can replicate.

This is the real story buried under the simulation angle. If you can generate reliable 3D LiDAR from 2D video, every dashcam on earth becomes training data. Every YouTube driving video, every GoPro clip, every security camera feed.

Waymo's fleet is ~700 cars. The internet has millions of hours of driving footage. This technique turns the entire internet into a sensor suite. That's a bigger deal than the simulation itself.

3d from moving 2d images has been a thing for decades.

  • This is 3D LiDAR output (multimodal) from 2D images.

    • LiDAR is the technology used to do spatial capture. The output is just point clouds of surfaces. So they’re generating surface point clouds from video