Comment by WarmWash
21 hours ago
The actual breakthrough with Genie is being able to turn around and look back, and seeing the same scene that was there before. A few other labs have similar world simulators, but they all struggle badly with keeping coherence of things not in view. Hence why they always walk forwards and never look around.
They achieve that by generating not the scene you see, but a lens-warped version of a 360-degree view. So turning the other way doesn't delete what's happening / generated behind you. However, I expect it to break down if you put a blocker in between and then remove it, e.g. go behind a wall and come back, or enter and exit a building. Would be nice to play with.
Still amazed it took ML people so long to realize they needed an explicit representation to cache stuff.
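To make the panorama idea concrete, here is a toy sketch of why a full 360-degree buffer survives a turn of the camera. It only illustrates the guess in this comment and is not Genie's documented method; the function names, the equirectangular layout, and the tiny grid size are all assumptions made up for the example.

```python
def direction_to_pixel(yaw_deg, pitch_deg, width, height):
    """Map a view direction to (col, row) in an equirectangular panorama.

    Hypothetical helper: yaw wraps around every 360 degrees,
    pitch is clamped to [-90, 90] (90 = straight up).
    """
    yaw = yaw_deg % 360.0
    pitch = max(-90.0, min(90.0, pitch_deg))
    col = int(yaw / 360.0 * width) % width
    row = min(height - 1, int((90.0 - pitch) / 180.0 * height))
    return col, row


def make_panorama(width, height):
    """Create an empty panorama; None means nothing generated there yet."""
    return [[None] * width for _ in range(height)]


def write_view(pano, yaw_deg, pitch_deg, content):
    """Store generated content at the pixel the camera is facing."""
    col, row = direction_to_pixel(yaw_deg, pitch_deg, len(pano[0]), len(pano))
    pano[row][col] = content


def read_view(pano, yaw_deg, pitch_deg):
    """Read back whatever is stored in the direction the camera is facing."""
    col, row = direction_to_pixel(yaw_deg, pitch_deg, len(pano[0]), len(pano))
    return pano[row][col]
```

With this layout, writing something at yaw 0, looking away to yaw 180, and turning back to yaw 360 reads the same cell again, because the content was never tied to the current viewport. It says nothing about the occlusion case (going behind a wall), which a single panorama cannot represent.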
Genie does not use an explicit representation:
>Genie 3’s consistency is an emergent capability. Other methods such as NeRFs and Gaussian Splatting also allow consistent navigable 3D environments, but depend on the provision of an explicit 3D representation. By contrast, worlds generated by Genie 3 are far more dynamic and rich because they’re created frame by frame based on the world description and actions by the user.
The representation is learned. Also, see Sutton's "The Bitter Lesson" essay.
What about Fei-Fei Li's lab? I think they are generating true 3D worlds rather than frames of a video?
Although that probably precludes her from having animations in those worlds...
And what if I go somewhere then go back there a week later?
Best they can do is 60 seconds, for now at least.
Makes you wonder what the TTL caching for our universe is.
Can they? Is there a video of someone standing in place and spinning the camera 1080 degrees?