Comment by ollin
17 hours ago
Really great to see this released! Some interesting videos from early-access users:
- https://youtu.be/15KtGNgpVnE?si=rgQ0PSRniRGcvN31&t=197 walking through various cities
- https://x.com/fofrAI/status/2016936855607136506 helicopter / flight sim
- https://x.com/venturetwins/status/2016919922727850333 space station, https://x.com/venturetwins/status/2016920340602278368 Dunkin' Donuts
- https://youtu.be/lALGud1Ynhc?si=10ERYyMFHiwL8rQ7&t=207 simulating a laptop computer, moving the mouse
- https://x.com/emollick/status/2016919989865840906 otter airline pilot with a duck on its head walking through a Rothko inspired airport
These are extremely impressive from a technological progression standpoint, and at the same time not at all compelling, in the same way AI images and LLM prose are and are not.
It's neat I guess that I can use a few words and generate the equivalent of an Unreal 5 asset flip and play around in it. Also I will never do that, much less pay some ongoing compute cost for each second I'm doing it.
Exactly. People are getting so excited that all this stuff is possible, and forgetting that we are burning through innumerable finite resources just to prove something is possible.
They were too concerned with whether or not they could, they never stopped to think if they should.
Yeah, the future I see from this is just shitty walking video games that maybe look nice but have ridiculous input lag, stuttery frame rates, and no compelling gameplay loop or story. Oh, and another tool to fill up Facebook with more fake videos to make people angry. Oh well, I guess this is what we've decided to direct all our energy towards.
I was lucky enough to be an early tester. Here's a brief video walking through the process of creating worlds, with examples: walking on the moon (with a NASA photo as part of the prompt), being in 221B Baker Street with Holmes and Watson, wandering through a night market in Taipei as a giant boba milk tea (note how the stalls are different and sell different foods), and exploring the setting of my award-nominated tabletop RPG.
https://www.youtube.com/watch?v=FyTHcmWPuJE
It's an experimental research prototype, but it also feels like a hint of the future. Feel free to ask any questions.
I liked that first one, and I hope someone creates one about going back to the dinosaur age. I want to see that.
One step closer to the science-based dinosaur MMO we were promised.
Tim is awesome.
Ironically, he covered PixVerse's world model last week and it came close to your ask: https://youtu.be/SAjKSRRJstQ?si=dqybCnaPvMmhpOnV&t=371
(Earlier in the video it shows him live prompting.)
World models are popping up everywhere, from almost every frontier lab.
Any thoughts about Project Genie?
On a technical level, this looks like the same diffusion transformer world model design that was shown in the Genie 3 post (text/memory/d-pad input, video output, 60sec max context, 720p, sub-10FPS control latency due to 4-frame temporal compression). I expect the public release uses a cheaper step-distilled / quantized version. The limitations seen in Genie 3 (high control latency, gradual loss of detail and drift towards videogamey behavior, 60s max rollout length) are still present. The editing/sharing tools, latency, cost, etc. can probably improve over time with this same model checkpoint, but new features like audio input/output, higher resolution, precise controls, etc. likely won't happen until the next major version.
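For the "sub-10FPS control latency" point, here is a rough back-of-envelope sketch. It assumes (this is not stated in the thread) that the output video runs at 24 FPS and that each model step emits one block of 4 frames, so a new input can only take effect at a block boundary. The function names are hypothetical, and this ignores network and inference time entirely.

```python
def control_rate_hz(video_fps: float, frames_per_block: int) -> float:
    """Upper bound on how often a new control input can change the output.

    Assumes inputs are only consumed once per temporally compressed block.
    """
    if video_fps <= 0 or frames_per_block <= 0:
        raise ValueError("video_fps and frames_per_block must be positive")
    return video_fps / frames_per_block


def block_duration_ms(video_fps: float, frames_per_block: int) -> float:
    """Playback time covered by one block: a floor on input-to-pixel delay."""
    return 1000.0 / control_rate_hz(video_fps, frames_per_block)


# Assumed numbers: 24 FPS output (an assumption), 4-frame compression (from the comment).
ASSUMED_FPS = 24.0
FRAMES_PER_BLOCK = 4

print(control_rate_hz(ASSUMED_FPS, FRAMES_PER_BLOCK))  # 6.0 control updates per second
print(round(block_duration_ms(ASSUMED_FPS, FRAMES_PER_BLOCK), 1))  # 166.7 ms per block
```

Under those assumptions the controls update at most about 6 times per second, which is where a "sub-10FPS" figure would come from even if the video itself plays back smoothly.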
From a product perspective, I still don't have a good sense of what the market for WMs will look like. There's a tension between serious commercial applications (robotics, VFX, gamedev, etc., where you want way, way higher fidelity and very precise controllability) and the current short-form-demos-for-consumer-entertainment application (where you want the inference to be cheap-enough-to-be-ad-supported and simple/intuitive to use). Framing Genie as a "prototype" inside their most expensive AI plan makes a lot of sense while GDM figures out how to target the product commercially.
On a personal level, since I'm also working on world models (albeit very small local ones https://news.ycombinator.com/item?id=43798757), my main thought is "oh boy, lots of work to do". If everyone starts expecting Genie 3 quality, local WMs need to become a lot better :)