
Comment by Certhas

3 days ago

This _might_ be true, but it's utterly absurd to claim this is a certainty.

The images rendered in a game need to accurately represent a very complex world state. Do we have any examples of Transformer-based models doing something in this category? Can they do it in real time?

I could absolutely see something like rendering a simplified and stylised version and getting Transformers to fill in details. That's kind of a direct evolution from the upscaling approach described here, but end to end rendering from game state is far less obvious.

Doesn’t this imply that a transformer or NN could fill in details more efficiently than traditional techniques?

I’m really curious why this would be preferable for an AAA studio game outside of potential cost savings. I also imagine it’d come at the cost of deterministic output / consistency in visuals.

  I could absolutely see something like rendering a simplified and stylised version and getting Transformers to fill in details. That's kind of a direct evolution from the upscaling approach described here, but end to end rendering from game state is far less obvious.

Sure. This could be a variation. You do a quick render that any GPU from 2025 can do and then make the frame hyper-realistic with a transformer model. It's basically saying the same thing.

The main rendering would be done by the transformer.

Already in 2025, Google Veo 3 is generating pixels far more realistic than AAA games. I don't see why this wouldn't be the default rendering mode for AAA games in 2035. It's insanity to think it won't be.

Veo3: https://aistudio.google.com/models/veo-3

  • > Google Veo 3 is generating pixels far more realistic than AAA games

    That’s because games are "realtime", meaning with a tight frame-time budget. AI models are not (and are even running on multiple cards each costing 6 figures).

  • Well, you missed the point. You could call it prompt adherence. I need Veo to generate the next frame in a few milliseconds, and to correctly represent the position of all the cars in the scene (reacting to player input) reliably, to very high accuracy.

    You conflate the challenge of generating realistic pixels with the challenge of generating realistic pixels that represent a highly detailed world state.

    So I don't think your argument is convincing or complete.

  • > Already in 2025, Google Veo 3 is generating pixels far more realistic than AAA games.

    Traditional rendering techniques can also easily exceed the quality of AAA games if you don't impose strict time or latency constraints on them. Wake me up when a version of Veo is generating HD frames in less than 16 milliseconds, on consumer hardware, without batching, and then we can talk about whether that inevitably much smaller model is good enough to be a competitive game renderer.
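The real-time constraint in this reply is easy to quantify. A back-of-envelope check, assuming a 1080p target at 60 fps (my assumption of what "HD frames in less than 16 milliseconds" implies, not numbers from the thread):

```python
# Frame-time budget and pixel throughput for real-time rendering.
# Assumed target: 1080p at 60 fps.

FPS = 60
FRAME_BUDGET_MS = 1000 / FPS         # ms available per frame
WIDTH, HEIGHT = 1920, 1080           # 1080p "HD" frame

pixels_per_frame = WIDTH * HEIGHT
pixels_per_second = pixels_per_frame * FPS

print(f"frame budget: {FRAME_BUDGET_MS:.1f} ms")        # frame budget: 16.7 ms
print(f"pixels/frame: {pixels_per_frame:,}")            # pixels/frame: 2,073,600
print(f"pixels/second: {pixels_per_second:,}")          # pixels/second: 124,416,000
```

So "good enough pixels" has to arrive at roughly 124 million pixels per second, every second, with no batching across frames — which is the gap the comment is pointing at.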

Genie 3 is already a frontier approach to interactive generative world models, no?

It will be AI all the way down soon. The model's internal world view could be multiple passes and multi-layer, with different strategies... In any case, it's safe to say more AI will be involved in more places ;)

  • I am super intrigued by such world models. But at the same time it's important to understand where they are at. They are celebrating the achievement of keeping the world mostly consistent for 60 seconds, and this is 720p at 24fps.

    I think it's reasonable to assume we won't see this tech replace game engines without significant further breakthroughs...

    For LLMs agentic workflows ended up being a big breakthrough to make them usable. Maybe these World Models will interact with a sort of game engine directly somehow to get the required consistency. But it's not evident that you can just scale your way from "visual memory extending up to one minute ago" to 70+ hour game experiences.
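The scaling gap described in this comment can be put in rough numbers. Using the figures quoted above (720p at 24 fps, about 60 seconds of visual memory) against a hypothetical AAA target (1080p at 60 fps, a 70-hour playthrough — the targets are my assumptions, not claims from the thread):

```python
# Rough gap between demoed world-model numbers and a hypothetical
# AAA game target. Demo figures are from the comment above; the
# targets (1080p/60fps, 70 h) are assumptions for illustration.

demo_pixels_s = 1280 * 720 * 24        # 720p @ 24 fps
target_pixels_s = 1920 * 1080 * 60     # 1080p @ 60 fps
throughput_gap = target_pixels_s / demo_pixels_s

demo_memory_s = 60                     # "up to one minute ago"
target_memory_s = 70 * 3600            # a 70-hour game, in seconds
memory_gap = target_memory_s // demo_memory_s

print(f"pixel-throughput gap: {throughput_gap:.3f}x")  # pixel-throughput gap: 5.625x
print(f"consistency-window gap: {memory_gap}x")        # consistency-window gap: 4200x
```

The throughput gap (~5.6x) looks like ordinary scaling; the consistency-window gap (over three orders of magnitude) is the part that arguably needs a breakthrough rather than just more compute.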