← Back to context

Comment by Rover222

3 days ago

I don't think you grasp what I'm saying? I'm talking about next token prediction to generate video frames.

Yeah, which is pretty slow due to the need to autoregressively generate each image frame token in sequence. And leading diffusion models need to progressively denoise each frame. These are very expensive computationally. Generating the entire world using current techniques is incredibly expensive compared to rendering and rasterizing triangles, which is almost completely parallelized by comparison.

  • Okay you clearly know 20x more than me about this, so I cannot logically argue. But the vague hunch remains that this is the future of video games. Within 3 to 4 years.

    • I don't think that will ever happen die to extreme hardware requirements. What I do see happen is that only an extremely low fidelity scene is rendered with only basic shapes, no or very little textures etc. that is them filled in by AI. DLSS taken to the extreme, not just resolution but the whole stack.