Comment by MattCruikshank

14 hours ago

I know feelings about AI are mixed. But when AI can dream up Gaussian splats in real time, from a prompt, and do refinement as you get closer to things... That's going to be pretty bonkers.

That's kinda what NeRFs are (neural radiance fields). They actually preceded this Gaussian story, with Gaussians coming in and outperforming them. Maybe they'll merge later for something even better, I don't know enough about them.

  • Sure, but NeRFs were trying to match your input photos and poses, not some arbitrary prompt, if I understand correctly.

    • Yes, they are image generators. You want image generator generators.

      A diffusion-style process generating Gaussians instead of pixels. You could possibly do NeRFs that way, but it would be effectively generating a trained network. If you managed to do that, it would have broad application throughout the field of AI.
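To make "Gaussians instead of pixels" a bit more concrete, here is a rough sketch of what the output of such a generator might look like: a set of flat parameter vectors, one per Gaussian, rather than a grid of pixel values. The 14-float layout (position, scale, rotation quaternion, opacity, RGB) and every name below are assumptions for illustration, not any particular model's format, and the random sampler is only a stand-in for the noise a denoising process would start from.

```python
import random

# Assumed layout for one Gaussian (hypothetical, for illustration only):
# 3 position + 3 scale + 4 rotation (quaternion) + 1 opacity + 3 RGB.
FLOATS_PER_SPLAT = 3 + 3 + 4 + 1 + 3


def random_splats(n, seed=0):
    """Return n noise vectors, a stand-in for the start of a denoising run."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(FLOATS_PER_SPLAT)]
            for _ in range(n)]


def unpack(splat):
    """Split one flat vector into its named parts."""
    return {
        "position": splat[0:3],
        "scale": splat[3:6],
        "rotation": splat[6:10],
        "opacity": splat[10],
        "rgb": splat[11:14],
    }


def output_size_pixels(width, height, channels=3):
    """Number of values a pixel generator emits for one image."""
    return width * height * channels


def output_size_splats(n):
    """Number of values a splat generator emits for one scene."""
    return n * FLOATS_PER_SPLAT


splats = random_splats(5)
first = unpack(splats[0])
```

The point of the comparison functions is only that the two generators target different output spaces: one image's worth of pixels versus one scene's worth of Gaussian parameters.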

I could see a kind of fun game / design tool / worldbuilding where you get a blurry world and you describe what you are seeing, and it comes into focus. The game world, mechanics, aesthetic, and playstyle build as you form your view. A sort of fog of war meets Rorschach game.

This will be the future of a class of 3D game. The prompt may not be text, however.

An input of a kind of schematic representation of what the designer wants would be better. It may resemble a storyboard or a collection of organised notes that large projects tend to already use.

Fully generative could probably do some cool things, but people will still want to bring their personal vision to life.

  • Curious, why wouldn't the future be a full world model like Google's Genie? It just renders every pixel so someone could still make their vision come to life via a prompt too.

    • It could be done that way, but you are spending parameters managing the fact that the output changes completely with a change in view position or orientation. An observer-independent model only has to manage changes of things that are actually changing in the world.

      Since you can view Gaussian splats from any POV you end up generating an output that is closer to the representation of the world instead of a projection that a single observer sees.

  • Yeah, when you describe that, I picture Wave Function Collapse to generate a map schematic... And then a text prompt, and some style photos the designers want it to match.
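On the observer-independent point above, a minimal sketch may help: the scene is stored once as 3D Gaussian centers, and each viewpoint is just a cheap projection of that same data. This assumes a simplified pinhole camera looking down +z with no rotation, and it ignores the covariance, opacity, and color that a real splat renderer would handle; `project`, `render`, and the sample coordinates are all made up for illustration.

```python
def project(point, cam_pos, focal=1.0):
    """Pinhole projection for a camera at cam_pos looking down +z."""
    x, y, z = (p - c for p, c in zip(point, cam_pos))
    if z <= 0:
        return None  # point is behind the camera
    return (focal * x / z, focal * y / z)


# The "world": Gaussian centers only, stored once, independent of any viewer.
centers = [(0.0, 0.0, 4.0), (1.0, 0.0, 4.0), (0.0, 1.0, 8.0)]


def render(cam_pos):
    """Project every center into the image plane of a camera at cam_pos."""
    return [project(c, cam_pos) for c in centers]


view_a = render((0.0, 0.0, 0.0))  # camera at the origin
view_b = render((2.0, 0.0, 0.0))  # same world, camera moved 2 units along x
```

Moving the camera changes every projected coordinate, while `centers` stays untouched. A model that emits pixels directly has to account for that per-view change itself; a model that emits the splats only has to describe the world.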