Comment by MattCruikshank

14 hours ago

I know feelings about AI are mixed. But when AI can dream up Gaussian splats in real time, from a prompt, and refine them as you get closer to things... That's going to be pretty bonkers.

16 comments


perching_aix 14 hours ago

That's kinda what NeRFs (neural radiance fields) are. They actually preceded this Gaussian story, with Gaussians coming in and outperforming them. Maybe they'll merge later into something even better, but I don't know enough about them.

MattCruikshank 14 hours ago
Sure, but NeRFs were trying to match your input photos and poses, not some arbitrary prompt, if I understand correctly.
- Lerc 14 hours ago
  
  Yes, they are image generators. You want image generator generators.
  A diffusion-style process generating Gaussians instead of pixels. You could possibly do NeRFs that way, but it would effectively be generating a trained network. If you managed to do that, it would have broad application throughout the field of AI.
  
cubefox 13 hours ago
NeRFs have significantly higher image quality than 3D Gaussian Splatting or more recent similar techniques, though they are much slower to render.
- thrownthatway 13 hours ago
  
  This one-month-old video did a reasonable job of getting my entirely ignorant self relatively up to date on NeRFs and Gaussian splats:
  https://youtu.be/X8yRlA7jqEQ

basch 11 hours ago

I could see a kind of fun game / design tool / worldbuilding where you get a blurry world and you describe what you are seeing, and it comes into focus. The game world, mechanics, aesthetic, and playstyle build as you form your view. A sort of fog of war meets rorschach game.

corysama 9 hours ago

We are currently at real-time video generation that can be converted to splats or meshes.

https://research.nvidia.com/labs/sil/projects/lyra2/

Lerc 14 hours ago

This will be the future of a class of 3D game. The prompt may not be text, however.

An input of a kind of schematic representation of what the designer wants would be better. It may resemble a storyboard or a collection of organised notes that large projects tend to already use.

Fully generative could probably do some cool things, but people will still want to bring their personal vision to life.

satvikpendem 13 hours ago
Curious, why wouldn't the future be a full world model like Google's Genie? It just renders every pixel so someone could still make their vision come to life via a prompt too.
- Lerc 7 hours ago
  
  It could be done that way, but you are spending parameters managing the fact that the output changes completely with a change in view position or orientation. An observer-independent model only has to manage changes to things that are actually changing in the world.
  Since you can view Gaussian splats from any POV, you end up generating an output that is closer to a representation of the world itself than to the projection a single observer sees.
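
To make the observer-independence point concrete, here is a rough sketch, not how any real splat renderer works. The `SCENE`, `project`, and `render` names are made up for illustration, and each splat is reduced to a centre point and a colour (real splats also carry covariance, opacity, and so on). The scene data is stored once, and each viewpoint is just a different projection of it.

```python
import math

# Toy scene: each "splat" is only a 3D centre and a colour.
# (Hypothetical minimal structure, an assumption for this sketch.)
SCENE = [
    {"mean": (0.0, 0.0, 5.0), "color": "red"},
    {"mean": (1.0, 0.0, 6.0), "color": "green"},
]

def project(point, cam_pos, yaw, focal=1.0):
    """Pinhole projection of a world-space point for a camera at cam_pos,
    turned by yaw radians about the vertical (y) axis."""
    # Translate into camera-centred coordinates.
    x = point[0] - cam_pos[0]
    y = point[1] - cam_pos[1]
    z = point[2] - cam_pos[2]
    # Undo the camera's yaw to get camera-frame coordinates.
    c, s = math.cos(-yaw), math.sin(-yaw)
    xc = c * x + s * z
    zc = -s * x + c * z
    if zc <= 0:
        return None  # point is behind the camera
    return (focal * xc / zc, focal * y / zc)

def render(scene, cam_pos, yaw):
    """Project every splat centre; the scene itself is never modified."""
    return [(s["color"], project(s["mean"], cam_pos, yaw)) for s in scene]

# Same world, two observers: only the projection differs.
view_a = render(SCENE, (0.0, 0.0, 0.0), 0.0)
view_b = render(SCENE, (2.0, 0.0, 0.0), 0.0)
```

A model that outputs something like `SCENE` only has to change its output when the world changes, while a model that outputs `view_a` or `view_b` directly has to redo everything whenever the camera moves.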
MattCruikshank 13 hours ago

Yeah, when you describe that, I picture Wave Function Collapse to generate a map schematic... And then a text prompt, and some style photos the designers want it to match.
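
For anyone who hasn't seen Wave Function Collapse, a minimal sketch of the simple tiled variant is below. The tile names and the adjacency rule (water can't touch grass) are invented for the example, and the number of remaining options stands in for the weighted entropy that fuller implementations use.

```python
import random

TILES = ("water", "sand", "grass")
# Hypothetical adjacency rule: water may not sit directly next to grass.
ALLOWED = {
    "water": {"water", "sand"},
    "sand": {"water", "sand", "grass"},
    "grass": {"sand", "grass"},
}

def neighbours(x, y, w, h):
    """Yield the in-bounds 4-connected neighbours of (x, y)."""
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < w and 0 <= ny < h:
            yield nx, ny

def propagate(grid, start, w, h):
    """Remove options that no longer have a compatible neighbour."""
    stack = [start]
    while stack:
        x, y = stack.pop()
        # Tiles that at least one remaining option of (x, y) allows nearby.
        supported = set()
        for t in grid[(x, y)]:
            supported |= ALLOWED[t]
        for n in neighbours(x, y, w, h):
            narrowed = grid[n] & supported
            if not narrowed:
                raise RuntimeError("contradiction: no tile fits at %r" % (n,))
            if narrowed != grid[n]:
                grid[n] = narrowed
                stack.append(n)

def wfc(w, h, seed=0):
    """Return a dict mapping (x, y) to a tile name for a w-by-h map."""
    rng = random.Random(seed)
    grid = {(x, y): set(TILES) for x in range(w) for y in range(h)}
    while True:
        open_cells = [c for c, opts in grid.items() if len(opts) > 1]
        if not open_cells:
            break
        # Collapse one of the cells with the fewest remaining options.
        fewest = min(len(grid[c]) for c in open_cells)
        cell = rng.choice([c for c in open_cells if len(grid[c]) == fewest])
        grid[cell] = {rng.choice(sorted(grid[cell]))}
        propagate(grid, cell, w, h)
    return {c: next(iter(opts)) for c, opts in grid.items()}

schematic = wfc(6, 4, seed=1)
```

The resulting `schematic` is the kind of coarse map a text prompt and style photos could then be layered on top of. With this particular tile set, sand is compatible with everything, so the contradiction branch should never fire. Stricter rule sets would need a restart or backtracking there.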

notdefio 14 hours ago

This sounds like it could be a great concept for a future sequel to LSD: Dream Emulator

yard2010 14 hours ago

If I'm not mistaken, that is the inspiration for one of Alt-J's albums.