Comment by sendfoods

4 months ago

Please excuse my naive question - isn't Gaussian Splatting usually used to create 3D imagery from 2D? How does providing 3D input data make sense in this context?

10 comments

sendfoods

markisus 4 months ago

Yes, the normal case uses 2D input, but it can take hours to create the scene. Using the depth channel allows me to create the scene in 33 milliseconds, from scratch, every frame. You could conceptualize this as a compromise between raw pointcloud rendering and fully precomputed Gaussian splat rendering. With pointclouds, you have a lot visual artifacts due to sparsity (low texture information, seeing "through" objects"). With Gaussian splatting, you can transfer a lot more of the 2D texture information into 3D space and render occlusion and view-dependent effects better.

Retr0id 4 months ago
How do the view-dependent effects get "discovered" from only a single source camera angle?
- markisus 4 months ago
  
  Actually there are multiple source cameras. The neural net learns to interpolate the source camera colors based on where the virtual camera is. Under the hood it's hard to say exactly what's going on in the mind of the neural net, but I think it's something like "If I'm closer to camera A, take most of the color from camera A."
  
  4 replies →
sendfoods 4 months ago

That makes things clearer, thanks!

jayd16 4 months ago

Splatting is about building a scene that supports synthetic view angles.

The depth is helpful to properly handle the parallaxing of the scene as the view angle changes. The system should then ideally "in-paint" the areas that are occluded from the input.

You can either guess the input depth from matching multiple RGB inputs or just use depth inputs along with RGB inputs if you have them. It's not fundamental to the process of building the splats either way.

ttoinou 4 months ago

Well if you have the D channel you might as well benefit from it and have better output