Comment by meindnoch

1 month ago

1. Create a point cloud from a scene (either via lidar, or via photogrammetry from multiple images)

2. Replace each point of the point cloud with a fuzzy ellipsoid that has a bunch of parameters for its position + size + orientation + view-dependent color (via spherical harmonics up to some low order)

3. If you render these ellipsoids using a differentiable renderer, then you can subtract the resulting image from the ground truth (i.e. your original photos), and calculate the partial derivatives of the error with respect to each of the millions of ellipsoid parameters that you fed into the renderer.

4. Now you can run gradient descent using the differentiable renderer, which makes your fuzzy ellipsoids converge to something closely reproducing the ground truth images (from multiple angles).

5. Since the ellipsoids started at the 3D point cloud's positions, the 3D structure of the scene will likely be preserved during gradient descent, thus the resulting scene will support novel camera angles with plausible-looking results.
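
For a feel of steps 3 and 4, here is a minimal 1D sketch in plain Python. It is not the actual 3D method: the "ellipsoids" are 1D Gaussian blobs (center, size, brightness), the "photo" is a single row of pixels, and the derivatives are written out by hand instead of coming from a differentiable renderer. All names (`render`, `loss_and_grads`, `fit`) and all numbers are made up for illustration.

```python
import math

def render(blobs, width):
    """Render 1D fuzzy blobs, each (center, size, brightness), into a row of pixels."""
    image = [0.0] * width
    for mu, sigma, amp in blobs:
        for x in range(width):
            image[x] += amp * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
    return image

def loss_and_grads(blobs, target):
    """Sum of squared pixel errors, plus its partial derivative per blob parameter."""
    width = len(target)
    image = render(blobs, width)
    residual = [image[x] - target[x] for x in range(width)]
    loss = sum(r * r for r in residual)
    grads = []
    for mu, sigma, amp in blobs:
        d_mu = d_sigma = d_amp = 0.0
        for x in range(width):
            g = math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
            common = 2.0 * residual[x]  # derivative of the squared error w.r.t. this pixel
            d_amp += common * g
            d_mu += common * amp * g * (x - mu) / sigma ** 2
            d_sigma += common * amp * g * (x - mu) ** 2 / sigma ** 3
        grads.append((d_mu, d_sigma, d_amp))
    return loss, grads

def fit(blobs, target, steps=3000, lr=0.02):
    """Plain gradient descent on every blob parameter; returns blobs and loss history."""
    history = []
    for _ in range(steps):
        loss, grads = loss_and_grads(blobs, target)
        history.append(loss)
        blobs = [
            # Clamp the size so a blob can never collapse to zero width.
            (mu - lr * gm, max(0.5, sigma - lr * gs), amp - lr * ga)
            for (mu, sigma, amp), (gm, gs, ga) in zip(blobs, grads)
        ]
    return blobs, history

# "Ground truth photo": rendered from blobs that the optimizer never sees directly.
target = render([(10.0, 2.0, 1.0), (22.0, 3.0, 0.6)], 32)
# Rough starting guess, standing in for the initial point cloud (step 1).
initial = [(8.5, 3.0, 0.5), (24.0, 2.0, 0.5)]
fitted, history = fit(initial, target)
print("error before:", history[0], "error after:", history[-1])
print("fitted blobs:", fitted)
```

The starting guess plays the role of the point cloud from step 5: it is close enough to the true layout that gradient descent only has to refine it.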

klondike_klive 1 month ago

You... you must have been quite some 5 year old.

efskap 1 month ago

ELI5 has meant friendly simplified explanations (not responses aimed at literal five-year-olds) since forever, at least on the subreddit where the concept originated.
Now, perhaps referring to differentiability isn't layperson-accessible, but this is HN after all. I found it to be the perfect degree of simplification personally.

SchemaLoad 1 month ago

Some things would be literally impossible to properly explain to a 5 year old.
- zapzupnz 1 month ago
  
  If one actually tried to explain to a five year old, they could use things like analogy, simile, metaphor, and other forms of rhetoric. This was just a straight-up technical explanation.

np_tedious 1 month ago

Lol. Def not for 5 year olds but it's about exactly what I needed

How about this:
Take a lot of pictures of a scene from different angles, do some crazy math, and then you can later pretend to zoom and pan the camera around however you want
- KeplerBoy 1 month ago
  
  Sure, but does that explanation really help anyone? IMO it might scare people off actually diving into things; the math isn't too crazy.
  

alok-g 1 month ago

Thanks.

How hard is it to handle cases where the starting positions of the ellipsoids in 3D are not correct (too far off)? How common is such a scenario with the state of the art? E.g., with only a stereoscopic image pair, the correspondences are often not accurate.

Thanks.

make3 1 month ago

I assume that the differentiable renderer is only given its position and viewing angle at any one time (in order to be able to generalize to new viewing angles)?

Is it a fully connected NN?

meindnoch 1 month ago

No. There are no neural networks here. The renderer is just a function that takes a bunch of ellipsoid parameters and outputs a bunch of pixels. You render the scene, then subtract the ground truth pixels from the result, and sum the squared differences to get the total error. Then you ask the question "how would the error change if the X position of ellipsoid #1 was changed slightly?" (then repeat for all ellipsoid parameters, not just the X position, and all ellipsoids, not just ellipsoid #1). In other words, compute the partial derivative of the error with respect to each ellipsoid parameter. This gives you a gradient that you can use to adjust the ellipsoids to decrease the error (i.e. get closer to the ground truth image).
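
To make the "changed slightly" question concrete, here is a toy sketch in plain Python that literally nudges each parameter and measures how the error responds (finite differences). The 1D one-blob renderer and every name in it (`render`, `total_error`, `numerical_gradient`) are invented for illustration. Real implementations get the same derivatives analytically from the differentiable renderer, since nudging millions of parameters one at a time would be far too slow.

```python
import math

def render(params, width=16):
    """Toy renderer: one fuzzy 1D blob described by [x_position, size, brightness]."""
    x_pos, size, brightness = params
    return [brightness * math.exp(-((px - x_pos) ** 2) / (2 * size ** 2))
            for px in range(width)]

def total_error(params, ground_truth):
    """Subtract ground truth pixels from the render and sum the squared differences."""
    rendered = render(params, len(ground_truth))
    return sum((r - g) ** 2 for r, g in zip(rendered, ground_truth))

def numerical_gradient(params, ground_truth, nudge=1e-5):
    """Estimate d(error)/d(parameter) by nudging each parameter up and down a little."""
    grad = []
    for i in range(len(params)):
        up = list(params)
        up[i] += nudge
        down = list(params)
        down[i] -= nudge
        slope = (total_error(up, ground_truth) - total_error(down, ground_truth)) / (2 * nudge)
        grad.append(slope)
    return grad

ground_truth = render([8.0, 2.0, 1.0])   # the "photo": blob centered at x = 8
params = [6.0, 2.0, 1.0]                 # current guess: blob is too far left
gradient = numerical_gradient(params, ground_truth)

# One gradient descent step: move every parameter against its partial derivative.
step = 0.05
nudged = [p - step * g for p, g in zip(params, gradient)]
print("gradient:", gradient)
print("error before:", total_error(params, ground_truth))
print("error after: ", total_error(nudged, ground_truth))
```

The first gradient entry comes out negative here, meaning that increasing the X position (moving the blob right, toward the true center) lowers the error.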

renewiltord 1 month ago

Great explanation/simplification. Top quality contribution.

cpt_sobel 1 month ago

And what about the "mature enough" part? How has it changed / progressed recently?

corysama 1 month ago

The field is advancing rapidly. New research papers have been published daily for a few years now. The best news feed I've found on the topic is
https://radiancefields.com/
https://x.com/RadianceFields alt: https://xcancel.com/RadianceFields

pleurotus 1 month ago

Thanks for the explanation!

chrisjj 1 month ago

Or: Matrix bullet time with more viewpoints and less quality.