← Back to context

Comment by meindnoch

9 hours ago

1. Create a point cloud from a scene (either via lidar, or via photogrammetry from multiple images)

2. Replace each point of the point cloud with a fuzzy ellipsoid, that has a bunch of parameters for its position + size + orientation + view-dependent color (via spherical harmonics up to some low order)

3. If you render these ellipsoids using a differentiable renderer, then you can subtract the resulting image from the ground truth (i.e. your original photos), and calculate the partial derivatives of the error with respect to each of the millions of ellipsoid parameters that you fed into the renderer.

4. Now you can run gradient descent using the differentiable renderer, which makes your fuzzy ellipsoids converge to something closely reproducing the ground truth images (from multiple angles).

5. Since the ellipsoids started at the 3D point cloud's positions, the 3D structure of the scene will likely be preserved during gradient descent, thus the resulting scene will support novel camera angles with plausible-looking results.

You... you must have been quite some 5 year old.

  • ELI5 has meant friendly simplified explanations (not responses aimed at literal five-year-olds) since forever, at least on the subreddit where the concept originated.

    Now, perhaps referring to differentiability isn't layperson-accessible, but this is HN after all. I found it to be the perfect degree of simplification personally.

  • Lol. Def not for 5 year olds but it's about exactly what I needed

    How about this:

    Take a lot of pictures of a scene from different angles, do some crazy math, and then you can later pretend to zoom and pan the camera around however you want

  • Some things would be literally impossible to properly explain to a 5 year old.

    • If one actually tried to explain to a five year old, they can use things like analogy, simile, metaphor, and other forms of rhetoric. This was just a straight-up technical explanation.

Thanks.

How hard is it to handle cases where the starting positions of ellipsoids in 3D is not correct (being too off). How common is such a scenario with the state of the art? E.g., if having only a stereoscopic image pair, the correspondences are often not accurate.

Thanks.