Comment by Leptonmaniac
2 days ago
Can someone ELI5 what this does? I read the abstract and tried to find differences in the provided examples, but I don't understand (and don't see) what the "photorealistic" part is.
Imagine history documentaries where they take an old photo, free objects from the background, and move them around, giving the illusion of parallax movement. This software does that in less than a second, creating a 3D model that can be accurately moved (or the camera, for that matter) in your video editor. It's not new, but this one is fast and "sharp".
Gaussian splatting is pretty awesome.
Oh man. I never thought about how Ken Burns might use that.
You already sometimes see documentaries where someone has manually cut out a foreground person from the background and enlarged them a little to create a multi-layer 3D effect, but it's super-primitive and I find it gimmicky.
Bringing actual 3D to old photographs as the camera slowly pans or rotates slightly feels like it could be done really tastefully and well.
What are free objects?
The "free" in this case is a verb. The objects are freed from the background.
Takes a 2D image and lets you simulate moving the camera angle with a correct-ish parallax effect and proper subject isolation (it seems able to handle multiple subjects in the same scene as well).
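For intuition, here's a toy sketch of the parallax part (a forward warp from a depth map — not the paper's actual method, and every name here is illustrative; it assumes a per-pixel depth map is already available):

    import numpy as np

    def parallax_shift(image, depth, camera_dx):
        # image: (H, W, 3) array; depth: (H, W) in meters;
        # camera_dx: virtual sideways camera move.
        h, w = depth.shape
        out = np.zeros_like(image)
        # Disparity is inversely proportional to depth: near pixels shift more.
        disparity = camera_dx / np.maximum(depth, 1e-6)
        xs = np.arange(w)
        for y in range(h):
            new_x = np.clip((xs + disparity[y]).astype(int), 0, w - 1)
            out[y, new_x] = image[y, xs]  # holes stay black where background is revealed
        return out

The black holes this leaves behind foreground objects are exactly the disoccluded regions a real system has to inpaint.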
I guess this is what they use for the portrait mode effects.
It turns a single photo into a rough 3D scene so you can slightly move the camera and see new, realistic views. "Photorealistic" means it preserves real textures and lighting instead of a flat depth effect. Similar behavior can be seen with Apple's Spatial Scene feature in the Photos app: https://files.catbox.moe/93w7rw.mov
From a single picture it infers a hidden 3D representation, from which you can produce photorealistic images from slightly different vantage points (novel views).
There's nothing "hidden" about the 3D representation. It's a point cloud (in meters) with colors, and a guess at the "camera" that produced it.
(I am oversimplifying).
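To make "point cloud with colors" concrete, here's a toy unprojection assuming a pinhole camera with focal length f (in pixels) and the principal point at the image center — an assumption for illustration, since the actual camera guess will be more involved:

    import numpy as np

    def depth_to_point_cloud(image, depth, f):
        # Returns an (N, 6) array of [x, y, z, r, g, b] points in camera space.
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        z = depth
        x = (u - w / 2) * z / f  # pinhole model: x = (u - cx) * z / f
        y = (v - h / 2) * z / f
        points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
        colors = image.reshape(-1, 3).astype(float)
        return np.concatenate([points, colors], axis=1)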
"Hidden" or "latent" in a context like this just means variables that the algo is trying to infer because it doesn't have direct access to them.
Hidden in the sense of neural net layers. I mean intermediary representation.
Basically depth estimation to split the scene into various planes, then inpainting to fill in the obscured parts of those planes, and then free movement of the planes to allow for parallax. Think of 2D side-scrolling games that use several background layers at different depths to give the illusion of motion and depth.
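Here's that side-scroller analogy as a toy sketch, assuming the scene has already been split and inpainted into RGBA planes with known depths (the names and the wrap-around np.roll are simplifications, not how a real pipeline would do it):

    import numpy as np

    def composite_layers(layers, depths, camera_dx):
        # layers: list of (H, W, 4) float RGBA planes; depths: matching depths in meters.
        h, w, _ = layers[0].shape
        canvas = np.zeros((h, w, 3))
        # Painter's algorithm: draw far planes first so near planes occlude them.
        for layer, depth in sorted(zip(layers, depths), key=lambda p: -p[1]):
            shift = int(round(camera_dx / depth))  # nearer planes shift further
            moved = np.roll(layer, shift, axis=1)
            alpha = moved[..., 3:4]
            canvas = alpha * moved[..., :3] + (1 - alpha) * canvas
        return canvas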
Apple does something similar right now in their photos app, generating spatial views from 2d photos, where parallax is visible by moving your phone. This paper’s technique seems to produce them faster. They also use this same tech in their Vision Pro headset to generate unique views per eye, likewise on spatialized images from Photos.
It makes your picture 3D. The "photorealistic" part is "it's better than these other ways".
Black Mirror episode portraying what this could do: https://youtu.be/XJIq_Dy--VA?t=14. If Apple ran SHARP on this photo and compared it to the show, that would be incredible.
Or if you prefer Blade Runner: https://youtu.be/qHepKd38pr0?t=107
One more example from Star Trek Into Darkness https://youtu.be/p7Y4nXTANRQ?t=61
I was thinking Enemy of the State (1998) https://www.youtube.com/watch?v=3EwZQddc3kY
Agreed, this is a terrible presentation. The paper abstract is bordering on word salad, the demo images are meaningless and don’t show any clear difference to the previous SotA, the introduction talks about “nearby” views while the images appear to show zooming in, etc.