Comment by pixelesque
4 hours ago
(As someone who has worked closely with path tracing renderers and denoisers, I think I can answer this :) )
It's mostly because, in the VFX/CG space, ray tracing/path tracing denoisers almost always rely on extra outputs/AOVs, things like 'albedo' (diffuse reflectance), normals, world position, etc., to help guide them.
So they can often 'cheat' a bit and know where the edges of things are (because, say, the object ID AOV changes; minus pixel filtering, which complicates things a bit).
They can also 'cheat' in other ways, for example by mixing back in some of the diffuse texture detail from the 'albedo' AOV channel that the denoiser might have smoothed away.
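To make that concrete, here's a toy sketch of the guided idea in plain numpy (not how any particular production denoiser like OIDN or OptiX actually works, and all the names and parameter values are made up for illustration): divide the albedo out of the beauty pass, average only over neighbours whose normal AOV agrees with the centre pixel, then multiply the albedo back in to restore the texture detail.

```python
import numpy as np

def guided_denoise(color, albedo, normal, radius=3, sigma_n=0.2):
    """Toy AOV-guided denoise: demodulate albedo, smooth only across
    pixels whose normals agree, then re-apply the albedo texture."""
    eps = 1e-4
    irradiance = color / (albedo + eps)           # strip texture detail first
    h, w, _ = color.shape
    out = np.zeros_like(irradiance)
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            patch = irradiance[y0:y1, x0:x1]
            # weight neighbours by how closely their normal matches the centre
            dn = normal[y0:y1, x0:x1] - normal[y, x]
            wgt = np.exp(-np.sum(dn * dn, axis=-1) / (2 * sigma_n ** 2))[..., None]
            out[y, x] = (patch * wgt).sum(axis=(0, 1)) / wgt.sum()
    return out * (albedo + eps)                   # mix the texture back in
```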
Cameras don't really have anything like that to guide them, so they have to guess. Often they seem to use fairly primitive methods like bilateral filters (or at least things which look very similar) to try to work out where the edges are, but it doesn't work very well.
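For contrast, a plain bilateral filter has only the noisy image itself to decide where the edges are, which is exactly why it struggles on heavy noise. A rough numpy sketch with arbitrary parameters, just to show the difference from the AOV-guided version above:

```python
import numpy as np

def bilateral_denoise(img, radius=3, sigma_s=2.0, sigma_r=0.1):
    """Classic bilateral filter: the only 'edge guide' is the noisy
    luminance itself, so strong noise can be mistaken for detail."""
    h, w, _ = img.shape
    luma = img.mean(axis=-1)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(ys ** 2 + xs ** 2) / (2 * sigma_s ** 2))
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            sp = spatial[y0 - y + radius:y1 - y + radius,
                         x0 - x + radius:x1 - x + radius]
            rng = np.exp(-(luma[y0:y1, x0:x1] - luma[y, x]) ** 2
                         / (2 * sigma_r ** 2))
            wgt = (sp * rng)[..., None]
            out[y, x] = (img[y0:y1, x0:x1] * wgt).sum(axis=(0, 1)) / wgt.sum()
    return out
```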
Portrait modes on phones can use depth sensors to help a bit, if the phone has one, but for things like hair strands it doesn't really work; the depth data is mostly useful for fake depth-of-field blurring.
Yeah, but surely ML models would at least work better than analytic algorithms. After all, when looking at a noisy picture, our brain is pretty good at distinguishing detail from noise, so it's not clear to me why an ML model couldn't have denoising performance similar to the human brain, even if it doesn't match the "cheating" denoisers used in ray tracing.
It would probably help to compare what you can do on a phone vs what you can do with desktop software (Lightroom/Photoshop, DxO, Topaz, CaptureOne, etc). The desktop results are generally quite good, with the exception of challenging liminal areas (e.g. hair, foliage).
Fwiw, Topaz -- which I have a license for but essentially never use -- has pretty incredible denoising & upsizing features (for both photo & video), but to get the optimal quality output you offload the processing to their cloud infra (and buy credits from them to pay for it). It's roughly the equivalent of a SWE using a local LLM that's "good enough" vs a frontier model that's SOTA but requires a consumption-based subscription.
Interesting, so it seems to be an issue with heavy compute or RAM requirements.
> Yeah, but surely ML models would at least work better than analytic algorithms.
Bear in mind, a modern flagship phone doesn't just need to take photos - it also needs to record 4K video at 60 fps.
Can't go too hard with the ML when you've only got 1/60th of a second to do it.
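Back-of-envelope, assuming standard 4K UHD numbers (nothing phone-specific):

```python
# Rough per-frame budget for 4K60 video (assumed figures, not any real device)
pixels_per_frame = 3840 * 2160            # ~8.3 million pixels
fps = 60
budget_ms = 1000 / fps                    # ~16.7 ms per frame, shared between
                                          # capture, ISP, denoise and encode
pixel_rate = pixels_per_frame * fps       # ~498 million pixels per second
print(f"{budget_ms:.1f} ms/frame, {pixel_rate / 1e6:.0f} Mpix/s")
```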
Would you be happy if the ML detected that you were trying to capture the moon and replaced your shot with a memorized picture of the moon with matching size and orientation?
Autoencoders can presumably do this, except that they would be operating at the level of patches rather than the entire Moon.
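For illustration only, a minimal patch-level denoising autoencoder could look something like this in PyTorch (an untrained toy with made-up sizes, nothing like what a phone ISP or Topaz actually ships):

```python
import torch
import torch.nn as nn

class PatchDenoiser(nn.Module):
    """Tiny conv autoencoder that maps a noisy 64x64 patch to a clean one."""
    def __init__(self, ch=32):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(ch * 2, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(ch, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.dec(self.enc(x))

model = PatchDenoiser()
noisy_patches = torch.rand(8, 3, 64, 64)   # e.g. 64x64 crops of a noisy photo
clean_estimate = model(noisy_patches)      # trained on noisy/clean pairs in practice
```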
No, I would be unhappy :(