← Back to context

Comment by cubefox

2 hours ago

It's strange that in the age of AI, denoisers are still so bad. It's basically impossible to photograph snowing in the winter because the denoiser will remove 90% of the snowflakes. Machine learning models are already used for denoising ray traced graphics with substantially improved results, so why is it that cameras aren't using ML denoisers yet? At least for still images. Or do they perhaps already use them, only the quality is still bad for unknown reasons?

(As someone who worked closely with pathtracing renderers and de-noisers, I think I can answer this :) )

It's mostly because in the VFX/CG space for ray tracing/path tracing de-noisers, they almost always rely on extra outputs/AOVs of things like 'albedo' (diffuse reflectance), normal / world position, etc, to help guide them in many cases.

So they often can 'cheat' a bit, and know where the edges of things are (because say the object ID AOV changes - minus pixel filtering, which complicates things a bit).

They can also 'cheat' in other ways, by mixing back in some of the diffuse texture detail that the denoiser might have removed from the 'albedo' AOV channel.

Cameras don't really have anything to guide them, so they have to guess. And often, they seem to use very primitive methods like bi-lateral filters (or at least things which look very similar), to try and guide them, but it doesn't work very well.

Portrait cameras on phones can use depth sensors a bit to help if the camera has them, but for things like hair strands, it doesn't really work, and is mostly useful for fake-depth-of-field depth-based blurring.

  • Yeah, but surely ML models would at least work better than analytic algorithms. After all, when looking at a noisy picture, our brain is pretty good at distinguishing detail from noise, so it's not clear to me why an ML model couldn't have denoising performance similar to the human brain, even if it doesn't match the "cheating" denoisers used in ray tracing.

    • It would probably help you to compare what you can do on a phone vs what you can do with desktop software (Lightroom/Photoshop, DxO, Topaz, CaptureOne, etc). It's generally quite good, with the exception of challenging liminal areas (e.g. hair, foliage).

      Fwiw, Topaz -- which I have a license for but essentially never use -- has pretty incredible denoising & upsizing features (for both photo & video), but to get the optimal quality output you offload the processing to their cloud infra (and buy credits from them to pay for it). It's roughly the equivalent of a SWE using a local LLM that's good enough" vs a frontier model that's SOTA but requires a consumption-based subscription.

      1 reply →

Are we still talking about smartphone cameras? If yes, apps already heavily rely on much more advanced computational photography than your average photo editor can do, including but not limited to ML denoisers. The problem is that such apps are typically optimized for the "average case" and are as automated as possible, so they either remove snow, rain, and haze intentionally, or lose small moving particles as the result of stacking. That said, snow and rain are usually possible to capture in the apps that attempt to determine the scene type or have specific modes.