Comment by schobi

13 days ago

Image and video compression has become a field that is painfully hard to enter. The state of the art is complex and exhaustive, and the functionality of the reference encoders, along with the comments and versions among them, is a lot to take in.

We are well beyond the point where a dedicated individual can try an idea, show that it is better, and expect others to pick it up (e.g. in standardization). It is not sufficient to run a few dozen images and judge for yourself; you are expected to demonstrate the benefit integrated into the latest reference encoders, and you need a sponsor to join standardization efforts.

For educational purposes? Sure - do whatever you want - but any discussion of "is it novel" or "is it useful for others" is moot, unfortunately.

Yes, but many of those complex state-of-the-art things are made up of parts that began as ideas like this. All ideas have to start somewhere.

You don't have to have an idea that beats everything in all circumstances. It doesn't even have to be better at one thing to potentially be useful to others. Reading this article gave me some ideas related to a supplementary autoencoder I had been working with, where encode is (A + B) -> Latent and decode is (A + Latent) -> B. That is essentially the same problem if you consider A to be the luma, B to be the chroma, and the compressed chroma to be the latent.

In that respect, I found it useful to see a perspective on an analogous problem. That makes the question "is it useful for others" not only answerable, but answerable with a yes.
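To make the (A + B) -> Latent, (A + Latent) -> B shape concrete, here is a minimal sketch. It is an assumption-heavy toy, not the autoencoder mentioned above and not the article's method: the "latent" is just a per-block slope and offset from a least-squares fit of chroma against luma, and the names `encode`, `decode`, `luma` and `chroma` are made up for illustration.

```python
import numpy as np


def encode(a_block, b_block):
    """(A + B) -> latent: least-squares fit b ~= slope * a + offset."""
    a = a_block.astype(np.float64).ravel()
    b = b_block.astype(np.float64).ravel()
    var = a.var()
    if var == 0.0:
        # Flat luma carries no information about chroma; keep only the mean.
        return 0.0, float(b.mean())
    slope = float(((a - a.mean()) * (b - b.mean())).mean() / var)
    offset = float(b.mean() - slope * a.mean())
    return slope, offset


def decode(a_block, latent):
    """(A + latent) -> B: rebuild chroma from luma and the two coefficients."""
    slope, offset = latent
    return slope * a_block.astype(np.float64) + offset


# Hypothetical 8x8 block where chroma happens to be an exact linear
# function of luma, so two numbers are enough to recover it.
rng = np.random.default_rng(0)
luma = rng.uniform(0, 255, size=(8, 8))
chroma = 0.5 * luma + 20.0

latent = encode(luma, chroma)
recon = decode(luma, latent)
max_err = float(np.abs(recon - chroma).max())
```

The point is only that the decoder gets A for free, so the latent has to carry just what A cannot explain. Real chroma is not an exact linear function of luma, so a practical scheme would need a richer latent or a residual on top.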

I truly get that. That's also one of the reasons why I started from scratch once I got the idea, rather than researching all the available papers, implementations, etc. (the latter is quite overwhelming, while the former took me about a week of spare-time hacking).

My scope is also a bit unusual, I think, because one of the applications I'm thinking about is to "augment" luma-only codecs with chroma. One such codec is https://gitlab.com/llic/llic

But most of all, I wanted to learn.