You've rediscovered a state-of-the-art technique, currently used by JPEG XL, AV1, and the HEVC range extensions. It's called "chroma from luma" or "cross-component prediction".
This technique has a weakness: the most interesting and high-entropy data shared between the luma and chroma planes is their edge geometry. To suppress block artefacts near edges, you need to code an approximation of the edge contours. This is the purpose of your quadtree structure.
In a codec which compresses both luma and chroma, you can re-use the luma quadtree as a chroma quadtree, but the quadtree itself is not the main cost here. For each block touched by a particular edge, you're redundantly coding that edge's chroma slope value, `(chroma_inside - chroma_outside) / (luma_inside - luma_outside)`. Small blocks can tolerate a lower-precision slope, but it's a general rule that coding many imprecise values is more expensive than coding a few precise values, so this strategy costs a lot of bits.
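To make that cost concrete, here's a rough sketch of the per-block slope fit; the least-squares fit is my own assumption about how you'd derive the slope, not necessarily what the post does:

```python
import numpy as np

def cfl_slope(luma_block, chroma_block):
    """Least-squares fit chroma ~= alpha * luma + beta over one block.

    alpha is the chroma-from-luma slope that every block touched by the
    same edge ends up re-coding; beta is roughly the block's chroma DC.
    """
    luma = luma_block.astype(np.float64).ravel()
    chroma = chroma_block.astype(np.float64).ravel()
    luma_ac = luma - luma.mean()
    denom = np.dot(luma_ac, luma_ac)
    if denom < 1e-9:
        return 0.0, float(chroma.mean())   # flat luma: nothing to predict from
    alpha = np.dot(luma_ac, chroma - chroma.mean()) / denom
    beta = chroma.mean() - alpha * luma.mean()
    return float(alpha), float(beta)
```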
JPEG XL compensates for this problem by representing the local chroma-from-luma slope as a low-resolution 2D image, which is then recursively compressed as a lossless JPEG XL image. This is similar to your idea of using PNG-like compression (delta prediction, followed by DEFLATE).
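In PNG-ish terms, that step might look roughly like this (the left-neighbour predictor and zlib are stand-ins for JPEG XL's actual predictor and entropy coder):

```python
import zlib
import numpy as np

def compress_slope_map(slope_map):
    """slope_map: 2D uint8 array of quantised chroma-from-luma slopes,
    one value per coarse tile. Delta-predict each value from its left
    neighbour (like PNG's Sub filter, mod 256), then DEFLATE the residuals."""
    left = np.zeros_like(slope_map)
    left[:, 1:] = slope_map[:, :-1]
    residual = (slope_map.astype(np.int16) - left.astype(np.int16)) % 256
    return zlib.compress(residual.astype(np.uint8).tobytes(), level=9)
```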
Of course, since you're capable of rediscovering the state of the art, you're also capable of improving on it :-)
One idea would be to write a function which, given a block of luma pixels, can detect when the block contains two discrete luma shades (e.g. "30% of these pixels have a luminance value close to 0.8, 65% have a luminance value close to 0.5, and the remaining 5% seem to be anti-aliased edge pixels"). If you run an identical shade-detection algorithm in both the encoder and decoder, you can then code chroma information separately for each side of the edge. Because this would reduce edge artefacts, it might enable you to make your quadtree leaf nodes much larger, reducing your overall data rate.
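A minimal sketch of such a shade detector; the 1D two-means split and the thresholds are just my assumptions to illustrate the idea:

```python
import numpy as np

def detect_two_shades(luma_block, min_gap=0.1, min_fraction=0.05):
    """Return (low_shade, high_shade) if the block looks like two flat luma
    regions separated by an edge, otherwise None. Running this on *decoded*
    luma in both encoder and decoder means the split needs no extra signalling."""
    values = luma_block.astype(np.float64).ravel()
    lo, hi = values.min(), values.max()
    below = above = None
    # 1D two-means: alternate between splitting at the midpoint and
    # recomputing the two shade centroids.
    for _ in range(10):
        split = (lo + hi) / 2.0
        below, above = values[values <= split], values[values > split]
        if below.size == 0 or above.size == 0:
            return None
        lo, hi = below.mean(), above.mean()
    if hi - lo < min_gap:                      # shades not clearly separated
        return None
    if min(below.size, above.size) < min_fraction * values.size:
        return None                            # one "shade" is just a few stray pixels
    return float(lo), float(hi)
```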
Image and video compression has become a field that is painfully hard to enter. The state of the art is complex and exhaustive, and the sheer amount of functionality in the reference encoders, across all their variants and versions, is enormous.
We are well beyond the point where a dedicated individual can try an idea, show that it is better, and expect that others will pick it up (e.g. in standardization). It is not sufficient to run a few dozen images and judge for yourself; you are expected to demonstrate the benefit integrated into the latest reference encoders, and you need a sponsor to join standardization efforts.
For educational purposes? Sure - do whatever you want - but any discussion of "is it novel" or "is it useful for others" is moot, unfortunately.
Yes, but many of those complex state-of-the-art things are composed of parts that began as ideas like this.
You don't have to have an idea that beats everything in all circumstances. It doesn't even have to be better at any one thing to potentially be useful to others. Reading this article gave me some ideas related to a supplementary autoencoder I had been working with, where the encoder is (A + B) -> Latent and the decoder is (A + Latent) -> B. That is essentially the same problem if you consider A to be the luma, B to be the chroma, and the latent to be the compressed chroma.
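For what it's worth, the shape of that setup is roughly this (a PyTorch sketch; the layer sizes and architecture are purely illustrative, not my actual model):

```python
import torch
import torch.nn as nn

class SideInfoAutoencoder(nn.Module):
    """Encode (A, B) -> latent; decode (A, latent) -> B.
    With A = luma and B = chroma, the latent plays the role of the
    compressed chroma side-information."""
    def __init__(self, latent_dim: int = 8):
        super().__init__()
        # Encoder sees both planes (A and B stacked as 2 channels).
        self.encoder = nn.Sequential(
            nn.Conv2d(2, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, latent_dim, 3, stride=2, padding=1),
        )
        # Decoder sees A plus the latent and must reconstruct B.
        self.decoder = nn.Sequential(
            nn.Conv2d(1 + latent_dim, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        latent = self.encoder(torch.cat([a, b], dim=1))
        # Upsample the low-resolution latent back to A's resolution before decoding.
        latent_up = nn.functional.interpolate(latent, size=a.shape[-2:], mode="nearest")
        return self.decoder(torch.cat([a, latent_up], dim=1))
```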
In that respect, I found it useful to see a perspective on an analogous problem. That makes "is it useful for others?" not only answerable, but answered with a yes.
You talk about an algorithm, but you never looked at any of the practical implementations of Chroma from Luma, which is used in production.
It is part of the AV1 (and upcoming AV2) video codec, for instance, and has been widely adopted since 2018: https://arxiv.org/pdf/1711.03951
There is also an early IETF draft of the idea: https://datatracker.ietf.org/doc/draft-midtskogen-netvc-chro...
Give the prior work a read if you haven't :)
JPEG XL also incorporates Chroma from Luma. It's a standard coding tool in modern codecs.
AV1 has the CfL (chroma from luma) prediction tool, H.266 has CCLM (cross-component linear model), and AV2 has an improved CfL.
The AV2 version is very similar to H.266's, and both are more advanced than AV1's CfL: they use filters that cover up to 6 luma pixels to determine the chroma baseline.
The algorithm described in this blog post is pretty close to what AV1 CfL does.
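For anyone curious, the AV1 CfL flavour boils down to roughly this (a sketch, with a plain 2x2 average standing in for the actual subsampling filter):

```python
import numpy as np

def cfl_predict(recon_luma, chroma_dc, alpha):
    """Sketch of AV1-style CfL for one 4:2:0 chroma block.

    recon_luma: reconstructed luma covering the block (2x the chroma size,
                even dimensions assumed).
    chroma_dc:  the block's DC chroma prediction.
    alpha:      the signalled chroma-from-luma scaling factor.
    """
    # Subsample luma to chroma resolution. A 2x2 average here; the newer
    # codecs use longer filters (up to 6 luma taps) to build this baseline.
    sub = 0.25 * (recon_luma[0::2, 0::2] + recon_luma[1::2, 0::2] +
                  recon_luma[0::2, 1::2] + recon_luma[1::2, 1::2])
    luma_ac = sub - sub.mean()           # strip the luma DC, keep the "AC" shape
    return chroma_dc + alpha * luma_ac   # prediction = chroma DC + alpha * luma AC
```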
Neat. I can't comment on your work (unfamiliar territory).
But I looked at all the images, and there is a consistent loss of red/yellow and gain of blue/green in the 0.5 bpp versions. It's subtle, but noticeable on my mid-grade monitor.
Yep, it has some slight color bleeding, and the 0.5 bpp versions also have more detail in certain areas, like the fire-breathing guy's forehead.