Comment by vidarh

2 years ago

I was really mainly responding to the point about JPEG aiming for indistinguishability. My point being that for a lot of purposes we're fine with, and might even be happier with, very different tradeoffs than the ones JPEG makes.

Going specifically to AI, we do agree that the lack of constraints means the models aren't compressors in and of themselves. The training compresses information, but that doesn't make them compressors. Learning and compressing information are, however, very similar in at least some respects. A key part of the LZW family of compression algorithms, for example, is applying heuristics to build a dictionary of symbol sequences (terms) learned from the input.
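
To make that analogy concrete, here's a minimal sketch of LZW-style dictionary building in Python. It's illustrative only (real LZW packs the codes into variable-width bit fields, which I'm skipping); the point is just that the dictionary is "learned" from the input as it's scanned:

    def lzw_compress(data: bytes) -> list[int]:
        # Start with a dictionary of all single-byte sequences.
        dictionary = {bytes([i]): i for i in range(256)}
        next_code = 256
        current = b""
        codes = []
        for byte in data:
            candidate = current + bytes([byte])
            if candidate in dictionary:
                # Keep extending the longest known match.
                current = candidate
            else:
                codes.append(dictionary[current])
                # "Learn" a new term from the input.
                dictionary[candidate] = next_code
                next_code += 1
                current = bytes([byte])
        if current:
            codes.append(dictionary[current])
        return codes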

AI models could eventually be used as the base of a compression scheme, because they encode a lot of information that can potentially be referenced in space-efficient ways.

E.g. if I have a picture of a sunset, and can find a way of getting Stable Diffusion or a similar model to generate, from a description smaller than the output image, a sunset that is similar enough, then I have a compressor and a decompressor.
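
As a sketch of what that could look like (assuming the Hugging Face diffusers and torch libraries and a CUDA GPU; untested, and not a claim about what would actually compress well):

    # Sketch: treat (prompt, seed) as the "compressed" form of an image.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2"
    ).to("cuda")

    def decompress(prompt: str, seed: int):
        # A fixed seed makes generation deterministic for a given
        # model version, scheduler, and hardware/software stack.
        generator = torch.Generator("cuda").manual_seed(seed)
        return pipe(prompt, generator=generator).images[0]

    # The "compressed file" here is just a short string plus an integer:
    image = decompress(
        "A sunset over the horizon. Photo taken from a beach. "
        "A fishing boat in the water",
        seed=1234,
    )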

Ignoring the runtime cost, and the question of whether it could ever be brought down to a level where this would actually produce a benefit, the result might be a totally useless algorithm producing images way too far from the input, or it might turn out pretty good, depending on how close the output is. But the tradeoffs would also be very different from JPEG's. For some uses I might be happy with a quite different-looking sunset as long as it's "close enough" and high quality even at very high compression ratios. E.g. "A sunset over the horizon. Photo taken from a beach. A fishing boat in the water" fed to [1] produced a pretty nice sunset. Couple that with a seed to make generation deterministic, and I might be happy with that as a compression of an image of a quite different sunset. For other uses I'd much prefer JPEG artefacts and something that is clearly the same sunset.

For "real" use of this for compression, you'd want someone to research ways of guiding the model to produce something much closer to the input (maybe heavily downscaling the original image and using that as the starting point, coupled with a description; maybe a set of steps including instructions for infilling, etc.). I think finding the limits of what you can achieve in trying to get these models to reproduce a specific input from the most minimal possible input would make for fascinating research.
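
The downscaled-original idea might look roughly like this (again assuming diffusers; the thumbnail size, target resolution, and strength value are all made-up knobs, not benchmarked choices):

    # Sketch: guide generation with a heavily downscaled copy of the
    # original, so the "compressed" form is (thumbnail + prompt + seed).
    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2"
    ).to("cuda")

    def compress(original: Image.Image, prompt: str, seed: int):
        thumb = original.resize((64, 64))  # a few KB once encoded
        return thumb, prompt, seed

    def decompress(thumb: Image.Image, prompt: str, seed: int):
        init = thumb.resize((768, 768))  # crude upscale to target size
        generator = torch.Generator("cuda").manual_seed(seed)
        # strength controls how far the model may drift from init:
        # lower = closer to the thumbnail, higher = closer to the prompt.
        return pipe(prompt=prompt, image=init, strength=0.6,
                    generator=generator).images[0]

Whether the thumbnail plus prompt ever beats just JPEG-encoding the thumbnail is exactly the kind of question the research would need to answer.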

[1] https://huggingface.co/stabilityai/stable-diffusion-2?text=A...