Comment by dehrmann

2 months ago

Not trolling, but I'd bet something that's augmented with generative AI. Not to the level of describing scenes with words, but context-aware interpolation.


mort96 2 months ago

I don't want my video decoder inventing details which aren't there. I'd much rather have obvious compression artifacts than a codec where the "compression artifacts" look like perfectly realistic, high-quality hallucinated details.

doopp 2 months ago

A codec that uses AI isn't necessarily going to be using it to synthesize content. It could use it for things like improving rate-distortion optimizations and early-skip heuristics.
Modern video codecs are complex beasts. It's not as simple as "take a macroblock and find some motion vectors that minimize residual below a certain threshold, else code as intra-block". They have hundreds of mutually-exclusive techniques to compress a certain area of the frame. Determining which technique will require the smallest residual is done with fast early-skip heuristics that often make the wrong decision. The official manual for x265, the H.265/HEVC encoder (https://x265.readthedocs.io/en/master/cli.html), has literally hundreds of options, almost all of them for tuning this myriad of heuristics for your particular input.
AI can be used to enhance things like early-skip heuristics. "This block looks like it'll benefit from a really-detailed motion search in this particular area" or "we'll save bits if we bypass the DCT step and quantize the block directly" or "this frame should definitely be a B-frame". Encoders already use heuristics to do this (brute forcing all possible decisions to find which is optimal is too slow), but they don't always make the best decision. An AI could be used to improve that.
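To make the idea concrete, here is a minimal sketch of a mode decision driven by the usual Lagrangian cost J = D + lambda * R, with an optional predictor that shortlists which modes to evaluate. Everything in it is an assumption for illustration: the three toy modes, their distortion and bit costs, and the `shortlist` function (a hand-written threshold standing in for a small learned model) do not come from x265 or any real encoder.

```python
from dataclasses import dataclass
from math import inf
from typing import Callable, Dict, List, Optional, Set, Tuple

Block = List[int]  # toy stand-in for a block's residual samples


@dataclass
class ModeResult:
    distortion: float  # error introduced by the mode, e.g. sum of squared errors
    bits: float        # bits needed to signal the mode and its residual


def rd_cost(result: ModeResult, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return result.distortion + lam * result.bits


def energy(block: Block) -> int:
    """Sum of squared samples, used here as a crude activity measure."""
    return sum(x * x for x in block)


# Hypothetical toy modes with made-up costs, not real codec tools.
MODES: Dict[str, Callable[[Block], ModeResult]] = {
    "skip": lambda b: ModeResult(distortion=energy(b), bits=1),
    "inter": lambda b: ModeResult(distortion=energy(b) / 4, bits=2 * len(b)),
    "intra": lambda b: ModeResult(distortion=0, bits=8 * len(b)),
}


def shortlist(block: Block) -> Set[str]:
    """Stand-in for a learned predictor: which modes are worth evaluating?"""
    return {"skip"} if energy(block) < 4 else {"inter", "intra"}


def choose_mode(
    block: Block,
    lam: float,
    predictor: Optional[Callable[[Block], Set[str]]] = None,
) -> Tuple[str, float, int]:
    """Return (best mode, its RD cost, number of modes evaluated)."""
    candidates = list(MODES)
    if predictor is not None:
        allowed = predictor(block)
        # Fall back to the full search if the predictor rules out everything.
        candidates = [m for m in candidates if m in allowed] or candidates
    best_name, best_cost = "", inf
    for name in candidates:
        cost = rd_cost(MODES[name](block), lam)
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name, best_cost, len(candidates)
```

Calling `choose_mode(block, lam)` runs the exhaustive search, while `choose_mode(block, lam, shortlist)` evaluates fewer modes. The predictor only decides where the encoder spends its search time; the bitstream still carries ordinary mode decisions, so nothing is synthesized at the decoder.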
Now when I say AI, I'm not talking about massive, multi-billion weight monstrosities that synthesize nonsense, but extremely simple neural networks with a few thousand weights. The popular Opus codec uses a simple NN to estimate whether a frame of audio is speech or music, and uses that determination to decide whether to use its speech-optimized algorithm (SILK) or its music-optimized algorithm (CELT) to encode that particular frame. It's a short read but a very good one: https://jmvalin.ca/opus/opus-1.3/
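For a sense of scale, a tiny classifier driving a mode switch can be sketched in a few lines. This is not Opus's actual detector: the two input features, the network shape, and every weight below are made up for illustration, and a real network would be trained and would have more inputs.

```python
import math
from typing import Callable, List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dense(
    inputs: List[float],
    weights: List[List[float]],
    biases: List[float],
    activation: Callable[[float], float],
) -> List[float]:
    """One fully connected layer: activation(W @ inputs + b)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hypothetical hand-picked weights (9 parameters in total), not trained values.
W1 = [[2.0, -1.0], [-1.5, 2.5]]
B1 = [0.0, 0.0]
W2 = [[3.0, -3.0]]
B2 = [0.0]


def speech_probability(features: List[float]) -> float:
    """Map two made-up per-frame features to a speech probability in (0, 1)."""
    hidden = dense(features, W1, B1, math.tanh)
    return dense(hidden, W2, B2, sigmoid)[0]


def pick_mode(features: List[float], threshold: float = 0.5) -> str:
    """Choose the speech-oriented or music-oriented coding mode for one frame."""
    return "SILK" if speech_probability(features) >= threshold else "CELT"
```

With these weights, `pick_mode([1.0, 0.0])` selects SILK and `pick_mode([0.0, 1.0])` selects CELT. The network only picks between two existing coding paths, so a wrong guess costs bits or quality but cannot invent content.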
This could be extended to video encoders without being used for interpolation where they would be liable to synthesize things that aren't there, DLSS-style.

cubefox 2 months ago

In the case of many textures (grass, sand, hair, skin, etc.) it makes little difference whether the high-frequency details are reproduced exactly or hallucinated. E.g. it doesn't matter whether the 1262nd blade of grass from the left side is bending to the left or to the right.
- mort96 2 months ago
  
  And in the case of many others, it makes a very significant difference. And a codec doesn't have enough information to know.
  Imagine a criminal investigation. A witness happened to take a video as the perpetrator committed the crime. In the video, you can clearly see a recognizable detail on the perpetrator's body in high quality; a birthmark perhaps. This rules out the main suspect -- but can we trust that the birthmark actually exists and isn't hallucinated? Would a non-AI codec have just shown a clearly compression-artifact-looking blob of pixels which can't be determined one way or the other? Or would a non-AI codec have contained actual image data of the birthmark in sufficient detail?
  Using AI to introduce realistic-looking details where there were none before (which is what your proposed AI codec inherently does) should never happen automatically.
  

km3r 2 months ago

https://blogs.nvidia.com/blog/rtx-video-super-resolution/

We already have some of the stepping stones for this. But honestly it's much better for upscaling poor-quality streams; on a better-quality stream it just gives things a weird feeling.

cubefox 2 months ago

Neural codecs are indeed the future of audio and video compression. A lot of people / organizations are working on them and they are close to being practical. E.g. https://arxiv.org/abs/2502.20762

afiori 2 months ago

AI embeddings can be seen as a very advanced form of lossy compression

randall 2 months ago

for sure. macroblock hinting seems like a good place for research.