Comment by cooper12

7 years ago

Similar as in the same approach, or similar as in "apply neural networks to all the things"? If it's the former, this approach was tailored very specifically to human speech, exploiting how much you can compress and interpolate qualities like pitch and the spectral envelope. That's far too specific to apply to video.
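To make the compress/interpolate point concrete, here's a minimal sketch of why speech-specific parameters are so cheap to transmit: pitch and envelope coefficients vary smoothly, so you can send every Nth frame and interpolate the rest. The frame layout and decimation factor are made up for illustration; this isn't the codec from the post.

```python
import numpy as np

def decimate_and_interpolate(params: np.ndarray, keep_every: int = 4) -> np.ndarray:
    """params: (num_frames, num_coeffs), e.g. [pitch, envelope coeffs...] per frame.
    Keep one frame in `keep_every` and linearly interpolate the dropped ones,
    for roughly a 1/keep_every bitrate reduction on those parameters."""
    num_frames, num_coeffs = params.shape
    kept = np.arange(0, num_frames, keep_every)
    out = np.empty_like(params)
    for c in range(num_coeffs):
        out[:, c] = np.interp(np.arange(num_frames), kept, params[kept, c])
    return out

# Toy example: a slowly drifting pitch track survives 4x decimation almost intact.
frames = np.column_stack([
    100 + 20 * np.sin(np.linspace(0, np.pi, 50)),  # pitch contour in Hz
    np.linspace(0.9, 0.7, 50),                     # one fake envelope coefficient
])
reconstructed = decimate_and_interpolate(frames)
print("max pitch error (Hz):", np.abs(frames[:, 0] - reconstructed[:, 0]).max())
```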

As for the latter, you'd perhaps have to feed Google Scholar the right incantations, or ask someone with knowledge of the field. As far as I know, video codecs already have a huge bag of tricks of their own (the B-frames borrowed in this post, for example). Even then, the key points of this codec are, first, that it's meant for use at very low bitrates, where existing codecs break down, and second, that it's a vocoder: it converts audio to an intermediate form and resynthesizes it. That kind of lossiness is acceptable for audio, but I'm not sure it would be acceptable for video.
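For "intermediate form and resynthesizing", think classic LPC-style vocoding: keep only a spectral envelope and a pitch, discard the original samples, and regenerate a waveform from the parameters. A rough sketch, with the frame size, LPC order, and gain handling all chosen arbitrarily (and a synthetic "voiced" frame standing in for real speech):

```python
import numpy as np
import scipy.signal
import librosa

def vocode_frame(frame: np.ndarray, sr: int, pitch_hz: float, order: int = 16) -> np.ndarray:
    """Resynthesize one frame from its spectral envelope and an assumed pitch."""
    a = librosa.lpc(frame, order=order)               # all-pole envelope coefficients
    excitation = np.zeros(len(frame))                 # impulse train as voiced excitation
    excitation[::int(sr / pitch_hz)] = 1.0
    synth = scipy.signal.lfilter([1.0], a, excitation)  # excitation through the envelope filter
    return synth * (np.std(frame) / (np.std(synth) + 1e-9))  # crude gain matching

sr = 16000
t = np.arange(sr // 50) / sr                          # one 20 ms frame
frame = np.sin(2 * np.pi * 120 * t) + 0.01 * np.random.randn(len(t))
resynthesized = vocode_frame(frame, sr, pitch_hz=120.0)
```

The point is that `resynthesized` only ever approximates `frame` in a perceptual sense; nothing in the pipeline tries to reproduce the original samples.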

I should have been more specific. I meant compressing video so as to minimise feature difference rather than pixel difference.
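Something like this, roughly (a toy sketch: the random conv net is just a stand-in for a real pretrained feature extractor such as VGG activations, and nothing here resembles an actual video codec):

```python
import torch
import torch.nn as nn

feature_net = nn.Sequential(                     # stand-in feature extractor
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
).eval()
for p in feature_net.parameters():
    p.requires_grad_(False)  # frozen: gradients can flow through it, but it isn't trained

def pixel_loss(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    return torch.mean((x - y) ** 2)              # the usual objective: match pixels

def feature_loss(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    return torch.mean((feature_net(x) - feature_net(y)) ** 2)  # match features instead

original = torch.rand(1, 3, 64, 64)
decoded = original + 0.05 * torch.randn_like(original)  # pretend codec output
print("pixel loss:  ", pixel_loss(original, decoded).item())
print("feature loss:", feature_loss(original, decoded).item())
```

A codec trained against the second objective is free to get individual pixels wrong as long as whatever the features respond to, edges and textures say, survives.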