Comment by tikhonj

10 years ago

This is actually an insightful question.

The practical answer is no. There is an unimaginably large number of possible 3:30 minute videos, far more than the number of possible 4kb or even 40kb files.
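The counting argument is easy to make concrete. A rough back-of-envelope sketch in Python; the resolution, frame rate, and color depth are my own assumptions, just to put numbers on it:

```python
# Back-of-envelope: distinct 3:30 raw videos vs. distinct 4kb files.
# Assumed parameters: 1920x1080, 24 fps, 24-bit color, 210 seconds.
frames = 210 * 24                    # 3:30 of video at 24 fps
bits_per_frame = 1920 * 1080 * 24    # raw 24-bit RGB per frame
video_bits = frames * bits_per_frame
file_bits = 4096 * 8                 # a 4kb file

print(video_bits, file_bits)         # ~2.5e11 bits vs. 32768 bits
# There are 2**file_bits possible 4kb files, but 2**video_bits possible
# raw videos, so 4kb files can only "name" a vanishing fraction of them.
```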

To be fair, most of those possible videos are just noise. We don't have to be able to compress those because people don't care if one video of noise is different from another. We also don't have to reconstruct the video perfectly: as long as it looks more or less the same, the audience is happy. (This is called "lossy compression".)
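A toy way to see the lossy tradeoff (my own sketch, nothing to do with the demo): quantize away the low-order bits that read as noise, and the remainder compresses far better, at the cost of never recovering the original exactly.

```python
import random
import zlib

random.seed(0)
# Fake "signal": a repeating ramp plus a little noise, as raw bytes.
# (The ramp and noise parameters are arbitrary, just for illustration.)
signal = bytes(128 + (i % 32) + random.randint(-2, 2) for i in range(10_000))

lossless = zlib.compress(signal)

# Lossy step: keep only the top 4 bits of each sample. Most of the noise
# vanishes, so the quantized signal compresses better -- but the exact
# original is gone for good.
coarse = bytes(b & 0b11110000 for b in signal)
lossy = zlib.compress(coarse)

print(len(lossless), len(lossy))
```

Real video codecs do a vastly more sophisticated version of the same move: discard detail the eye won't miss, then entropy-code what's left.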

But even with these caveats, there is no realistic method for compressing typical 3:30 minute videos that well on a computer. We likely can't do much better than current compression algorithms without a different set of tradeoffs, like being better at some videos but worse at others.

That said, a big part of how compression works is by relying on information already present when decompressing. This demo relies on having a particular kind of chip with certain capabilities (i.e. a CPU and a GPU) and presumably some standard library functions, etc.
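The "information already present when decompressing" idea shows up even in ordinary tools: zlib supports a preset dictionary that both sides agree on ahead of time, so the transmitted payload only has to reference it. A small sketch (the shared text here is made up):

```python
import zlib

# Context both sides already have: it never travels with the payload.
shared = b"the quick brown fox jumps over the lazy dog " * 10
message = b"the quick brown fox jumps over the lazy dog again"

# Ordinary compression: the message carries all its own information.
plain = zlib.compress(message)

# Compression with the shared dictionary: the payload can point back
# into `shared` instead of spelling things out.
c = zlib.compressobj(zdict=shared)
with_dict = c.compress(message) + c.flush()

# Decompression only works if you hold the same dictionary.
d = zlib.decompressobj(zdict=shared)
assert d.decompress(with_dict) == message

print(len(plain), len(with_dict))  # the shared-context payload is smaller
```

The demo's "dictionary" is just much bigger: the CPU's instruction set, the GPU, and the system libraries all count as context the 4kb file gets for free.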

How well could we "compress" videos if we had more information available when decompressing? Here's a fun thought experiment: what if we had a model of a human mind? We could then feed in a pretty sparse description and have the model fill in the details in a natural intuitive way. It would be very lossy, but the results would be compelling.

And you know what? That's a decent mental model of how speech works! If you just look at information content, spoken words are not very dense. But if I describe a scene you can imagine it almost as if you're seeing a video. This works because we both have the same sort of brain as well as shared experiences and intentions.

You can think of speech as incredibly effective—but also rather lossy—compression.

It could be very useful to deliberately pursue SUPER lossy compression. As long as no one can really tell based on the end result, it doesn't really matter.

For example, if you can only tell something was lossy by directly comparing two instances of the same video during playback, then that's probably good enough in most situations.

It occurred to me that we could compress the hell out of written works by translating them into some super dense language, retaining only the basic meaning, the concepts, and some of the writing style. Then we could re-translate that back into whatever language we want to read it in.

For compressing pictures or videos, there could be some similar translation to a much more compact representation. Would probably rely on ML heavily though.

4K of English text is a couple of pages of a novel, enough to describe a character and a situation, maybe an interaction. A good writer can conjure up a whole world in 4K... but probably not a description of an arbitrary 3 and a half minutes of activity.

  • That being said, speech is far from the best compression we can ever hope to achieve.

Nice insight about the CPU and the standard libraries being a relevant factor; I hadn't thought of that.

Your thought experiment sounds more like a "codec" than procedural generation. I guess it's an arbitrary line, given that we're already relying on the CPU, etc. But the bigger the decompression "model", the further we are from true 4kb compression.