Comment by joaorj
10 years ago
Does this mean that lots of 3:30-minute 1080p videos could be compressed into 4kb?
EDIT: We can be generous and say 40kb for the sake of adding more colours / etc.
> Does this mean that lots of 3:30-minute 1080p videos could be compressed into 4kb?
> EDIT: We can be generous and say 40kb for the sake of adding more colours / etc.
This is actually an insightful question.
The practical answer is no. There is an unimaginably large number of possible 3:30-minute videos, far more than the number of possible 4kb or even 40kb files.
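To put rough numbers on that counting argument, here's a quick back-of-the-envelope sketch in Python (frame rate and bit depth are illustrative assumptions, with 3:30 taken as 210 seconds):

    # Count the possible 40 kB files versus the possible raw videos.
    file_bits = 40 * 1024 * 8  # a 40 kB file is 327,680 bits

    # A 3:30 video at 1080p, 30 fps, 24 bits per pixel, uncompressed:
    video_bits = 210 * 30 * 1920 * 1080 * 24  # ~3.1e11 bits

    print(f"possible 40 kB files: 2^{file_bits}")
    print(f"possible raw videos:  2^{video_bits}")
    # 2^313,528,320,000 versus 2^327,680: even after throwing away
    # almost all of those videos as noise, the files can't cover them.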
To be fair, most of those possible videos are just noise. We don't have to be able to compress those, because people don't care if one video of noise is different from another. We also don't have to reconstruct the video perfectly: as long as it looks more or less the same, the audience is happy. (This is called "lossy compression".)
But even with these caveats, there is no realistic method for compressing realistic 3:30 minute videos that well on a computer. We likely can't do all that much more than current compression algorithms without a different set of tradeoffs. (Like being better at some videos but worse at others.)
That said, a big part of how compression works is by relying on information already present when decompressing. This demo relies on having a particular kind of chip with certain capabilities (i.e. a CPU and a GPU) and presumably some standard library functions... etc.
How well could we "compress" videos if we had more information available when decompressing? Here's a fun thought experiment: what if we had a model of a human mind? We could then feed in a pretty sparse description and have the model fill in the details in a natural intuitive way. It would be very lossy, but the results would be compelling.
And you know what? That's a decent mental model of how speech works! If you just look at information content, spoken words are not very dense. But if I describe a scene you can imagine it almost as if you're seeing a video. This works because we both have the same sort of brain as well as shared experiences and intentions.
You can think of speech as incredibly effective—but also rather lossy—compression.
It could be very useful to deliberately pursue SUPER lossy compression. As long as no one can really tell based on the end result, it doesn't really matter.
For example, if you can only tell something was lossy by directly comparing two instances of the same video during playback, then that's probably good enough in most situations.
It occurred to me that we could compress the hell out of written works by translating them into some super-dense language, ultimately retaining only the basics of the meaning/concepts/some of the writing style. Then we can re-translate that back to whatever language we want to read it in.
For compressing pictures or videos, there could be some similar translation to a much more compact representation. Would probably rely on ML heavily though.
4K of English text is a couple of pages of a novel, enough to describe a character and a situation, maybe an interaction. A good writer can conjure up a whole world in 4K... but probably not a description of an arbitrary 3 and a half minutes of activity.
That being said, speech is far from the best compression we can ever hope to achieve.
Nice insight about the CPU and the standard libraries being a relevant factor; I hadn't thought of that.
Your thought experiment sounds more like a "codec" than procedural generation. I guess it's an arbitrary line, given that we are using a CPU, etc. But the bigger the decompressing "model", the further away from true 4k compression we are.
Take a look at https://en.wikipedia.org/wiki/Kolmogorov_complexity
The Kolmogorov Complexity of a video (or any other data) is the size of the shortest program which outputs that video then halts. This 4k executable is similar in spirit, but also follows strict rules about efficiency: Kolmogorov complexity places no time limits on that shortest program, whereas this program must output pixels fast enough to make the video realtime.
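As a toy illustration of that definition (nothing to do with how this intro actually works), here is a complete Python program whose output is a gigabyte, which means that gigabyte's Kolmogorov complexity is at most the size of these few lines:

    # A tiny program that outputs 2^30 bytes, then halts.
    # Highly structured data can have tiny Kolmogorov complexity.
    import sys

    for _ in range(2**10):
        sys.stdout.buffer.write(b"A" * 2**20)  # 1 MiB at a time

A 4k intro is the same idea with the extra constraint mentioned above: the short program must also produce its output fast enough to play in real time.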
Why would you think that? https://en.wikipedia.org/wiki/Procedural_generation https://en.wikipedia.org/wiki/Demoscene https://en.wikipedia.org/wiki/Real-time_computer_graphics
Sorry, I thought it was obvious, but the question is: could procedural generation be used to achieve amazing compression rates, given an algorithm that is currently impossible to code?
No, only very specific videos, like this particular one. The art is in finding a pretty video that you can render in 4kb, not making a pretty video and then reducing it to 4kb. The latter would most likely be impossible.
"39. Re graphics: A picture is worth 10K words - but only those to describe the picture. Hardly any sets of 10K words can be adequately described with pictures."
It's the pigeonhole principle; there are only a few long videos possibly encodable as short programs because there are only a few short programs in the first place. To get compression performance, one has to target an ever smaller subset of possible videos, which eventually starts becoming an AI-complete problem.
The difference between procedural generation and a video is similar to the difference between raster and vector graphics. Demoscene intros like this are more like your computer giving a live performance from scratch than playing a movie. Ideas like video compression don't really apply. They create 3D models and textures from simple math functions and filters, make a world from them, add more math functions for camera movements, and play some synthesized music that's more akin to MIDI than MP3 (to put it simply).
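For a taste of the "textures from simple math functions" part, here's a minimal sketch in Python (real intros layer noise, sine waves, and filters; the XOR pattern below is just a classic stand-in):

    # Procedural texture: every pixel comes from a formula, so the
    # "asset" costs a few bytes of code instead of kilobytes of image.
    def texture(x: int, y: int) -> int:
        return (x ^ y) & 0xFF  # 8-bit grayscale value

    # Render a tiny 8x8 patch as ASCII shades to see the structure.
    shades = " .:-=+*#%@"
    for y in range(8):
        print("".join(shades[texture(x * 32, y * 32) * len(shades) // 256]
                      for x in range(8)))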
I recently began making a function that can output the 2D lines of the walls of a house, with windows and different shapes (L, S, T), and inside walls generated from points and NESW directions. It was pretty fun and challenging, but now I have to move to 3D to turn this base outline into a level with windows and doors.
The only things I have to give this function are the height/width ratio, another ratio that defines how large the "corner holes" are in the L/S/T configurations, the number and relative positions of windows and doors, and the starting point and NESW direction for inside walls; with all that, I can create one story of a building with an interior. Of course it's not finished yet, and there's no furniture or details, but you can see that in theory you can use procedural generation as a compression tool for human-designed structures, in a way that no machine learning algorithm or autoencoder could really achieve.
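As a very rough sketch of that idea (this is not the parent's actual code, and the parameters are hypothetical stand-ins), an L-shaped footprint can be rebuilt from just a few numbers:

    # Hypothetical sketch: rebuild an L-shaped house footprint from a
    # handful of parameters instead of storing the geometry itself.
    def l_shaped_outline(width: float, hw_ratio: float, notch_ratio: float):
        """Return the outer wall outline as a list of (x, y) points."""
        h = width * hw_ratio                            # bounding-box height
        nx, ny = width * notch_ratio, h * notch_ratio   # corner cut-out size
        # Walk the outline counter-clockwise, removing the top-right corner.
        return [(0, 0), (width, 0), (width, h - ny),
                (width - nx, h - ny), (width - nx, h), (0, h)]

    print(l_shaped_outline(10.0, 0.6, 0.4))

Six points of geometry reconstructed from three numbers; windows, doors, and inside walls would be further parameters in the same spirit.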
If you combine this kind of algorithm with a well-made OpenStreetMap database (think vector tiles, which are used in GPS software), you could also recreate the whole world in 3D, with enough detail to make a game that would not require much disk space. Recreating the roads, fences, parks, rivers, vegetation, elevation, etc. is difficult because it requires a lot of tuning and geometry tricks, but it's very cheap in terms of CPU cycles and disk.
The folks at Outerra have begun making actual software that lets you browse the entire planet in 3D. You can zoom in real time from space down to 1cm. They don't have cities yet, but they are planning for them. I want to make a game using such ambitious ideas, but it's not easy...
Look up algorithmic information theory. To use a poor analogy, it is to procedural generation what information theory is to compression.
> No, only very specific videos, like this particular one.
Well, it's not compressed, it's generated. You could generate an endless video with less code, but it would most likely be uninteresting. Scene demos are interesting because the art and direction and music are generated from algorithms, rather than created first and then compressed efficiently.
But, yes, at some level there is an idea of a DNA seed and a process to create something much more profound; we as humanity haven't come close to cracking that, though.
I suspect that even if it is possible to have an algorithm that can generate the seeds plus the process to expand them, that algorithm would take orders of magnitude longer to run than would be practical on any meaningful time scale.
Not visuals, but along a similar vein, random number generators with high dimensionality and equidistribution can be coerced into generating very specific output, given enough exploration of the output space.
For example, an output of all zeros, or the source for a random number generator itself, or a zipped archive of a work of Shakespeare.
It's fun to think about anyway.
http://www.pcg-random.org/party-tricks.html
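The trick behind those party tricks is choosing the generator's seed (or, for PCG, its stream and state) so that the "random" output is exactly what you want. The linked page constructs PCG's state directly; a brute-force version of the same idea, using Python's stdlib RNG and so purely illustrative:

    # Search for a seed whose first "random" bytes spell a chosen target.
    # Brute force only scales to tiny targets, which is why the linked
    # tricks pick the generator's internal state analytically instead.
    import random

    target = b"Hi"  # 2 bytes, so ~65,536 seeds expected on average

    seed = 0
    while random.Random(seed).randbytes(len(target)) != target:
        seed += 1

    print(f"seed {seed} 'randomly' outputs {target!r}")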
But how many colours are we talking about now?