Comment by contravariant

2 years ago

Generally, compression algorithms try to give structured data a distribution more similar to that of random data.

If every byte sequence were a valid compressed file (unlikely, mostly because compression formats try to be robust against corruption), then this would be easy to reverse: you just generate a random sequence of bytes and decompress it.
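
Here's a minimal Python sketch of that naive approach (using zlib purely as an example format): draw random bytes and try to decompress them. Since real formats include headers and checksums precisely to be robust against corruption, nearly every draw gets rejected, which is the point.

```python
import os
import zlib

def sample_by_decompression(length: int = 32, max_tries: int = 100_000):
    """Rejection-sample 'structured' data by decompressing random bytes."""
    for _ in range(max_tries):
        candidate = os.urandom(length)
        try:
            return zlib.decompress(candidate)  # valid stream -> a sample
        except zlib.error:
            continue  # invalid stream: reject and redraw
    return None  # expected outcome: robust formats reject almost everything
```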

Basically, you can turn a decompression algorithm into a probability distribution by feeding it a random byte wherever it tries to read one; the catch is that sometimes not all byte values are allowed at a given point.
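
As a toy illustration (a made-up three-symbol prefix code, not a real codec): if the decoder draws a uniformly random bit every time it needs one, it samples symbols with probabilities matching their code lengths. In this toy every bit pattern is allowed, so nothing needs excluding.

```python
import random

CODE = {"0": "a", "10": "b", "11": "c"}  # hypothetical toy prefix code

def sample_symbol(rng: random.Random) -> str:
    """Decode one symbol, drawing a random bit each time one is needed."""
    buf = ""
    while buf not in CODE:
        buf += str(rng.randint(0, 1))  # supply a random bit on demand
    return CODE[buf]

rng = random.Random(0)
counts = {"a": 0, "b": 0, "c": 0}
for _ in range(10_000):
    counts[sample_symbol(rng)] += 1
print(counts)  # roughly {'a': 5000, 'b': 2500, 'c': 2500}
```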

You can then reason about this probability distribution and see what its properties are. Typically, an outcome with probability p will require -log(p)/log(2) = -log2(p) bits.
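
For example, in Python:

```python
import math

# Self-information of an outcome with probability p, in bits: -log2(p)
for p in (0.5, 0.25, 0.01):
    print(p, -math.log2(p))  # 1.0, 2.0, ~6.64 bits
```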