Why does this conclusion follow?
Of course similar text compresses more efficiently, but NNs don’t work with compressed (varying-size) representations; they work with vector representations which happen to be close in similarity space.
They work with compressed representations: you map arbitrary information with varying entropy into a fixed-size vector representation, and that's compression.
That's like saying hashing is compression because the output is always x bits. You see what I mean, right?
Well, yeah, but the training process means the compression is both lossy and much less efficient than a standard compression method like gzip. You could even train your NN to losslessly recall its training data, but we generally call that "overfitting" in the lingo.
The way you'd do compression using an NN is to have the NN predict the probability of the next symbol, and feed that into an arithmetic coder to produce a compressed representation. This process is lossless, and better prediction quality translates directly into better compression.
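For anyone curious what that predict-then-code loop looks like, here's a minimal sketch in Python. It uses exact fractions to sidestep the fixed-point renormalization a production arithmetic coder would do, and the Laplace-smoothed count model is just a stand-in for the NN's next-symbol distribution (the ALPHABET and message are illustrative, not anything canonical):

    import math
    from fractions import Fraction

    ALPHABET = "abcdr"  # symbol set agreed between encoder and decoder

    def model(context):
        # Stand-in predictor: Laplace-smoothed counts over the context.
        # An NN-based compressor would return its softmax over the next
        # symbol here instead.
        total = len(context) + len(ALPHABET)
        return {s: Fraction(context.count(s) + 1, total) for s in ALPHABET}

    def encode(msg):
        low, high = Fraction(0), Fraction(1)
        for i, s in enumerate(msg):
            probs = model(msg[:i])
            span, cum = high - low, Fraction(0)
            for sym in ALPHABET:           # fixed order shared with decoder
                if sym == s:
                    low, high = low + span * cum, low + span * (cum + probs[sym])
                    break
                cum += probs[sym]
        # Emit the shortest binary fraction landing strictly inside [low, high).
        k, step = 1, Fraction(1, 2)
        while step >= (high - low) / 2:
            k, step = k + 1, step / 2
        code = math.floor((low + high) / 2 * 2 ** k)
        return format(code, f"0{k}b")

    def decode(bits, n):
        x = Fraction(int(bits, 2), 2 ** len(bits))
        low, high, out = Fraction(0), Fraction(1), ""
        for _ in range(n):
            probs = model(out)             # same predictions the encoder saw
            span, cum = high - low, Fraction(0)
            for sym in ALPHABET:
                lo, hi = low + span * cum, low + span * (cum + probs[sym])
                if lo <= x < hi:
                    out, low, high = out + sym, lo, hi
                    break
                cum += probs[sym]
        return out

    msg = "abracadabra"
    bits = encode(msg)
    assert decode(bits, len(msg)) == msg
    print(f"{len(msg)} symbols -> {len(bits)} bits")

The output length lands within a couple of bits of the sum of -log2 p(symbol) over the message, which is exactly why a sharper predictor means a shorter code.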
Yes, the biggest mindfuck is autoencoders: you literally brute-force train a lossy compressor.
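Since it comes up, a minimal sketch of that in Python/PyTorch; the layer sizes and the random stand-in batch are illustrative, not anyone's canonical setup:

    import torch
    from torch import nn

    # Toy autoencoder: squeeze 784-dim inputs through a 32-dim bottleneck.
    # The bottleneck activations are the lossy "compressed" code.
    encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))
    decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784))
    opt = torch.optim.Adam(
        list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3
    )
    loss_fn = nn.MSELoss()

    x = torch.rand(64, 784)       # stand-in batch; real data would go here
    for step in range(1000):      # the brute force: minimize reconstruction error
        recon = decoder(encoder(x))
        loss = loss_fn(recon, x)  # whatever can't squeeze through the
        opt.zero_grad()           # bottleneck shows up as reconstruction error
        loss.backward()
        opt.step()

Everything the bottleneck can't carry becomes reconstruction error, and that error is the only knob training turns, so the "compressor" emerges purely from brute-force optimization.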