Comment by omoikane
3 days ago
The current leader of the Large Text Compression Benchmark is NNCP (compression using neural networks), also by Fabrice Bellard.
Also, nncp-2024-06-05.tar.gz is just 1,180,969 bytes, unlike ts_zip-2024-03-02.tar.gz (159,228,453 bytes, which is bigger than uncompressed enwik8).
While impressive, it's a very specific case of compression: English text. English text may be common, but there are many more kinds of data to compress.
It'd be nice to have a comparison here: https://morotti.github.io/lzbench-web/?dataset=silesia/sao&m...
Doesn't this fit the Hutter Prize conditions mentioned in another comment here (https://news.ycombinator.com/item?id=46595109)?
It's too slow for that. The Hutter Prize is CPU-only, so neural network solutions (which are the most interesting IMO) are effectively excluded. You need to generate 11,574 characters per second on the CPU for decompression alone, and the compression time also counts; the total has to stay below 24 hours.
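For the curious, here's a back-of-the-envelope check of where that rate comes from, assuming the enwik9 test file (10^9 bytes) and the full 24-hour budget going to decompression; a rough Python sketch, not the official rules:

    # Assumption: test file is enwik9 (10^9 bytes), budget is 24 hours.
    enwik9_bytes = 1_000_000_000
    budget_seconds = 24 * 60 * 60         # 86,400 s
    rate = enwik9_bytes / budget_seconds  # ~11,574 bytes/s
    print(f"required rate: {rate:,.0f} characters per second")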
IIRC all the cmix submissions use NNs (and have for a long time).