Comment by im3w1l

6 years ago

No, having an arbitrarily complex dictionary or compressor is not counted as cheating. The model is basically that you are allowed to grab as many hard drives as you want before going out into the wilderness. From then on, all your news arrives by low-bandwidth carrier pigeon, and you have to decompress the transmissions with whatever you remembered to bring.

Counted by whom? What benchmark follows the model you’re describing? Does any real-world compressor use dictionaries anywhere near this big?

If you can bring the complete benchmark corpus (or substantial subsets of it) “into the wilderness”, the benchmark isn’t worth running. What you’d be measuring is not a compressor; it’s a database with stable keys. A Library of Congress Control Number (LCCN) uniquely identifies a cataloged book, but it doesn’t contain a compressed copy of that book’s text.
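To make that concrete, here is a minimal Python sketch using the preset-dictionary support (`zdict`) in the standard `zlib` module. The random `corpus` is a made-up stand-in for a benchmark file, not any real benchmark, and the two helper functions are hypothetical names of mine. The only point is that once the "dictionary" is the corpus itself, the output is little more than a list of back-references into data the receiver already has.

```python
import os
import zlib


def compress_with_dict(data: bytes, zdict: bytes) -> bytes:
    """Deflate `data` using `zdict` as a preset dictionary (hypothetical helper)."""
    c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(data) + c.flush()


def decompress_with_dict(blob: bytes, zdict: bytes) -> bytes:
    """Inverse of compress_with_dict; the receiver must already hold `zdict`."""
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()


# Assumption: a stand-in "benchmark corpus". Random bytes are essentially
# incompressible on their own, and 20,000 bytes fits in deflate's 32 KiB window.
corpus = os.urandom(20_000)

# Honest run: no prior knowledge of the corpus.
plain = zlib.compress(corpus, 9)

# "Wilderness" run: the dictionary we packed happens to be the corpus itself.
cheat = compress_with_dict(corpus, zdict=corpus)

print("original bytes:       ", len(corpus))
print("zlib, no dictionary:  ", len(plain))
print("zlib, corpus as zdict:", len(cheat))

# The round trip only works for someone who already owns the whole corpus.
assert decompress_with_dict(cheat, corpus) == corpus
```

The dictionary run comes out far smaller than the honest one, but none of that saving reflects how well deflate models the data. It reflects a lookup into a copy the decompressor brought along, which is the database-with-stable-keys situation described above.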