Comment by goodside

6 years ago

In theory, yes, but a constant 5 GB penalty is enormous in practice — orders of magnitude bigger than anything used in the real world. Brotli’s static dictionary is only 122 KB, and covers many natural and programming languages beyond just English.

> a constant 5 GB penalty is enormous in practice
> orders of magnitude bigger than anything used in the real world

    [root@archlinux mail]# pwd
    /mail
    [root@archlinux mail]# du -skh .
    68G     .

And this is a tiny personal mailserver. There are loads of applications where a 5 GB penalty* is well below the amount of text you're looking at (Wikipedia springs to mind, since they're in the same kind of size range for text).

  • Obviously bodies of text bigger than 5GB exist. I was talking about static compressor dictionaries, which are tiny. Hence mentioning Brotli’s 122KB dictionary. Static dictionaries are an optimization to improve the compression of very small text files — they aren’t useful for compressing large files, because once you have lots of data you can build a more efficient dictionary at compression time and include it in the compressed stream.
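    To make the small-file point concrete, here is a minimal sketch using the preset-dictionary support in Python's standard zlib module (the zdict argument). It is not Brotli, and the dictionary and sample message below are made up for illustration; the only claim is that a short input compresses smaller when the compressor is primed with substrings it is likely to contain.

    ```python
    import zlib
    from typing import Optional

    # Hypothetical preset dictionary: byte strings we expect to see in short messages.
    # Both sides must agree on it in advance; it is not sent with the data.
    PRESET = b'{"status": "ok", "user_id": , "message": "hello"}'


    def compress(data: bytes, zdict: Optional[bytes] = None) -> bytes:
        """Deflate `data`, optionally priming the compressor with a preset dictionary."""
        if zdict is None:
            c = zlib.compressobj(level=9)
        else:
            c = zlib.compressobj(level=9, zdict=zdict)
        return c.compress(data) + c.flush()


    def decompress(blob: bytes, zdict: Optional[bytes] = None) -> bytes:
        """Inflate `blob`; the same dictionary must be supplied if one was used."""
        if zdict is None:
            d = zlib.decompressobj()
        else:
            d = zlib.decompressobj(zdict=zdict)
        return d.decompress(blob) + d.flush()


    # A tiny input with almost no internal repetition to exploit.
    small = b'{"status": "ok", "user_id": 42, "message": "hello"}'

    plain = compress(small)           # no dictionary
    primed = compress(small, PRESET)  # with the preset dictionary

    print(len(small), len(plain), len(primed))
    ```

    On an input this short the primed stream should come out noticeably smaller than the unprimed one. On a large input the compressor finds plenty of repetition within the data itself, so a preset dictionary adds little.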

Not to mention the hardware inefficiencies of a 5 GB dictionary on naive hardware. Poor caches. :(