
Comment by Filligree

8 years ago

One useful trick is that, for gzip, d(z(x+y)) = d(z(x) + z(y)), where z compresses, d decompresses, and + is concatenation.

So you don't need to compress the entire terabyte.
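For anyone who wants to check this, here is a minimal sketch using Python's standard `gzip` module. The sample strings `x` and `y` are made up for illustration, and it assumes `gzip.decompress` reads every member of a concatenated stream, which is how the standard library documents multi-member handling.

```python
import gzip

# Two arbitrary inputs (illustrative data, not from the thread).
x = b"Hello, World\n" * 1000
y = b"Goodbye, World\n" * 1000

# z(x + y): compress the concatenation in one go.
whole = gzip.compress(x + y)

# z(x) + z(y): compress separately, then concatenate the two gzip members.
parts = gzip.compress(x) + gzip.compress(y)

# Both decompress to the same bytes, even though `parts` contains
# two complete gzip members, each with its own header and trailer.
assert gzip.decompress(whole) == x + y
assert gzip.decompress(parts) == x + y

print(len(whole), len(parts))
```

So for a huge run of identical data you could compress one block once and repeat the compressed bytes, rather than feeding the whole input through gzip.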

I'd expect that to give a lower compression ratio, though it may not matter given the additional follow-up gzips.

The compression finally finished after 3h (on an old MBP), "dd if=/dev/zero bs=1m count=1m | gzip | gzip | gzip" yields a bit under 10k (10082 bytes), and adding a 4th gzip yields a bit under 4k (4004 bytes). The 5th gzip starts increasing the size of the archive.
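A scaled-down sketch of the same experiment, assuming 16 MiB of zeros instead of a terabyte so it runs in a moment. The helper `gzip_rounds` is a hypothetical name, and Python's `gzip.compress` will not produce byte counts identical to the command-line `gzip`, so the sizes printed here are not comparable to the figures above.

```python
import gzip

def gzip_rounds(data, rounds):
    """Compress `data` repeatedly, returning the size after each round."""
    sizes = []
    for _ in range(rounds):
        data = gzip.compress(data)
        sizes.append(len(data))
    return sizes

# 16 MiB of zero bytes stands in for /dev/zero (assumed size, chosen to be quick).
sizes = gzip_rounds(bytes(16 * 1024 * 1024), rounds=5)
print(sizes)
```

The first couple of rounds should shrink the data sharply. Once the output no longer contains long repeats, further rounds mostly add header overhead.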

  • It does, though I once used that trick to create a file containing more "Hello, World" lines than there are atoms in the universe. By, hmm, quite a large factor. It probably isn't a serious concern.

    It still fit on a floppy disk. :)

That's true for the content stream but not for gzip files themselves, which do have a minimal header.

  • Which gunzip will overlook / handle correctly, so concatenating the compressed files does work.