Comment by zzo38computer
7 months ago
I also had the idea of using a zip bomb to confuse badly behaved scrapers (and I have mentioned it before to some other people, although I did not implement it). However, instead of 0x00, you might use a different byte value.
I had other ideas too, but I don't know how well some of them would work (that might depend on which bots are involved).
The different byte values likely won't compress as well as all 0s unless they are a repeating pattern of blocks.
An alternative might be to use Brotli, which has a static dictionary. Maybe that can be used to achieve a high compression ratio.
I meant that all of the byte values would be the same (so they would still be repeating), but a different value than zero. However, Brotli could be another idea if the client supports it.
Compressing a sequence of any single character should give almost identical results length-wise (perhaps not exactly identical, but the difference will be vanishingly small).
For example, with gzip using default options:
Two bytes difference for a 1GiB sequence of “aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa…” (\141) compared to a sequence of \000.
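If anyone wants to check this themselves, here is a minimal sketch in Python. It is not the command used above; it assumes that zlib at level 6 with a gzip container (wbits=31) is close enough to "gzip using default options", and it uses a 16 MiB run instead of 1 GiB so it finishes quickly. The function name gzip_size_of_run and the chunk size are my own choices for illustration.

```python
import zlib

CHUNK = 1 << 20  # feed the compressor 1 MiB at a time to keep memory use low


def gzip_size_of_run(byte_value: int, total_bytes: int, level: int = 6) -> int:
    """Return the gzip-compressed size of `total_bytes` copies of one byte value."""
    # wbits=31 asks zlib for a gzip header and trailer instead of a raw zlib stream.
    compressor = zlib.compressobj(level, zlib.DEFLATED, 31)
    chunk = bytes([byte_value]) * CHUNK
    size = 0
    remaining = total_bytes
    while remaining > 0:
        n = min(remaining, CHUNK)
        size += len(compressor.compress(chunk[:n]))
        remaining -= n
    # flush() emits whatever is still buffered, plus the gzip trailer.
    size += len(compressor.flush())
    return size


if __name__ == "__main__":
    total = 16 * CHUNK  # 16 MiB; raise this to 1 << 30 to try a full 1 GiB run
    for value in (0x00, 0x61, 0xFF):
        size = gzip_size_of_run(value, total)
        print(f"byte 0x{value:02x}: {size} bytes compressed, ratio ~{total // size}:1")
```

The sizes printed for the three byte values should differ by at most a few bytes, which matches the point above: what matters is that the input is a single long run, not which byte it repeats.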