Comment by dspillett
7 months ago
Compressing a run of any single repeated character should give almost identical output lengths (perhaps not exactly identical, but the difference will be vanishingly small).
For example, with gzip using default options:
```
me@here:~$ pv /dev/zero -s 10M -S | gzip -c | wc -c
10.0MiB 0:00:00 [ 122MiB/s] [=============================>] 100%
10208
me@here:~$ pv /dev/zero -s 100M -S | gzip -c | wc -c
100MiB 0:00:00 [ 134MiB/s] [=============================>] 100%
101791
me@here:~$ pv /dev/zero -s 1G -S | gzip -c | wc -c
1.00GiB 0:00:07 [ 135MiB/s] [=============================>] 100%
1042069
me@here:~$ pv /dev/zero -s 10M -S | tr "\000" "\141" | gzip -c | wc -c
10.0MiB 0:00:00 [ 109MiB/s] [=============================>] 100%
10209
me@here:~$ pv /dev/zero -s 100M -S | tr "\000" "\141" | gzip -c | wc -c
100MiB 0:00:00 [ 118MiB/s] [=============================>] 100%
101792
me@here:~$ pv /dev/zero -s 1G -S | tr "\000" "\141" | gzip -c | wc -c
1.00GiB 0:00:07 [ 129MiB/s] [=============================>] 100%
1042071
```
That is a difference of just two bytes for a 1GiB sequence of “aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa…” (\141 is octal for 'a') compared to a sequence of \000, and only one byte at 10MiB and 100MiB.
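If you want to repeat the comparison without pv, a rough sketch along the same lines follows. It assumes `head -c` is available (it is in GNU and BSD userlands, though not strictly POSIX) and that pv was only there for the progress display. `gz_size` is a hypothetical helper name, not anything from gzip itself. The result is presumably near-identical because gzip's DEFLATE format encodes a long run mostly as back-references (each at most 258 bytes long) that look the same whichever byte is repeated, so the byte value only affects a literal or two and the Huffman table headers.

```shell
#!/bin/sh
# Sketch: compare gzip output sizes for runs of NUL bytes vs runs of 'a'.
# Assumption: head -c stands in for pv, which only added a progress bar.

# gz_size BYTES OCTAL (hypothetical helper): gzip BYTES copies of the byte
# given as a 3-digit octal escape (e.g. 000 or 141), print compressed length.
gz_size() {
  # "\\$2" expands to e.g. \141, which tr reads as an octal escape;
  # the final tr strips the padding some wc implementations add.
  head -c "$1" /dev/zero | tr '\000' "\\$2" | gzip -c | wc -c | tr -d ' '
}

for n in 1048576 10485760; do
  zeros=$(gz_size "$n" 000)
  as=$(gz_size "$n" 141)
  echo "$n bytes: \\000 -> $zeros, 'a' -> $as"
done
```

Expect the two columns to differ by a byte or so at most, in line with the transcript above; exact figures may vary between gzip versions.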