Comment by dspillett

7 months ago

Compressing a long run of any single repeated byte should give almost identical output lengths, whichever byte it is (perhaps not exactly identical, but the difference will be vanishingly small).

For example, with gzip using default options:

    me@here:~$ pv /dev/zero -s 10M -S | gzip -c | wc -c
    10.0MiB 0:00:00 [ 122MiB/s] [=============================>] 100%
    10208
    me@here:~$ pv /dev/zero -s 100M -S | gzip -c | wc -c
     100MiB 0:00:00 [ 134MiB/s] [=============================>] 100%
    101791
    me@here:~$ pv /dev/zero -s 1G -S | gzip -c | wc -c
    1.00GiB 0:00:07 [ 135MiB/s] [=============================>] 100%
    1042069
    me@here:~$ pv /dev/zero -s 10M -S | tr "\000" "\141" | gzip -c | wc -c
    10.0MiB 0:00:00 [ 109MiB/s] [=============================>] 100%
    10209
    me@here:~$ pv /dev/zero -s 100M -S | tr "\000" "\141" | gzip -c | wc -c
     100MiB 0:00:00 [ 118MiB/s] [=============================>] 100%
    101792
    me@here:~$ pv /dev/zero -s 1G -S | tr "\000" "\141" | gzip -c | wc -c
    1.00GiB 0:00:07 [ 129MiB/s] [=============================>] 100%
    1042071

That is a two-byte difference for a 1GiB sequence of “aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa…” (\141, octal for “a”) compared to the same length of \000, and only one byte at 10MiB and 100MiB.
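If you want to reproduce the comparison without pv and the gzip CLI, here is a rough sketch using Python's zlib module. Assumptions: `gzip_size` is a hypothetical helper name, compression level 6 is chosen to mirror gzip's default, and zlib's deflate is not the same code as GNU gzip's, so the exact byte counts will probably not match the transcript above. Only the near-equality between the two byte values is the point.

```python
import zlib


def gzip_size(byte_value: int, total: int, chunk: int = 1 << 20, level: int = 6) -> int:
    """Return the gzip-framed compressed size of `total` copies of one byte value.

    Hypothetical helper: streams the input in chunks so large sizes
    never need to be held in memory at once.
    """
    # wbits=31 (16 + 15) makes zlib emit a gzip header and trailer.
    comp = zlib.compressobj(level, zlib.DEFLATED, 31)
    block = memoryview(bytes([byte_value]) * chunk)
    size = 0
    remaining = total
    while remaining > 0:
        n = min(chunk, remaining)
        size += len(comp.compress(block[:n]))
        remaining -= n
    size += len(comp.flush())  # final deflate block plus CRC32/ISIZE trailer
    return size


if __name__ == "__main__":
    for mib in (10, 100):
        total = mib * 1024 * 1024
        zeros = gzip_size(0o000, total)  # same bytes as /dev/zero
        a_run = gzip_size(0o141, total)  # same bytes as tr "\000" "\141"
        print(f"{mib} MiB: \\000 -> {zeros} bytes, 'a' -> {a_run} bytes, "
              f"difference {a_run - zeros}")
```

Why so close: deflate encodes such a run as one literal byte followed by back-references at distance 1, so the actual byte value is spelled out only once. The leftover byte or two plausibly comes from how each block's Huffman table describes where that one literal sits in the 0 to 255 range.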
