Comment by ycombinatrix

7 months ago

The different byte values likely won't compress as well as all 0s unless they are a repeating pattern of blocks.

An alternative might be to use Brotli, which has a static dictionary. Maybe that can be used to achieve a high compression ratio.

I meant that all of the byte values would be the same (so they would still be repeating), but a different value than zero. However, Brotli could be another idea if the client supports it.

Compressing a sequence of any single character should give almost identical results length-wise (perhaps not exactly identical, but the difference will be vanishingly small).

For example, with gzip using default options:

    me@here:~$ pv /dev/zero -s 10M -S | gzip -c | wc -c                    
    10.0MiB 0:00:00 [ 122MiB/s] [=============================>] 100%      
    10208                                                                  
    me@here:~$ pv /dev/zero -s 100M -S | gzip -c | wc -c                   
     100MiB 0:00:00 [ 134MiB/s] [=============================>] 100%      
    101791                                                                 
    me@here:~$ pv /dev/zero -s 1G -S | gzip -c | wc -c                     
    1.00GiB 0:00:07 [ 135MiB/s] [=============================>] 100%      
    1042069                                                                
    me@here:~$ pv /dev/zero -s 10M -S | tr "\000" "\141" | gzip -c | wc -c 
    10.0MiB 0:00:00 [ 109MiB/s] [=============================>] 100%      
    10209                                                                  
    me@here:~$ pv /dev/zero -s 100M -S | tr "\000" "\141" | gzip -c | wc -c
     100MiB 0:00:00 [ 118MiB/s] [=============================>] 100%      
    101792                                                                 
    me@here:~$ pv /dev/zero -s 1G -S | tr "\000" "\141" | gzip -c | wc -c  
    1.00GiB 0:00:07 [ 129MiB/s] [=============================>] 100%      
    1042071

That is a difference of two bytes (1042071 vs. 1042069) for a 1GiB sequence of “aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa…” (\141) compared to a 1GiB sequence of \000.
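
If you don't have pv handy, here is a rough Python sketch of the same comparison. It is only an illustration: `gzip_size` is a helper name made up for this example, and Python's `gzip.compress` defaults to compression level 9 while the gzip command line defaults to level 6, so the absolute sizes will probably not match the numbers above exactly. The point is just that the sizes for different byte values come out nearly the same.

```python
import gzip


def gzip_size(byte_value: int, length: int) -> int:
    """Return the gzip-compressed size of `length` copies of one byte value.

    Hypothetical helper for this example, not part of any library.
    """
    data = bytes([byte_value]) * length
    return len(gzip.compress(data))


if __name__ == "__main__":
    length = 10 * 1024 * 1024  # 10 MiB, like the first pv run above
    for value in (0x00, 0x61, 0xFF):  # NUL, 'a' (octal 141), and 0xFF
        size = gzip_size(value, length)
        print(f"byte 0x{value:02x}: {size} bytes compressed")
```

Swapping in other byte values or lengths in the loop should show the same pattern: the compressed size depends almost entirely on the length, not on which byte is repeated.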