Comment by elophanto_agent

5 hours ago

bzip2 is the compression algorithm equivalent of that one coworker who does incredible work but nobody ever talks about. meanwhile gzip gets all the credit because it's "good enough"

16 comments

elophanto_agent

kergonath 4 hours ago

Bzip2 is slow. That’s the main issue. Gzip is good enough and much faster. Also, the fact that you cannot get a valid bzip2 file by cat-ing 2 compressed files is not a deal breaker, but it is annoying.

nine_k 4 hours ago
Gzip is woefully old. Its only redeeming value is that it's already built into some old tools. Otherwise, use zstd, which is better and faster, both at compression and decompression. There's no reason to use gzip in anything new, except for backwards compatibility with something old.
- duskwuff 1 hour ago
  
  One other redeeming quality that gzip/deflate does have is that its low memory requirements (~32 KB per stream). If you're running on an embedded device, or if you're serving a ton of compressed streams at the same time, this can be a meaningful benefit.
- kergonath 4 hours ago
  
  > Otherwise, use zstd, which is better and faster
  Yes, I do. Zstd is my preferred solution nowadays. But gzip is not going anywhere as a fallback because there is a surprisingly high number of computers without a working libzstd.
duskwuff 1 hour ago

bzip2 is particularly slow because the transform it depends on (BWT2) is "intrinsically slow" - it depends on cache-unfriendly operations with long dependency chains, preventing the CPU from extracting any parallelism:
https://cbloomrants.blogspot.com/2021/03/faster-inverse-bwt....
sedatk 3 hours ago
> the fact that you cannot get a valid bzip2 file by cat-ing 2 compressed files
TIL. Now that's why gzip has a file header! But, tar.gz compresses even better, that's probably why it hasn't caught on.
- pocksuppet 3 hours ago
  
  tar packs multiple files into one. If you concatenate two gzipped files and unzip them, you just get a concatenated file.
  
  2 replies →
saidnooneever 4 hours ago
the catting issue might be more an implementation of bzip program problem than algorithm (it could expect an array of compressed files). that would only be impossible if the program cannot reason about the length of data from file header, which again is technically not something about compression algo but rather file format its carried through.
that being said, speed is important for compression so for systems like webservers etc its an easy sell ofc. very strong point (and smarter implementation in programs) for gzip
- nine_k 4 hours ago
  
  Bzip2 is great for files that are compressed once, get decompressed many times, and the size is important. A good example is a software release.
  
  1 reply →
- joecool1029 4 hours ago
  
  > the catting issue might be more an implementation of bzip program problem than algorithm (it could expect an array of compressed files). that would only be impossible if the program cannot reason about the length of data from file header, which again is technically not something about compression algo but rather file format its carried through.
  Long comment to just say: ‘I have no idea about what I’m writing about’
  These compression algorithms do not have anything to do with filesystem structure. Anyway the reason you can’t cat together parts of bzip2 but you can with zstd (and gzip) is because zstd does everything in frames and everything in those frames can be decompressed separately (so you can seek and decompress parts). Bzip2 doesn’t do that.
  So like, another place bzip2 sucks ass is working with large archives because you need to seek the entire archive before you can decompress it and it makes situations without parity data way more likely to cause dataloss of the whole archive. Really, don’t use it unless you have a super specific use case and know the tradeoffs, for the average person it was great when we would spend the time compressing to save the time sending over dialup.
stefan_ 4 hours ago
bzip and gzip are both horrible, terribly slow. Wherever I see "gz" or "bz" I immediately rip that nonsense out for zstd. There is such a thing as a right choice, and zstd is it every time.
- laurencerowe 3 hours ago
  
  lz4 can still be the right choice when decompression speed matters. It's almost twice as fast at decompression with similar compression ratios to zstd's fast setting.
  https://github.com/facebook/zstd?tab=readme-ov-file#benchmar...
- anthk 33 minutes ago
  
  pigz it's damn fast on compressing. Also, a Vax with NetBSD can run gzip. So here is it. Go try these new fancy formats on a Vax, I dare you.
  And, yes, I prefer LZMA over the obsolete Bzip2 any day, but GZIP it's like the ZIP of free formats modulo packaging, which it's the job of TAR.