
Comment by vintermann

6 months ago

In fixed intervals? I'm not so sure about that. Generally, if the intervals are fixed, you shouldn't need delimiters: you already know where one thing begins and another ends.

Anyway, BWT-based compressors like bzip2 do a good job on "repetition, but with random differences", better than LZ-based compressors. However, they are not competitive on speed, and the gap has grown as computers got faster, since the Burrows-Wheeler transform doesn't parallelize well and is inherently cache-unfriendly.
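If you want to check the "repetition, but with random differences" claim on your own data, here is a rough sketch using only Python's standard `zlib` (LZ77-based DEFLATE) and `bz2` (BWT-based) modules. The record generator, its width, and the number of mutated bytes are all made up for illustration; the result will depend heavily on those choices, so treat it as an experiment rather than a benchmark.

```python
import bz2
import random
import zlib


def make_records(n=2000, width=64, flips=2, seed=0):
    """Build n copies of one fixed-width record, each with a few random byte changes.

    All parameters are illustrative assumptions, not taken from any real format.
    """
    rng = random.Random(seed)
    # One lowercase template line, terminated by a newline delimiter.
    template = bytes(rng.randrange(97, 123) for _ in range(width - 1)) + b"\n"
    out = bytearray()
    for _ in range(n):
        rec = bytearray(template)
        for _ in range(flips):
            # Overwrite a random non-delimiter position with a random letter.
            rec[rng.randrange(width - 1)] = rng.randrange(97, 123)
        out += rec
    return bytes(out)


raw = make_records()
sizes = {
    "raw": len(raw),
    "zlib": len(zlib.compress(raw, 9)),  # LZ77 + Huffman
    "bz2": len(bz2.compress(raw, 9)),    # BWT + MTF + Huffman
}
print(sizes)
```

Try varying `flips` and `width` to see how each compressor reacts as the records drift further from the template.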

All kinds of data tend to have blocks of the same size with the same delimiters: block-justified text, database files with fixed-row-size tables, even HTTP chunked encoding, ...
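To make "same size, same delimiters" concrete, here is a small sketch that measures how regularly a delimiter byte recurs. The function name `dominant_stride` and the sample rows are hypothetical, and this is not something any existing LZ encoder does as far as I know; it only illustrates the kind of regularity an encoder could look for.

```python
from collections import Counter


def dominant_stride(data: bytes, delim: int):
    """Return (most common gap between delimiters, fraction of gaps equal to it).

    Returns None when the delimiter occurs fewer than two times.
    """
    positions = [i for i, b in enumerate(data) if b == delim]
    gaps = Counter(b - a for a, b in zip(positions, positions[1:]))
    if not gaps:
        return None
    stride, count = gaps.most_common(1)[0]
    return stride, count / (len(positions) - 1)


# Hypothetical fixed-width rows: a number padded to 15 characters plus a newline.
rows = "".join(f"{i:<15}\n" for i in range(100)).encode()
print(dominant_stride(rows, ord("\n")))
```

On data like this every gap is the row width, so the second value is 1.0; on free-form text it would be much lower.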

I really don't see how better supporting this "second order repetition" feature in the encoding would cause such a big problem. LZ variants already track repeating strings.