Comment by perching_aix

5 months ago

That post immediately came to my mind too! Do you maybe have a comparison to share with respect to the specialized compressor mentioned in the OP there?

> Grace Blackwell’s 2.6Tbp 661k dataset is a classic choice for benchmarking methods in microbial genomics. (...) Karel Břinda’s specialist MiniPhy approach takes this dataset from 2.46TiB to just 27GiB (CR: 91) by clustering and compressing similar genomes together.