Comment by leobuskin

6 months ago

What about a specialized dict for FASTA? Shouldn't it increase ZSTD compression significantly?

Yes I'd expect a dict-based approach to do better here. That's probably how it should be done. But --long is compelling for me because using it requires almost no effort, it's still very fast, and yet it can dramatically improve compression ratio.

  • From what I've read (although I haven't tested and I can't find my source from when I read it), dictionaries aren't very useful when dataset is big, and just by using '--long' you can cover that improvement.

    Have any of you tested it?

    • I don’t think the size of content matters, it’s all about patterns (and their repetitiveness) within, and FASTA is a great target, if I understand the format correctly

      2 replies →