Comment by nunobrito

5 months ago

Well, well. Kind of surprised to see such a good tool only now. It should have been available a long time ago, since the approach is quite sound.

Once the data container is understood, deduplication becomes far more efficient because it can be targeted.
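
A toy sketch of that idea, not OpenZL's API: assume a hypothetical stream of 32-bit timestamps with small, regular steps. A generic compressor sees mostly unrelated bytes, but once the layout is known the values can be delta-encoded first, which exposes the repetition. All names here (`make_timestamps`, `delta_encode`, and so on) are made up for illustration.

```python
import struct
import zlib


def make_timestamps(n):
    """Hypothetical sample data: increasing timestamps with near-constant steps."""
    t, out = 1_700_000_000, []
    for i in range(n):
        out.append(t)
        t += 1000 + (i % 7)
    return out


def pack_u32(values):
    """Serialize integers as little-endian unsigned 32-bit fields."""
    return struct.pack(f"<{len(values)}I", *values)


def delta_encode(values):
    """Replace each value with its difference from the previous one (mod 2**32)."""
    prev, out = 0, []
    for v in values:
        out.append((v - prev) & 0xFFFFFFFF)
        prev = v
    return out


def delta_decode(deltas):
    """Inverse of delta_encode: a running sum (mod 2**32)."""
    acc, out = 0, []
    for d in deltas:
        acc = (acc + d) & 0xFFFFFFFF
        out.append(acc)
    return out


ts = make_timestamps(5000)
raw_z = zlib.compress(pack_u32(ts))                  # format-blind
delta_z = zlib.compress(pack_u32(delta_encode(ts)))  # format-aware transform first
print(len(raw_z), len(delta_z))
```

The transform is lossless (`delta_decode` restores the original values), so the only thing that changed is how much structure the back-end compressor gets to see.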

Licensed as BSD-3-Clause, solid C++ implementation, well documented.

Looking forward to seeing new developments as more file formats are contributed.

Specialization for file formats is not novel (e.g. 7-Zip uses BCJ2 prefiltering to rewrite the relative targets of x86 CALL/JMP instructions as absolute addresses, so repeated calls to the same function produce identical bytes), nor is embedding specialized decoder bytecode in the archive (e.g. ZPAQ did this and won a lot of Matt Mahoney's benchmarks), but I think OpenZL's execution here, along with the data description and training system, is really fantastic.
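
For anyone who hasn't seen the x86 trick, here is a heavily simplified sketch of the idea. It is not 7-Zip's actual BCJ/BCJ2 code: it handles only the `E8` (CALL rel32) opcode, assumes the buffer is loaded at address 0, and skips the heuristics real filters use to avoid touching non-code bytes. The function names are hypothetical.

```python
import struct

CALL_REL32 = 0xE8  # x86 opcode: CALL with a 32-bit relative displacement


def _e8_filter(data: bytes, encode: bool) -> bytes:
    out = bytearray(data)
    i = 0
    while i + 5 <= len(out):
        if out[i] == CALL_REL32:
            (value,) = struct.unpack_from("<I", out, i + 1)
            next_ip = i + 5  # the displacement is relative to the next instruction
            if encode:
                value = (value + next_ip) & 0xFFFFFFFF  # relative -> absolute
            else:
                value = (value - next_ip) & 0xFFFFFFFF  # absolute -> relative
            struct.pack_into("<I", out, i + 1, value)
            i += 5  # skip the operand so both directions visit the same offsets
        else:
            i += 1
    return bytes(out)


def e8_encode(data: bytes) -> bytes:
    return _e8_filter(data, encode=True)


def e8_decode(data: bytes) -> bytes:
    return _e8_filter(data, encode=False)


def make_call(pos: int, target: int) -> bytes:
    """Assemble a CALL at offset `pos` that jumps to absolute offset `target`."""
    return bytes([CALL_REL32]) + struct.pack("<I", (target - (pos + 5)) & 0xFFFFFFFF)


# Two calls to the same target from different offsets, separated by NOP padding.
code = make_call(0, 0x1000) + b"\x90" * 11 + make_call(16, 0x1000)
filtered = e8_encode(code)
print(code[1:5].hex(), code[17:21].hex())          # operands differ before filtering
print(filtered[1:5].hex(), filtered[17:21].hex())  # operands match after filtering
```

Before filtering, the two operands differ because each is relative to its own call site. Afterwards both hold the same absolute target, which is exactly the kind of repetition an LZ-style compressor can match.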

  • Thanks. I've enjoyed reading more about ZPAQ, but its main focus seems to be versioning (quite a useful feature too, I will try it later), and it doesn't include specialized compression per context.

    As you mention, the extensibility is quite something. In a few years we might see a very capable compressor.