Comment by amiga386

2 hours ago

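I suspect he means the choice to put the central directory headers at the end of the file while also having local file headers as you read through the file, which allows for ambiguity: a reader that scans forward and a reader that starts from the central directory can disagree about what the archive contains.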
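To make the ambiguity concrete, here is a rough Python sketch. It assumes small, uncompressed members with no data descriptors and no ZIP64 fields, and the helper names (`make_zip`, `names_from_local_headers`, `names_from_central_directory`) are made up for illustration. It glues two tiny archives together, then lists the members once by walking local file headers from the front and once by letting `zipfile` find the central directory from the end.

```python
import io
import struct
import zipfile


def make_zip(name: str, text: str) -> bytes:
    """Build a one-member, uncompressed archive in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr(name, text)
    return buf.getvalue()


def names_from_local_headers(data: bytes) -> list[str]:
    """Walk forward from offset 0, the way a streaming reader would."""
    names = []
    pos = 0
    while data[pos:pos + 4] == b"PK\x03\x04":  # local file header signature
        # The fixed part of a local file header is 30 bytes, little-endian.
        fields = struct.unpack("<IHHHHHIIIHH", data[pos:pos + 30])
        compressed_size, name_len, extra_len = fields[7], fields[9], fields[10]
        names.append(data[pos + 30:pos + 30 + name_len].decode("utf-8"))
        # Skip the name, the extra field, and the member's data.
        pos += 30 + name_len + extra_len + compressed_size
    return names


def names_from_central_directory(data: bytes) -> list[str]:
    """Let zipfile locate the central directory from the end of the data."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.namelist()


# Two complete archives back to back: the forward walk stops after the first
# archive's member, while zipfile trusts the last central directory it finds.
blob = make_zip("hidden.txt", "front") + make_zip("visible.txt", "back")
print(names_from_local_headers(blob))
print(names_from_central_directory(blob))
```

Under those assumptions the two listings should not match: the forward walk reports only the first member and the central directory reports only the second. Which one a given tool believes depends on how it reads the file.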

Alternatively, he could mean that, for the purposes of archiving, ZIP is very far behind the state of the art (no solid compression, old algorithms, small windows, file size limits without the ZIP64 extensions, and so on, most of which are not relevant to using ZIP as a container format).

Thanks, makes sense. Are the headers even an issue when using ZIP as a container? Are there superior alternatives in practice?

I’ve reached for ZIP for application containers because it’s really easy, not because of design choices that affect me. Typically the compression is a convenient byproduct but not a requirement, and file size limits could be an issue, perhaps, but aren’t something I’ve ever hit when using ZIP for application data. File size limits are something I’ve hit when trying to archive lots of files.

Using ZIP for build pipelines that produce a large number of small files is handy since it’s often faster than direct file I/O, even on SSDs. In the past it was much faster than direct I/O on spinning media, especially DVDs. These days in Python you can unzip to RAM and treat it like a small file system, and for that, file size limits aren’t an issue in practice.
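For anyone curious what "unzip to RAM" can look like, here is a minimal sketch using only the standard library. The bundle contents and the helper names (`build_bundle`, `load_into_ram`) are invented for the example; a real pipeline would read the bytes from disk or the network instead of building them inline.

```python
import io
import zipfile


def build_bundle() -> bytes:
    """Stand-in for a build artifact: a small archive created in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("config/settings.json", '{"debug": false}')
        zf.writestr("assets/readme.txt", "hello from the bundle")
    return buf.getvalue()


def load_into_ram(blob: bytes) -> dict[str, bytes]:
    """Decompress every member once and keep the results in a dict."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return {
            info.filename: zf.read(info)
            for info in zf.infolist()
            if not info.is_dir()
        }


blob = build_bundle()

# Option 1: a plain dict keyed by archive path.
files = load_into_ram(blob)
print(sorted(files))

# Option 2: zipfile.Path gives a pathlib-like view over the same bytes.
root = zipfile.Path(zipfile.ZipFile(io.BytesIO(blob)))
readme_text = (root / "assets" / "readme.txt").read_text(encoding="utf-8")
print(readme_text)
```

The dict version pays the decompression cost once up front. The `zipfile.Path` version decompresses on access, which may suit larger bundles better.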