Comment by rtyu1120

2 hours ago

Bit unrelated rant but I'm still not sure why ZIP has been adopted as an Application File Format rather than anything else. It is a remnant of the DOS era with questionable choices; why would you pick it over anything else?

- archiver format to stow multiple files in one; your actual files (in your choice of format(s)) go inside

- files can be individually extracted, in any order, from the archive

- thousands of implementations available, in every language and on every architecture. No more than 32 KiB of RAM needed for decompression

- absolutely no possibility of patent challenges
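To make the "individually extracted, in any order" point concrete, here is a minimal sketch using Python's standard `zipfile` module. The member names and contents are made up for illustration; the only claim is that one member can be read by name without unpacking the rest.

```python
import io
import zipfile


def build_archive() -> bytes:
    """Build a small ZIP in memory holding three unrelated members."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Hypothetical member names, in whatever formats the application likes.
        zf.writestr("meta.json", '{"version": 1}')
        zf.writestr("content/page1.xhtml", "<p>hello</p>")
        zf.writestr("images/cover.bin", bytes(range(16)))
    return buf.getvalue()


def read_member(archive: bytes, name: str) -> bytes:
    """Read exactly one member by name; the other members are not decompressed."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.read(name)


archive = build_archive()
page = read_member(archive, "content/page1.xhtml")
cover = read_member(archive, "images/cover.bin")
```

The lookup goes through the central directory, so the order in which members are read does not have to match the order in which they were written.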

  • Also architecturally suitable for the common case of collecting heterogeneous files in existing and new formats into a single file, as opposed to designing a database schema or a complex container structure from scratch.

    Any multi-file archive format would do, but ZIP is very portable and supports random access.

If all you need is a bag of named blobs and you just want quick reasonable compression supported across all platforms, why not?

If you don't need any table/relational data and are always happy to rewrite the entire file on every save, ZIP is a perfectly fine choice.

It's easier than e.g. a SQLite file with a bunch of individually gzipped blobs.

ZIP isn’t an application format, it’s a container, no? You store files with any format in a .zip, and that’s what applications do - they read files with other formats out of the .zip. What are your goals; what else would you pick, and why? What are the questionable choices you refer to?

  • I suspect he means the choices of putting the central directory headers at the end of the file, as well as having local file headers as you read through the file, which allows for ambiguity.

    Alternatively, he could mean that, for the purposes of archiving, ZIP is very far behind the state of the art (no solid compression, old algorithms, small windows, file size limits without the ZIP64 extensions, and so on), most of which is not relevant to using ZIP as a container format.
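A rough sketch of the ambiguity mentioned above, assuming a plain archive with no ZIP64 records and no archive comment. It parses the 22-byte end-of-central-directory record from the tail of the file, then compares the entry count it reports with a naive front-to-back scan for local file header signatures. The helper names are made up for this example.

```python
import io
import struct
import zipfile

LOCAL_SIG = b"PK\x03\x04"   # local file header signature
EOCD_SIG = b"PK\x05\x06"    # end-of-central-directory signature
EOCD_FMT = "<4sHHHHIIH"     # fixed part of the EOCD record
EOCD_SIZE = struct.calcsize(EOCD_FMT)


def build_demo_zip() -> bytes:
    """Two stored (uncompressed) members; one payload mimics a local header."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:  # default compression is ZIP_STORED
        zf.writestr("a.txt", b"plain text")
        zf.writestr("b.bin", b"PK\x03\x04 this is payload, not a real header")
    return buf.getvalue()


def parse_eocd(data: bytes) -> dict:
    """Locate the EOCD by searching backward from the end of the file.

    Simplification: a real reader must also handle an archive comment that
    happens to contain the signature, and ZIP64 records.
    """
    pos = data.rfind(EOCD_SIG)
    if pos < 0 or pos + EOCD_SIZE > len(data):
        raise ValueError("no end-of-central-directory record found")
    fields = struct.unpack_from(EOCD_FMT, data, pos)
    _sig, _disk, _cd_disk, _n_here, n_total, cd_size, cd_offset, _comment_len = fields
    return {
        "eocd_pos": pos,
        "total_entries": n_total,
        "cd_size": cd_size,
        "cd_offset": cd_offset,
    }


data = build_demo_zip()
info = parse_eocd(data)
naive_header_count = data.count(LOCAL_SIG)  # what a streaming scanner might see
```

Here the central directory reports two entries while the signature scan finds more, because stored payload bytes can look like a header. Two readers that trust different structures can therefore disagree about what the archive contains.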

AMD/Xilinx Vivado uses ZIP format to compress design checkpoints. They just give them a .dcp extension though.

Because Windows can view and extract them out of the box without installing any additional applications. If it supported anything better out of the box I'd guess people would use that instead.

  • "The operating system makes it easy to mess with" doesn't seem like a particularly useful property for application file formats.

    • It was, back when software development was run by hackers and not suits and security people. Easy access was a feature for users, too; back in those days, software was a tool that worked on data, it didn't try to own the data.

It works well enough. What could, for instance, epubs gain by having another base format instead?

I think most formats use "gzip" instead of "zip".

  • gzip and tar+gzip aren't good options for application data compared to zip.

    zip is used for Java jar files, OpenOffice documents and other cases.

    The benefit is that individual files in the archive can be accessed individually. A tgz file is a stream which can (without extra trickery) only be extracted from beginning to end, with no seeking to a specific record and no way to easily replace a single file without rewriting everything.

    tgz is good enough for distributing packages which are supposed to be extracted at once (a software distribution)
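As an illustration of the "rewrite everything" cost for tgz, here is a minimal sketch with Python's standard `tarfile` module. The file names and the `replace_in_tgz` helper are hypothetical; the point is that changing one member means streaming every member through a new compressed archive.

```python
import io
import tarfile

FILES = {"a.txt": b"first", "b.txt": b"second", "c.txt": b"third"}


def make_tgz(files: dict) -> bytes:
    """Pack the given name -> bytes mapping into an in-memory .tar.gz."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, payload in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def replace_in_tgz(archive: bytes, name: str, new_data: bytes) -> bytes:
    """Replace one member by copying every member into a fresh archive."""
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as src, \
         tarfile.open(fileobj=out, mode="w:gz") as dst:
        for member in src:  # sequential walk through the whole stream
            if member.name == name:
                info = tarfile.TarInfo(name)
                info.size = len(new_data)
                dst.addfile(info, io.BytesIO(new_data))
            else:
                dst.addfile(member, src.extractfile(member))
    return out.getvalue()


def read_from_tgz(archive: bytes, name: str) -> bytes:
    """Read one member; tarfile still has to decompress up to it."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tf:
        return tf.extractfile(name).read()


updated = replace_in_tgz(make_tgz(FILES), "b.txt", b"changed")
```

Every untouched member is decompressed and recompressed along the way, which is fine for a one-shot software distribution and awkward for a document that gets saved often.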

  • gzip is not an archive container. You're thinking of .tar.gz which is a "tape archive" format which is compressed using gzip. Zip is by itself both a compression and an archive format, and is what documents like epub or docx use

    • You are right, but other documents like .ggb (GeoGebra files) or .mbz (Moodle backups) use the .tar.gz method. I even wrote programs to open them, make a few tweaks and save the new version in another compatible file.
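Since these formats hide behind custom extensions, a quick way to tell which container a given file uses is to look at its first bytes. A minimal sketch, assuming the container starts at offset 0 (a self-extracting ZIP with a prefixed stub would be missed); `sniff_container` is a made-up helper name.

```python
import gzip
import io
import zipfile


def sniff_container(head: bytes) -> str:
    """Classify a file by its leading magic bytes."""
    # "PK\x03\x04" opens a local file header; "PK\x05\x06" is an empty ZIP.
    if head.startswith(b"PK\x03\x04") or head.startswith(b"PK\x05\x06"):
        return "zip"
    # Every gzip member starts with the two bytes 0x1f 0x8b.
    if head.startswith(b"\x1f\x8b"):
        return "gzip"
    return "unknown"


# Build one sample of each kind in memory to try the helper on.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    zf.writestr("doc.xml", "<doc/>")

zip_kind = sniff_container(zip_buf.getvalue()[:4])
gzip_kind = sniff_container(gzip.compress(b"payload")[:4])
```

A "gzip" result only says the outer layer is gzip; whether a tar archive sits inside still has to be checked after decompression.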