← Back to context

Comment by bunderbunder

4 months ago

An iterator works if you're sequentially visiting every item in the collection, in the order they're stored. It's terrible if you need random access, though.

Concrete example: pulling a single item out of a zip file, which supports random access, is O(1). Pulling a single item out of a *.tar.gz file, which can only be accessed by iterating it, is O(N).

History lesson for the cheap seats in the back:

Compressed tars are terrible for random access because the compression occurs after the concatenation and so knows nothing about inner file metadata, but it's good for streaming and backups. Uncompressed tars are much better for random access. (Tar was a used as a backup mechanism to tape (tape archive).)

Zips are terrible for streaming because their metadata is stored at the end, but are better for 1-pass creation and on-disk random access. (Remember that zip files and programs were created in an era of multiple floppy disk-based backups.)

When fast tar enumeration is desired, at the cost of compatibility and compression potential, it might be worth compressing files and then taring them when and if zipping alone isn't achieving enough compression and/or decompression performance. FUSE compressed tar mounting gets to be really expensive with terabyte archives.

In C++, random access iterators are a thing. Indeed, raw pointers satisfy the requirements of a random access iterator concept. Is that not the case in Rust?