Comment by ChadNauseam
4 months ago
This is getting downvoted but it's kind of true. Indexing collections all the time usually means you're not using iterators enough. (Although iterators become very annoying for fallible code that you want to return a Result, so sometimes it's cleaner not to use them.)
However this problem does still come up in iterator contexts. For example Iterator::take takes a usize.
An iterator works if you're sequentially visiting every item in the collection, in the order they're stored. It's terrible if you need random access, though.
Concrete example: pulling a single item out of a zip file, which supports random access, is O(1). Pulling a single item out of a *.tar.gz file, which can only be accessed by iterating it, is O(N).
History lesson for the cheap seats in the back:
Compressed tars are terrible for random access because the compression occurs after the concatenation and so knows nothing about inner file metadata, but it's good for streaming and backups. Uncompressed tars are much better for random access. (Tar was a used as a backup mechanism to tape (tape archive).)
Zips are terrible for streaming because their metadata is stored at the end, but are better for 1-pass creation and on-disk random access. (Remember that zip files and programs were created in an era of multiple floppy disk-based backups.)
When fast tar enumeration is desired, at the cost of compatibility and compression potential, it might be worth compressing files and then taring them when and if zipping alone isn't achieving enough compression and/or decompression performance. FUSE compressed tar mounting gets to be really expensive with terabyte archives.
> compressing files and then taring them
Just use squashfs if that is the functionality that you need.
1 reply →
In C++, random access iterators are a thing. Indeed, raw pointers satisfy the requirements of a random access iterator concept. Is that not the case in Rust?
While you maybe "shouldn't" be indexing collections often (which I also don't agree with, there is a reason that we have more collections then linked lists, lookup is important) even just getting the size of a collection which is often very related to business logic can be quite annoying.
For data that needs to be looked up mostly I want a hashtable. Not always, but mostly. It's rare that I want to look up something but its position in a list.