
Comment by zahlman

7 hours ago

> It's unfair because it's a different algorithm with fundamentally different memory characteristics. A fairer comparison would be to stream the file in C++ as well and maintain internal state for the count.

The C++ code is still building a tally by incrementing keys of a hash map one at a time, then dumping (reversed) key/value pairs into a list and sorting it. The file is small, and the Python code is freeing each `line` every time through the outer loop. At any rate, a big chunk of the Python memory usage seems to be the roughly constant overhead of the Python runtime (only roughly, since stuff also gets lazily loaded), so the difference in algorithm probably isn't what the memory numbers are mostly measuring.
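For anyone who hasn't looked at the code in question, the shape I'm describing is roughly the following. This is only a sketch from memory of the general pattern, not the benchmark's actual source: `count_words`, the whitespace splitting, and the descending sort are my own assumptions for illustration.

```python
import io
from collections import defaultdict


def count_words(lines):
    """Tally words from an iterable of lines, one line at a time.

    Hypothetical helper for illustration; not the code under discussion.
    """
    counts = defaultdict(int)
    for line in lines:
        # `line` is rebound on every iteration, so the previous string
        # can be released as soon as nothing else refers to it.
        for word in line.split():
            counts[word] += 1

    # Dump reversed (count, word) pairs into a list, then sort.
    # Descending order is an assumption.
    pairs = [(n, word) for word, n in counts.items()]
    pairs.sort(reverse=True)
    return pairs


# A tiny in-memory stand-in for the input file.
sample = io.StringIO("a b a\nc a b\n")
result = count_words(sample)
print(result)
```

Either way the thing that grows with the input is the tally, not the line buffer, which is the same in both languages.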