Comment by writebetterc
6 hours ago
So it went from parsing at 25 MiB/s to 115 MiB/s. I feel like 115 MiB/s is still very slow for a Rust program; I wonder what it's doing that keeps it that slow. No diss to the author: it's a good speedup, and it might be good enough for them.
115 MiB/s is something like 20 to 30 cycles per byte on a laptop, 50 on a desktop. That's definitely quite slow relative to a CPU's capacity to ingest bytes, but unfortunately about as fast as it gets for scalar code that does meaningful work on every byte. There may be another factor of 2 or 3 to be had somewhere, or there may not be. If you want to go meaningfully faster, as in at least at the speed of your disk [1], you need to stop doing work per byte and start vectorizing (see the sketch below). For parsers, that is possible but hard.
[1] https://www.youtube.com/watch?v=p6X8BGSrR9w
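To make "stop doing work per byte" concrete, here's a minimal sketch of the usual first step, assuming the `memchr` crate (which dispatches to SSE2/AVX2/NEON at runtime). The newline-counting task and the function names are mine for illustration, not the linked author's code:

    // Cargo.toml: memchr = "2"

    // Scalar: one check per byte. LLVM may autovectorize a loop this
    // simple, but anything with real branching per byte won't be.
    fn count_newlines_scalar(input: &[u8]) -> usize {
        input.iter().filter(|&&b| b == b'\n').count()
    }

    // Bulk scan: memchr inspects 16-64 bytes per iteration with SIMD,
    // so the cost per input byte drops to a fraction of a cycle.
    fn count_newlines_bulk(input: &[u8]) -> usize {
        memchr::memchr_iter(b'\n', input).count()
    }

Vectorized parsers like simdjson (and its Rust port, simd-json) push this much further: classify all the structural bytes of a 64-byte block in a handful of SIMD ops, build a bitmask index, then parse from that. That's the "possible but hard" part.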
A quick rule of thumb is that memory bandwidth bottoms out at roughly one or two bytes per peak clock cycle per core (not unlike an old 8-bit or 16-bit machine!) for highly multithreaded workloads that mostly miss cache and hit main RAM. So there's a lot of gain to be had before memory bandwidth is truly saturated, and even then one can plausibly move the work to a GPU and speed things up further. (Unified memory and HBM may add a 2x or 3x multiplier to this basic figure, but either way it's in the ballpark.)
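To put rough numbers on that (my own back-of-envelope, assuming a 3 GHz, 8-core desktop part): 1.5 bytes per cycle per core at 3 GHz is about 4.5 GB/s per core, or ~36 GB/s across 8 cores, which is roughly what dual-channel DDR4/DDR5 actually delivers. Against that budget, a parser at 115 MiB/s is consuming under 3% of even a single core's share.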
"for Rust program"?
Isnt it more about the grammar than the prog lang?
The grammar matters too, of course. But a pure Python program is going to be much slower than the equivalent Rust program, simply because CPython is so slow.
I don't know whether this does semantic analysis of the program as well.