Comment by vjerancrnjak
5 hours ago
I remember being quite surprised that my implementation, which used manual stack updates, was much slower than what the compiler produced from the recursive version.
Turns out I was pushing to and popping from the stack on every conceptual "recursive call", while the compiler had figured out it could keep 2-3 recursion levels in registers and only push/pop about 30% of the time. Its version also had more stuff in memory than mine.
Even when I cut my memory reads/writes to ~50% of the recursive program's and kept most of the state in registers, the recursive program was still faster, simply because it used more registers than I did.
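To make the two shapes concrete, here is a minimal sketch in C. It is not the code from the experience above: summing a binary tree is an assumed stand-in workload, and `Node`, `sum_recursive`, `sum_manual_stack` and `build` are hypothetical names. The manual version pushes and pops on every conceptual call, which is the pattern described above; which of the two is faster depends on the compiler, flags and CPU, so measure rather than guess.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node type, used only for this illustration. */
typedef struct Node {
    long value;
    struct Node *left, *right;
} Node;

/* Recursive version: the compiler owns the call stack and decides
   what stays in registers and what gets spilled to memory. */
long sum_recursive(const Node *n) {
    if (n == NULL) return 0;
    return n->value + sum_recursive(n->left) + sum_recursive(n->right);
}

/* Manual version: every conceptual "call" is a push to a heap-allocated
   stack and every "return" is a pop from it. */
long sum_manual_stack(const Node *root) {
    size_t cap = 64, top = 0;
    long total = 0;
    const Node **stack = malloc(cap * sizeof *stack);
    if (stack == NULL) abort();

    if (root != NULL) stack[top++] = root;
    while (top > 0) {
        const Node *n = stack[--top];          /* pop */
        total += n->value;
        if (top + 2 > cap) {                   /* room for two children */
            cap *= 2;
            const Node **grown = realloc(stack, cap * sizeof *stack);
            if (grown == NULL) abort();
            stack = grown;
        }
        if (n->right) stack[top++] = n->right; /* push */
        if (n->left)  stack[top++] = n->left;  /* push */
        assert(top <= cap);
    }
    free(stack);
    return total;
}

/* Builds a complete tree of the given depth, numbering nodes from *next.
   Nodes are never freed; that is fine for a short-lived experiment. */
Node *build(int depth, long *next) {
    if (depth == 0) return NULL;
    Node *n = malloc(sizeof *n);
    if (n == NULL) abort();
    n->value = (*next)++;
    n->left = build(depth - 1, next);
    n->right = build(depth - 1, next);
    return n;
}
```

Comparing the generated assembly for the two functions at the same optimization level is one way to see how many loads and stores each version ends up with.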
I realized then that I cannot reason about micro-optimizations at all when I'm coding in a high-level language like C or C++.
The CPU pipeline is hard to predict; sometimes profile-guided optimization gets me there faster than my own silliness of assuming I can reason about it.