Comment by tombert
18 days ago
I've been watching those Kaze Emanuar videos on N64 development, and it's always so weird to me when "doing the expensive computation again" turns out to be cheaper than "using the precomputed value". I'm not disputing it; he has clearly done a lot of research and testing to confirm the results, and I have no reason to think he's lying, but it's utterly counter-intuitive to me.
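The pattern he describes looks something like this toy C sketch (names and coefficients are made up, not his actual code): the table version does a single load that can miss cache, while the recompute version is a handful of multiplies that never leave the registers.

    #include <stdint.h>

    /* Version 1: precomputed table -- one load, but potentially an
     * uncached RAM access if the entry isn't resident. */
    extern const float sine_table[4096];

    static float sine_lookup(uint32_t angle)
    {
        return sine_table[angle & 4095u];  /* may stall on a cache miss */
    }

    /* Version 2: recompute -- a short polynomial approximation that
     * stays entirely in registers once x is loaded. */
    static float sine_poly(float x)
    {
        float x2 = x * x;
        /* truncated Taylor series; coefficients illustrative */
        return x * (1.0f - x2 * (1.0f / 6.0f) + x2 * x2 * (1.0f / 120.0f));
    }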
I haven't looked into the N64, but CPU speeds have been growing faster than RAM speeds for decades. I'm not sure exactly when that started; probably some time in the late 80s or early 90s, since that's about when PCs started getting cache memory, I believe.
I wonder if one turning point was out-of-order execution. Many computations use some values from memory plus others that could be recomputed, and out-of-order execution lets the latter proceed while the former waits on memory. That improves utilization and is a 'win' even if the recomputation in isolation would be no faster than the memory load.
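A minimal sketch of what I mean (illustrative C, nothing N64-specific): the load and the arithmetic below share no dependencies until the final multiply, so an out-of-order core can run the ALU work while the load is still in flight, whereas an in-order core like the N64's VR4300 mostly just stalls.

    /* The load of a[i] and the computation of 'derived' are independent,
     * so they can overlap on an out-of-order core. */
    float combine(const float *a, int i)
    {
        float from_mem = a[i];                   /* long latency on a cache miss */
        float derived  = (float)i * 0.5f + 1.0f; /* independent ALU work */
        return from_mem * derived;               /* the values only join here */
    }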
The N64 was just really weirdly designed: they went with an overpowered CPU for bragging rights, and bet on the wrong RAM horse, Rambus.
I used to develop for the N64 and I can confirm that it is true. It is crazy how much faster the CPU is compared to uncached RAM access.
Optimizing for RAM access instead of CPU instruction speed can make your code orders of magnitude faster.
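One example of the kind of thing I mean (struct names are hypothetical): split hot data from cold data, so every cache line you drag in from RAM is packed with bytes the loop actually uses.

    /* Hot/cold splitting: the per-frame loop only touches positions,
     * so each cache line fetched from RAM carries useful data instead
     * of being mostly cold fields the loop never reads. */
    struct EntityHot  { float x, y, z; };
    struct EntityCold { char name[32]; int flags; };

    void move_all(struct EntityHot *hot, int n, float dx)
    {
        for (int i = 0; i < n; i++)
            hot[i].x += dx;  /* sequential, cache-friendly accesses */
    }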
Doing maths is extremely fast: it takes a lot of arithmetic to add up to the time of a single memory access that misses both L1 and L2.
And you need to burn even more cycles before you’ve amortized the cost of using a cache line that could have benefitted some other work.
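Rough arithmetic with purely illustrative numbers (not measured N64 figures): if an uncached access stalls the pipeline for ~50 cycles while a multiply retires in one, a recomputation can spend up to ~50 multiplies before it loses on latency alone; and that's before charging the lookup for the cache line it evicted, which some other piece of code now has to refill.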