Comment by andrepd
10 hours ago
> I tend to be of the opinion that for modern general purpose CPUs in this era, such micro-optimizations are totally unnecessary because modern CPUs are so fast that instructions are almost free.
What does this mean? Free? Optimisations are totally unnecessary because... instructions are free?
The implementation in TFA is probably on the order of 5x more efficient than a naive approach. This is time and energy as well. I don't understand what "free" means in this context.
Calendar operations are performed probably trillions of times every second across all types of computers. If you can make them more time- and energy-efficient, why wouldn't you?
If there's a problem with modern software it's too much bloat, not too much optimisation.
GP made an important point that you seemed to have missed: in modern architectures, it’s much more important to minimize memory access than to minimize instructions. They weren’t saying optimization isn’t important, they were describing how to optimize on modern systems.
If this is indeed done trillions of times a second, which I frankly have a hard time believing, then sure, it might be worth it. But on a modern CPU, focusing on an optimization like this is a poor use of developer resources. There are likely several other optimizations related to cache locality that you could find in less time than it would take to do this, and those other optimizations would probably give several orders of magnitude more improvement.
Not to mention that the final code is basically a giant WTF for anybody reading it. It will be an attractive nuisance that people will be drawn to, like moths to a flame, any time there is a bug around calendar operations.
> But on a modern CPU, focusing on an optimization like this is a poor use of developer resources.
How many people are rolling their own datetime code? This seems like a totally fine optimization to put into popular datetime libraries.
> There are likely several other optimizations related to cache locality that you could find in less time than it would take to do this, and those other optimizations would probably give several orders of magnitude more improvement.
How is cache / memory access relevant in a subroutine that performs a check on a 16bit number?
> Not to mention that the final code is basically a giant WTF for anybody reading it. It will be an attractive nuisance that people will be drawn to, like moths to a flame, any time there is a bug around calendar operations.
1: comments are your friend
2: a unit test can assert that this function is equivalent to the naive one in about half a millisecond.