Comment by lokar
8 hours ago
And if you know in advance that a function will be in the critical path, and it needs to perform some operation on N items, and N will be large, it’s not premature to consider the speed of that loop.
8 hours ago
And if you know in advance that a function will be in the critical path, and it needs to perform some operation on N items, and N will be large, it’s not premature to consider the speed of that loop.
Another thought: many (most?) of these "rules" were before widespread distributed computing. I don't think Knuth had in mind a loop that is reading from a database at 100ms each time.
I've seen people write some really head shaking code that makes remote calls in a loop that don't actually depend on each other. I wonder to what extend they are thinking "don't bother with optimization / speed for now"
First, I agree with what you're saying.
But second, I'd remove "optimization" from considering here. The code you're describing isn't slow, it's bad code that also happens to be slow. Don't write bad code, ever, if you can knowingly avoid it.
It's OK to write good, clear, slow code when correctness and understandability is more important that optimizing that particular bit. It's not OK to write boneheaded code.
(Exception: After you've written the working program, it turns out that you have all the information to make the query once in one part of the broader program, but don't have all the information to make it a second time until flow reaches another, decoupled part of the program. It may be the lesser evil to do that than rearrange the entire thing to pass all the necessary state around, although you're making a deal with the devil and pinky swearing never to add a 3rd call, then a 4th, then a 5th, then...)
If you really have a loop that is reading from a database at 100ms each time, that's not because of not having optimized it prematurely, that's just stupid.
Reminds me of this quote which I recently found and like:
> look, I'm sorry, but the rule is simple: if you made something 2x faster, you might have done something smart if you made something 100x faster, you definitely just stopped doing something stupid
https://x.com/rygorous/status/1271296834439282690
Got it. What about initiating a 800mb image on a CPU limited virtual machine that THEN hits a database, before responding to a user request on a 300ms roundtrip? I think we need a new word to describe the average experience, stupidity doesn't fit.
And yet... :)
I think there is just a current (I've seen it mostly in Jr engineers) that you should just ignore any aspect of performance until "later"
1 reply →
Just remember Rob Pike's 1st rule: don't assume where bottlenecks will occur, but verify it.
I've worked on optimizing modern slow code. Once you optimize a few bottlenecks it turns out it's very hard to optimize because the rest of the time is spread out over the whole code without any small bottlenecks and it's all written in a slow language with no thought for performance.
From my understanding you still need to care care about the algorithms and architecture. If N is sufficiently large, you should pick O(N) algorithm over O(N^2). But usually there is a tradeoff, simple code (or hiding something behind some abstraction) might be easier to understand and maintain, but it might work slower with large input data and vice versa. I would rather write code that will be easier to optimize if there is some bottleneck than to optimize it overagressivelly. Also, different code needs different kind of optimization. Sometimes the code might be more IO heavy (disk / DB or network), for this type of code, the IO operation planning and caching is more critical than the optimization of the raw CPU time. Sometimes the input is too small to have any significant performance effects, and, what's paradoxical, choosing smarter algorithms might even hurt performance (alongside the maintanability). For example, for 10 - 100 items a simple linear scan in an array might be faster than using a O(log n) binary search tree. It's also well known that faster executable code (regardless of being hand written, machine generated, high level or machine code) usually has larger size (mostly because it's more "unrolled", duplicated and more complex when advanced algorithms are used). If you optimize the speed everywhere, the binary size tends to increase, causing more cache misses, which might hurt the performance more than improve. This is why some profiling is often needed for large software than simply passing O3.