Comment by closeparen
5 years ago
I think this is a perfect example of “algorithms and data structures emphasis is overblown.” Real world performance problems don’t look like LeetCode Hard, they look like doing obviously stupid, wasteful work in tight loops.
... that's the exact opposite of what I took from this.
The obviously stupid, wasteful work is at heart an algorithmic problem. And it cropped up even in the simplest of data structures. A constant amount of wasteful work often isn't a problem even in tight loops. A linear amount of wasted work, per loop, absolutely is.
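Not the article's code, but a minimal C illustration of the difference, with strlen() standing in for any helper that does linear work per call:

    #include <string.h>

    /* Counts spaces in a string. The loop condition calls strlen() on every
     * iteration, and strlen() walks the whole string, so this does O(n) extra
     * work per pass: O(n^2) overall for an n-byte input. */
    size_t count_spaces_slow(const char *s)
    {
        size_t count = 0;
        for (size_t i = 0; i < strlen(s); i++)   /* linear wasted work per iteration */
            if (s[i] == ' ')
                count++;
        return count;
    }

    /* Hoisting the length computation makes the wasted work constant per
     * iteration, and the loop is back to O(n). */
    size_t count_spaces_fast(const char *s)
    {
        size_t count = 0;
        size_t len = strlen(s);                  /* computed once */
        for (size_t i = 0; i < len; i++)
            if (s[i] == ' ')
                count++;
        return count;
    }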
It's not something that requires deep algorithms/data structures knowledge, is the point. Knowing how to invert a binary tree won't move the needle on whether you can spot this kind of problem. Knowing how to operate a profiler is a lot more useful.
True that it's rare that you need to pull out obscure algorithms or data structures, but in many projects you'll be _constantly_ constructing, composing, and walking data structures, and it only takes one or two places that are accidentally quadratic to make something that should take milliseconds take minutes.
The mindset of constantly considering the big-O category of the code you're writing and reviewing pays off big. And neglecting it costs big as well.
Except that you need to test your software, and if you see performance problems, profile them to identify the cause. It's not like you have one single chance to get everything right.
The later in development a problem is caught, the more expensive it is. The farther it gets along the pipeline of concept -> prototype -> testing -> commit -> production, the longer it's going to take to notice, repro, identify the responsible code, and fix.
It's true that you don't just have one shot to get it right, but you can't afford to be littering the codebase with accidentally quadratic algorithms.
I fairly regularly encounter code that performed all right when it was written, then something went from X0 cases to X000 cases and now this bit of N^2 code is taking minutes when it should take milliseconds.
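To put made-up numbers on that: if the input grows from 50 items to 5,000 (100×), an O(n^2) piece of code does roughly 100^2 = 10,000× the work, so a step that used to take 20 ms now takes over three minutes.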
People complain about Big-O once they reach the end of its usefulness. Your algorithm is O(n) or O(n log n) but it is still too slow.
And trying to optimize them gets you stink eye at code review time. Someone quotes Knuth, they replace your fast 200 lines with slow-as-molasses 10 lines and head to the bar.
Unfortunately this. Or they will say "don't optimize it until it proves to be slow in production" - at which point it is too dangerous to change it.
And here what matters is not your programming skills, it’s your profiling skills. Every dev writes code that’s not the most optimized piece from the start, hell we even say “don’t optimise prematurely”. But good devs know how to profile and flamegraph their app, not leetcode their app.
actually, "don't optimize prematurely" is a poor advice. just recently I was doing a good review that had the same issue where they were counting the size of an array in a loop, when stuff was being added to the array in the loop too. obvious solution was to track the length and
changed to
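In rough C terms (a sketch only, not the actual code; assume a list where counting means walking the whole thing, and all names are made up):

    #include <stdlib.h>

    struct node { struct node *next; int value; };

    /* O(n): walks the whole list every time it is called. */
    static size_t list_count(const struct node *head)
    {
        size_t n = 0;
        for (; head != NULL; head = head->next)
            n++;
        return n;
    }

    /* Prepends a value; error handling omitted for brevity. */
    static struct node *push(struct node *head, int value)
    {
        struct node *n = malloc(sizeof *n);
        n->value = value;
        n->next = head;
        return n;
    }

    /* Before: recounts the list on every iteration while also growing it,
     * so inserting n items costs O(n^2). */
    void build_slow(struct node **list, const int *items, size_t n, size_t limit)
    {
        for (size_t i = 0; i < n; i++) {
            if (list_count(*list) >= limit)   /* linear scan, every iteration */
                break;
            *list = push(*list, items[i]);
        }
    }

    /* After: track the length alongside the list and keep it in sync on insert. */
    void build_fast(struct node **list, const int *items, size_t n, size_t limit)
    {
        size_t length = list_count(*list);    /* counted once up front */
        for (size_t i = 0; i < n; i++) {
            if (length >= limit)
                break;
            *list = push(*list, items[i]);
            length++;
        }
    }

The second version is what the fix amounts to; the catch, which the replies below get into, is that `length` is now an invariant you have to maintain by hand.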
This is clearly optimization, but it's not premature. The original might just pass code review, but when it wreaks havoc, the time it costs won't be worth it: Jira tickets, figuring out why the damn thing is slow, reproducing it in dev, fixing it, opening another pull request, review, deploy, etc. Sometimes "optimizing prematurely" is the right thing to do, if it doesn't cost much time or overly complicate the initial solution. Of course, this depends on the language: some languages track the length of the array so checking the size is O(1), but not all do, so checking the length can be expensive. Knowing the implementation details matters.
I'm not sure I would prefer the second version in a code review. I find the first version is conceptually nicer because it's easy to see that you will always get the correct count. In the second version you have to enforce that invariant yourself and future code changes could break it. If this is premature optimization or not depends on the size of the array, number of loop iterations and how often that procedure is called. If that's an optimization you decide to do, I think it would be nice to extract this into an "ArrayWithLength" data structure that encapsulates the invariant.
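Something like this, sketched in C (illustrative only; "ArrayWithLength" is the name from above, the function names are made up):

    #include <stdlib.h>

    /* Encapsulates the invariant: the only way to add an element goes through
     * a function that also updates the stored length, so the two can't drift
     * apart. */
    struct ArrayWithLength {
        int    *items;
        size_t  length;
        size_t  capacity;
    };

    int awl_push(struct ArrayWithLength *a, int value)
    {
        if (a->length == a->capacity) {
            size_t new_cap = a->capacity ? a->capacity * 2 : 8;
            int *grown = realloc(a->items, new_cap * sizeof *grown);
            if (grown == NULL)
                return -1;          /* out of memory; array left untouched */
            a->items = grown;
            a->capacity = new_cap;
        }
        a->items[a->length++] = value;
        return 0;
    }

    /* O(1): reads the stored count instead of re-walking the contents. */
    size_t awl_length(const struct ArrayWithLength *a)
    {
        return a->length;
    }

That way callers keep the readability of "the count is always right" while getting the O(1) check.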
With these things, I have always had the hope that an optimizing compiler would catch this. I think it is an allowed optimization if the count function is considered `const` in C or C++ at least.
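A sketch of what that could look like with GCC/Clang function attributes (illustrative; `pure` rather than `const`, since a count function has to read the memory it's counting):

    #include <stddef.h>

    struct node { struct node *next; int value; };

    /* `pure` promises the result depends only on the arguments and on memory
     * the function merely reads, with no side effects. The optimizer may then
     * merge or hoist repeated calls -- but only when it can prove that nothing
     * the function reads changes in between. If the loop also appends to the
     * list, that proof fails and the call stays inside the loop. */
    __attribute__((pure))
    size_t list_count(const struct node *head)
    {
        size_t n = 0;
        for (; head != NULL; head = head->next)
            n++;
        return n;
    }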
Leetcode-style thinking will let you spot obviously stupid, wasteful work in tight loops.
Exactly - though to add a little nuance to your post, it's about having a million loops in a 10M-line code base and exactly one of them being pathologically slow. So preventing that loop from entering the code base is tough - finding it is key.