
Comment by beaconstudios

4 years ago

personally, I think I wouldn't even bother to check the algorithmic complexity of every external function I call. I'd just use the logical choice (like sscanf) and only consider optimising if things started to slow down and profiling the application highlighted it as a bottleneck.

I personally would, if it was listed in the documentation. Doing stuff and profiling later is the right general approach to performance optimization. But what's even better is not making stupid mistakes in the first place, if they are trivial to avoid. To achieve that, you need to know the complexity guarantees of functions and data structures - or at least their ballpark (like, "this could be O(n) or perhaps O(n log n), definitely not worse").

This is where setting the guarantees and documenting them is useful - it allows people to trivially avoid making these performance mistakes. Prevention is better than cure, in that - as the GTA Online case demonstrates - in the later stages of product development, people may not bother fixing performance anymore.

  • It might still not help in the case of sscanf. The documentation would specify that it's O(N) in the size of the input string, which is just what one would expect without deeper thought. The problem is not the O(N), the problem is that N is the complete input string, not just the part being parsed. The documentation would have to include a big fat warning about that.
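
    Roughly what that trap looks like in code - a hypothetical parsing loop, not the actual GTA parser; the function and variable names are made up for illustration:

      #include <stdio.h>

      /* Many C library implementations have sscanf scan its whole input
       * string to find its length before parsing (this is the behaviour
       * the GTA story hinges on). Each call below therefore walks the
       * entire remaining buffer, so the loop is O(N^2) in the total
       * input length even though it reads like a single linear pass. */
      size_t parse_ints(const char *buf, int *out, size_t max)
      {
          size_t count = 0;
          int consumed = 0;

          while (count < max && sscanf(buf, "%d%n", &out[count], &consumed) == 1) {
              buf += consumed;      /* advance past the parsed number */
              if (*buf == ',')      /* skip a separator, if any       */
                  buf++;
              count++;
          }
          return count;
      }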

  • Part of the issue in my mind is that big O complexity values don't (can't?) tell you the input size at which the cost actually starts to hurt, if the growth is nonlinear. sscanf could be O(N^10) (does the rule about ignoring the coefficient also apply to the exponent?) but if it only starts to hurt your application at the point of 10 TB string reads then it's still unimportant.

    I do agree that people often take "don't optimise early" as licence not to make optimisations up front that will clearly be needed (for example, implementing the buffer as a rope in a text editor), but I don't think this is the case here unless you test the function with reasonable future file sizes (something you should be doing anyway) and sscanf profiles as a bottleneck.

    • I agree. And I'd personally probably trip over it too, if I used C functions much (I use the C++ standard library equivalents, and I'm going to recheck the string parsing code in our project now, because we may have the same problem, just with a different API!). strlen() in a loop is something that can sneak up on you by virtue of having too many layers of abstraction - the scanf() family being one of them.

      But what I'm saying is, documenting big O complexity is useful, even if imperfect (and if the function has "tricky" complexity, the conditions where it gets bad should be documented too!).

      > sscanf could be O(N^10) (does the rule about ignoring the coefficient also apply to the exponent?) but if it only starts to hurt your application at the point of 10 TB string reads then it's still unimportant.

      Sure, but then applications grow and functions get reused. If it's trivial to spot and replace the O(N^10) sscanf with an O(n) alternative (see the sketch at the end of this comment), I'd want to know the complexity and do the replacement immediately - otherwise, you may discover two years later that the company is investing employee-years into horizontally scaling your application because it's slow under its workload, when fixing that one thing would let it run the same workload on a laptop, on a single core.

      (This is not a joke, there are "big data" stories like this.)
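
      To be concrete about "trivial to spot and replace": here's the kind of O(n) alternative I have in mind - my own sketch using strtol, not anything quoted from above, with made-up names:

        #include <stdlib.h>

        /* strtol reads only the characters it actually converts and
         * reports where it stopped, so no call ever rescans the rest
         * of the buffer. The whole loop is O(N) in the input length. */
        size_t parse_ints_linear(const char *buf, int *out, size_t max)
        {
            size_t count = 0;

            while (count < max) {
                char *end;
                long v = strtol(buf, &end, 10);
                if (end == buf)             /* no number here: stop */
                    break;
                out[count++] = (int)v;
                buf = (*end == ',') ? end + 1 : end;  /* step past separator */
            }
            return count;
        }

      Same shape of loop, same amount of code - the cost is just proportional to the characters actually consumed instead of the whole buffer on every call.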


    • The important part of the documentation would be that N is the length of the entire string, not just the subset of data that needs to be processed. The actual complexity isn't relevant in this case.

Yes, I absolutely think profiling and then only optimizing the actual problems is always a sound choice.

I don't check the docs for every library function I use. I'm just saying, it wouldn't hurt if, when you do read the docs for standard library functions, the algorithmic complexity was mentioned in passing.

  • In principle, that sounds good. But then it can happen that you profiled when N=1000 and it seemed fine. Then a few years later (like in GTA), N has grown to 63,000 and it's no longer fine - at quadratic cost, a 63x larger input means roughly 4,000x the work. It seems unlikely the developer will go back and profile it again. Also, I think the original Windows Update algorithm for figuring out which updates you needed to download started out fine, but 20 years later it turns out it's quadratic and now there are tens of thousands of updates, so it becomes almost impossible to install XP SP2 from a CD and have it update itself.

But you'll lose so much time doing that! Realizing there's a bug and investigating it is a huge amount of work compared to never writing it in the first place.

  • No, spending time optimising areas of the code that will never become bottlenecks is the real waste of time.

    • But if you know the performance characteristics of an algorithm up front, you don't have to spend any time optimizing it later. You just know what to do, because you already know the cost.

      For instance: suppose you are building a CRUD app on a SQL database. Do you (a) add indexes for the important queries as you go, or (b) ignore indexes and later profile to see which queries are slow? Of course you just create the indexes in the first place. Doing the latter would mean that instead of having a fast app out of the gate, you have an app that gets slower over time and requires additional dev time to debug and improve. Profiling and fixing performance problems is a massive waste of everyone's time if the problem could have been dodged when the code was being written.

      It's different if the optimization takes significant engineering effort. Then, yes, put it off until it's needed. But most don't, in my experience: most optimizations are totally simple in hindsight, and the code should have been written that way in the first place.
