Comment by decasia

4 years ago

It would be nice if it were more common for standard library functions to include algorithmic complexity as part of the standard documentation.

Absent that, we can of course read the source code and find out, but for the most part we tend to operate on an informed assumption about what the algorithmic complexity of a given operation is likely to be. Inevitably, sometimes that assumption is incorrect.

There's no way to develop software without making assumptions, some of which inevitably turn out to be incorrect, so I don't think there is any great shame in having that happen, in itself. But better docs could help us make better assumptions with less effort, at least.

Keeping track of algorithmic complexity would be nice as a language and/or static analysis feature. If you wanted to be exact, or to do it for a language with complex metaprogramming, I assume it would be a nightmare to implement. Absent those complications, and especially if you always reduced the result to O(1), O(n), O(log n), etc., it might not even be that difficult, and the potential advantages would be significant.

  • The difficulty here is "define n". And I don't mean that facetiously. You have a string parsing lib. It is, for reasons, quadratic over the number of strings parsed, and linear per string.

    This is overall n^3, but that's meaningless because there isn't actually just one n. It's really m^2 * n. That means you can't reduce it to a single figure, because you want to keep both components (because, say, you know it will only ever be called with a single string).

    But then, in the next app, this gets called and reinitialized once per file. And the routine handling files, for reasons beyond our ken, is O(k log k) in the number of files. We're now at k * log(k) * m^2 * n.

    And so, over any sufficiently long call chain, "what is n" is the overriding question - string length, number of strings, number of files? Not "how complex is the algorithm", because you want to optimize for what's relevant to your use case.
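
    As a hypothetical illustration of the "more than one n" point (parse_batch and its behaviour are invented for this sketch, not taken from any real library): checking each new string against all previously seen ones is quadratic in the number of strings m, and each comparison is linear in the string length n, so the cost is roughly m^2 * n rather than any single-variable bound.

      /* Sketch only: two independent size parameters, m strings of length ~n. */
      #include <stddef.h>
      #include <string.h>

      size_t parse_batch(const char **strings, size_t m)
      {
          size_t chars_parsed = 0;
          for (size_t i = 0; i < m; i++) {
              /* Quadratic part: compare against every earlier string.
               * Each strcmp is itself linear in the string length n,
               * so this loop alone is O(m^2 * n). */
              int duplicate = 0;
              for (size_t j = 0; j < i; j++) {
                  if (strcmp(strings[i], strings[j]) == 0) {
                      duplicate = 1;
                      break;
                  }
              }
              if (duplicate)
                  continue;
              /* Linear-per-string part: walk the characters once. */
              for (const char *p = strings[i]; *p; p++)
                  chars_parsed++;
          }
          return chars_parsed;
      }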

    • It would be a huge step to simply follow the call tree and report the depth of nested loops for each branch. You could then check what N is at each level.

      The trick is knowing where the nested loops are since they can be spread across functions.

      I had a function that scaled as N^2, but it was also creating a list of that size. Then it called a function to remove duplicates from that list. That function was quadratic in the length of its input, which was itself N^2 items, so the whole thing was actually N^4. And now that I think of it, those loops were not even nested... I rewrote the first part to not create duplicates and deleted the quadratic deduplication. Now it's N^2, but it has to be.
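
      A hedged sketch of that shape (all names invented for the example): build_pairs() produces a list whose length is N^2, and dedupe() is quadratic in the length of whatever list it is handed, so the combination is (N^2)^2 = N^4 even though no single function contains four nested loops.

        #include <stdlib.h>

        typedef struct { int a, b; } pair_t;

        /* O(N^2): emits every ordered pair (i, j), so the output has N^2 entries. */
        static pair_t *build_pairs(int n, size_t *out_len)
        {
            pair_t *list = malloc((size_t)n * (size_t)n * sizeof *list);
            if (!list)
                return NULL;
            size_t k = 0;
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    list[k++] = (pair_t){ i, j };
            *out_len = k;
            return list;
        }

        /* O(len^2): naive in-place deduplication of the list it is given. */
        static size_t dedupe(pair_t *list, size_t len)
        {
            size_t kept = 0;
            for (size_t i = 0; i < len; i++) {
                int seen = 0;
                for (size_t j = 0; j < kept; j++) {
                    if (list[j].a == list[i].a && list[j].b == list[i].b) {
                        seen = 1;
                        break;
                    }
                }
                if (!seen)
                    list[kept++] = list[i];
            }
            return kept;
        }

        /* dedupe(build_pairs(n)) costs O((n^2)^2) = O(n^4) in n,
         * with the cost split across two innocent-looking functions. */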

    • I guess you're right. You'd have to keep track of all of it for the information to be meaningful. Still seems doable to me, assuming the functions are pure.

      Here's another crazy idea: keeping track of this while taking into consideration aggressive compiler optimizations.

  • There is no general solution to deciding whether a program is O(n^k).[1] So your static analysis will either not know the answer for some programs, report a wrong bound, or report a ridiculous overestimate.

    [1] https://cstheory.stackexchange.com/questions/5004/are-runtim...

    • So? Static analysis doesn't need to always produce an answer; it only needs to produce one most of the time. The question isn't whether you can do it in general for all inputs (that's not possible for basically anything you would want to know), it's whether you can do it often enough on the kind of code people actually write.

Personally, I wouldn't even bother to check the algorithmic complexity of every external function I call. I'd just use the logical choice (like sscanf) and only consider optimising if things started to slow down and profiling the application highlighted it as a bottleneck.

  • I personally would, if it were listed in the documentation. Doing stuff and profiling later is the right general approach to performance optimization. But what's better is not making stupid mistakes in the first place, if they are trivial to avoid. To achieve that, you need to know the complexity guarantees of functions and data structures - or at least their ballpark (like, "this could be O(n) or perhaps O(n log n), definitely not worse").

    This is where setting the guarantees and documenting them is useful - it allows people to trivially avoid making these performance mistakes. Prevention is better than cure here: as the GTA Online case demonstrates, in the later stages of product development people may not bother fixing performance anymore.

    • It might still not help in the case of sscanf. The documentation would specify that it's O(N) in the size of the input string, just what one would expect without deeper thought. The problem is not the O(N); the problem is that N is the length of the complete input string, not just the part being parsed. The documentation would have to include a big fat warning about that.
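
      A sketch of that failure mode (the function names here are made up; the claim about sscanf holds for at least some widely used C library implementations, which effectively measure the whole remaining string before parsing): the loop below looks linear, one sscanf call per number, but if each call first scans everything after p, then parsing a buffer of length L takes roughly L/k calls of O(L) each, i.e. O(L^2) overall.

        #include <stdio.h>
        #include <stdlib.h>

        /* Looks linear, but may be quadratic in the buffer length if each
         * sscanf call internally walks the entire remaining string. */
        static size_t parse_floats_sscanf(const char *buf, float *out, size_t max)
        {
            const char *p = buf;
            size_t count = 0;
            int consumed = 0;
            /* %n reports how many characters this call consumed, so p can advance. */
            while (count < max && sscanf(p, " %f%n", &out[count], &consumed) == 1) {
                p += consumed;
                count++;
            }
            return count;
        }

        /* One workaround: strtof() only examines the characters it needs,
         * so the same loop stays linear in the buffer length. */
        static size_t parse_floats_strtof(const char *buf, float *out, size_t max)
        {
            const char *p = buf;
            char *end;
            size_t count = 0;
            while (count < max) {
                float v = strtof(p, &end);
                if (end == p)   /* no further numbers */
                    break;
                out[count++] = v;
                p = end;
            }
            return count;
        }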

    • Part of the issue in my mind is that big-O complexity values don't (can't?) tell you the point at which they start to hurt, if they are nonlinear. sscanf could be O(N^10) (does the rule about ignoring the coefficient also apply to the exponent?), but if it only starts to hurt your application at the point of 10 TB string reads then it's still unimportant.

      I do agree that people often take "don't optimise early" as licence not to make optimisations up front that will clearly be needed (for example, implementing the buffer as a rope in a text editor), but I don't think that's the case here unless you test the function with reasonable future file sizes (something you should be doing anyway) and sscanf profiles as a bottleneck.

  • Yes, I absolutely think profiling and then only optimizing the actual problems is always a sound choice.

    I don't check the docs for every library function I use. I'm just saying, it wouldn't hurt if, when you do read the docs for standard library functions, the algorithmic complexity was mentioned in passing.

    • In principle, that sounds good. But it can happen that you profiled when N was 1,000 and it seemed fine; then a few years later (as with GTA), N has grown to 63,000 and it no longer is. It seems unlikely the developer will go back and profile it again. Similarly, I think the original Windows Update algorithm for figuring out which updates you needed started out fine, but it turned out to be quadratic, and 20 years later there are tens of thousands of updates, so it became almost impossible to install XP SP2 from a CD and have it update itself.

  • But you'll lose so much time doing that! Realizing there's a bug and investigating it is a huge amount of work compared to never writing it in the first place.

There are many implementations of the C standard library, and they may not all have the same complexity. In this case the code has the right complexity (linear), but for the wrong n (string length instead of parse length).

> It would be nice if it were more common for standard library functions to include algorithmic complexity as part of the standard documentation.

Isn’t the point of standard library functions to define the contract and not implementation constraints? In other words, if algorithmic complexity is a significant concern, perhaps a standard library function is not a suitable choice.

  • Algorithmic complexity is not an implementation detail, it's part of the constraints on how you can use the function.

This is a cool idea. I'm sure it's been done automatically in code before, but I'd love to see it in my editor. It sounds similar to the bundle-size extension that annotates how big a package you're importing is in Node.

> standard library functions to include algorithmic complexity as part of the standard documentation.

It would be acceptable to have a graph of input size vs. time, and this graph could be autogenerated using a testbed that is also used for unit testing. Two birds, one stone!
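
A minimal sketch of that idea, with function_under_test standing in for whatever routine the unit tests already exercise (the names here are invented): time the routine at a range of input sizes and emit CSV that a plotting tool can turn into the size-vs-time graph.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    /* Stand-in for the code the existing test suite exercises. */
    static void function_under_test(const char *input, size_t len)
    {
        volatile size_t sink = 0;
        for (size_t i = 0; i < len; i++)
            sink += (size_t)input[i];
        (void)sink;
    }

    int main(void)
    {
        printf("n,cpu_seconds\n");
        for (size_t n = 1000; n <= 1000000; n *= 10) {
            char *input = malloc(n);
            if (!input)
                return 1;
            memset(input, 'x', n);

            clock_t start = clock();
            function_under_test(input, n);
            double secs = (double)(clock() - start) / CLOCKS_PER_SEC;

            printf("%zu,%f\n", n, secs);
            free(input);
        }
        return 0;
    }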

  • I don't think we could come up with standard x-axis bounds for such a graph, since n = 1,000 or even 1,000,000 may not be zoomed out far enough to showcase the behavior approaching infinity.