Comment by epr
4 years ago
Keeping track of algorithmic complexity would be nice as a language and/or static analysis feature. If you wanted to be exact, or do it for a language with complex metaprogramming, I assume it would be a nightmare to implement. Absent those complications, and especially if you always reduced it to O(1), O(n), O(log(n)), etc., it might not even be that difficult given the potential advantages.
The difficulty here is "define n". And I don't mean that facetiously. You have a string parsing lib. It is, for reasons, quadratic over the number of strings parsed, and linear per string.
This is overall n^3, but that's meaningless because there actually isn't just one n; it's really m^2 * n. That means you can't reduce it to a single variable, because you want to keep both components. (Because, say, you know it will only ever be called with a single string.)
But then, in the next app, this gets called and reinitialized once per file. And the routine handling files, for reasons beyond our ken, is (n lg n). We're now at k * log(k) * m^2 * n.
And so, over any sufficiently long call chain, "what is n" is the overriding question - string length, number of strings, number of files? Not "how complex is the algorithm", because you want to optimize for what's relevant to your use case.
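A hedged reconstruction of that call chain, with made-up names (compare_prefixes, parse_strings, handle_files) and the simplest reading, where the per-file parsing sits next to the O(k log k) file handling rather than inside it:

    def compare_prefixes(a: str, b: str) -> int:
        # Linear in string length s: length of the shared prefix of a and b.
        n = 0
        for ca, cb in zip(a, b):
            if ca != cb:
                break
            n += 1
        return n

    def parse_strings(strings):
        # Quadratic over the number of strings m, linear per string: O(m^2 * s).
        total = 0
        for a in strings:
            for b in strings:
                total += compare_prefixes(a, b)
        return total

    def handle_files(files):
        # The next app up the stack: O(k log k) file handling, plus one freshly
        # reinitialized parse per file, so O(k log k + k * m^2 * s) in this reading.
        total = 0
        for name in sorted(files):               # the O(k log k) part
            total += parse_strings(files[name])  # the O(m^2 * s) part, k times
        return total

    print(handle_files({"a.txt": ["foo", "foobar"], "b.txt": ["bar", "baz", "bazaar"]}))

Whether you report that as one formula in k, m, and s, or collapse it to "n to the something", is exactly the "define n" problem.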
It would be a huge step to simply follow the call tree and report the depth of nested loops for each branch. You could then check what N is at each level.
The trick is knowing where the nested loops are since they can be spread across functions.
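As a sketch of how little machinery the "follow the call tree" version needs, and of why the cross-function spreading matters, here is a toy Python AST pass that reports the deepest loop nesting reachable from each function. Everything in it (the names, the example source) is made up for illustration, and it deliberately ignores recursion cycles, comprehensions, callbacks, and loops hidden inside things like `x in list`:

    import ast
    import textwrap

    def max_loop_depth(func_defs, name, seen=frozenset()):
        # Deepest loop nesting reachable from `name`, following calls to other
        # functions defined in the same module.
        if name not in func_defs or name in seen:
            return 0
        seen = seen | {name}

        def depth(node):
            best = 0
            for child in ast.iter_child_nodes(node):
                d = depth(child)
                if isinstance(child, (ast.For, ast.While)):
                    d += 1
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    d = max(d, max_loop_depth(func_defs, child.func.id, seen))
                best = max(best, d)
            return best

        return depth(func_defs[name])

    source = """
    def inner(xs):
        for x in xs:
            print(x)

    def outer(xs):
        for x in xs:
            inner(xs)   # the second loop lives in a different function
    """
    tree = ast.parse(textwrap.dedent(source))
    funcs = {f.name: f for f in tree.body if isinstance(f, ast.FunctionDef)}
    for fname in funcs:
        print(fname, "-> max loop depth:", max_loop_depth(funcs, fname))
    # inner -> max loop depth: 1
    # outer -> max loop depth: 2   (the nesting is only visible across the call)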
I had a function that scaled as N^2, but it was creating a list of that size as well. Then it called a function to remove duplicates from that list. That function was quadratic in its input, and its input had N^2 elements, which meant the whole thing was actually N^4. And now that I think of it, those loops were not nested... I rewrote the first part to not create duplicates and deleted the quadratic deduplication. Now it's N^2, but it has to be.
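A hypothetical reconstruction of that pattern, just to make the arithmetic visible: the producer is quadratic, the dedupe is quadratic in its own input, and its input has N^2 elements, hence N^4 overall, even though no single function contains the full nesting (the dedupe's inner "loop" hides inside `x not in unique`):

    def pairs_with_dupes(n):
        out = []
        for i in range(n):
            for j in range(n):
                out.append((min(i, j), max(i, j)))   # ~n^2 items, many duplicates
        return out

    def dedupe_quadratic(xs):
        unique = []
        for x in xs:              # len(xs) iterations...
            if x not in unique:   # ...each scanning up to len(xs) items: quadratic
                unique.append(x)
        return unique

    def slow(n):   # quadratic producer feeding a quadratic dedupe: (n^2)^2 = n^4
        return dedupe_quadratic(pairs_with_dupes(n))

    def fast(n):   # don't create the duplicates in the first place: n^2
        return [(i, j) for i in range(n) for j in range(i, n)]

    assert sorted(slow(20)) == sorted(fast(20))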
I guess you're right. Keeping track of it all is required for the information to be meaningful enough. Still seems doable to me, assuming the functions are pure.
Here's another crazy idea: keeping track of this while taking into consideration aggressive compiler optimizations.
There is no general solution to deciding whether a program is O(n^k).[1] So your static analysis will either fail to give an answer for some programs, report a wrong bound, or report a ridiculous overestimate.
[1] https://cstheory.stackexchange.com/questions/5004/are-runtim...
So? Static analysis doesn't need to always produce an answer, only produce an answer most of the time. The question isn't whether you can do it in general for all inputs (this is not possible for basically anything you would want to know), it's whether you can do it enough of the time on the kind of code which people actually write.
Can you point me towards some source code where a human can't find the algorithmic complexity?
Humans can't tell you whether this program will run forever on any particular (positive integer) input, or whether all inputs terminate.
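The program in question isn't shown here, but it is presumably something along the lines of the classic Collatz iteration (mentioned a couple of comments down), sketched here as a guess:

    def f(n):
        # Nobody knows whether this terminates for every positive integer n
        # (the Collatz conjecture), so nobody can state its worst-case complexity.
        while n > 1:
            n = n // 2 if n % 2 == 0 else 3 * n + 1
        return n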
It's not hard to implement the construction in the proof, and in the wild you'll hit the same issue with any interpreter, since bounding the interpreter's runtime means bounding the runtime of whatever program it's fed. Similarly, you can encode many open mathematical problems into simple programs where finding runtime bounds is equivalent to solving the problem. The Collatz Conjecture, for example.
Once you leave primitive recursive functions[0], reasoning quickly becomes very non-trivial.
[0]: https://en.wikipedia.org/wiki/Primitive_recursive_function
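For a concrete taste of what lies just outside the primitive recursive functions, the textbook example is the Ackermann function: it is total (it always terminates), but its runtime cannot be bounded by any fixed nesting of bounded loops, which is exactly the regime where a "report the nested loop depth" analysis gives up. A minimal sketch:

    def ackermann(m, n):
        # Total and computable, but not primitive recursive: its growth
        # outpaces anything expressible with a fixed depth of bounded loops.
        if m == 0:
            return n + 1
        if n == 0:
            return ackermann(m - 1, 1)
        return ackermann(m - 1, ackermann(m, n - 1))

    print(ackermann(2, 3))   # 9
    print(ackermann(3, 3))   # 61; ackermann(4, 2) already has 19,729 digits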