Comment by BenFrantzDale
4 years ago
All of the C++ algorithms list complexity guarantees, I believe. This saga stunned me to learn that C doesn’t seem to do this.
It's easy to forget that the original C standards were largely codifying existing practice during an era when using gets() [1] was existing practice. The world wasn't quite ready for Ada, I guess. Best-laid plans of mice and men etc. etc..
Also, keep an eye out for "amortized" complexity. This does have a legitimately rigorous definition, but for latency-bound paths it can practically amount to "O(whatever), except for the particular invocations that are far, far worse under unspecified conditions".
[1] https://www.man7.org/linux/man-pages/man3/gets.3.html#BUGS
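To make the "amortized" caveat concrete, here is a minimal sketch using std::vector, whose push_back is the textbook case: most calls are O(1), but the call that exceeds the current capacity reallocates and moves all n existing elements, which is exactly the kind of occasional far-worse invocation described above.

    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<int> v;

        // push_back is amortized O(1), but any call that finds
        // size() == capacity() reallocates and moves all n existing
        // elements: an O(n) spike hiding inside an "O(1)" operation.
        for (int i = 0; i < 1'000'000; ++i) {
            const bool will_reallocate = (v.size() == v.capacity());
            v.push_back(i);
            if (will_reallocate)
                std::printf("reallocation at size %zu\n", v.size());
        }
    }

Run it and the reallocation lines cluster early and then become exponentially rarer, which is why the average stays O(1) even though individual calls don't.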
It's also easy to forget that C was competing mainly with assembly, while C++ competed with managed languages. The early C programmer ethos, especially among library authors, was much more along the lines of "look at the generated object code if you want to know what it's doing" while modern practice leans more towards "read the documentation for complexity guarantees". I'm not saying that worse documentation leads to better programmers, but I'm not not saying that either. Practices change, standards change.
Good documentation and inspecting the compiled bytecode are both good ways of finding out about performance characteristics of certain features. The problem starts when people rely on assumptions ("sscanf should be fast because it's widely used") or performance folklore ("localizing every function you'll ever use makes your Lua code faster"), because those tend to either be completely wrong or lack very important context.
3 replies →
It makes me chuckle when hash map insertions are stated to be O(1). Which is true with respect to the number of items in the map, assuming the map doesn't need resizing and there isn't a hash collision... but it's generally not true with respect to the key length. (I think most implementations are worst-case O(l·n), where l is the length of the key and n is the number of inserted items, assuming the hash function is O(l); the _amortised_ runtime would be O(l).)
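A minimal sketch of the key-length point, using std::unordered_map as one concrete example: std::hash<std::string> has to read every character of the key, so even a collision-free insert costs O(l).

    #include <string>
    #include <unordered_map>

    int main() {
        std::unordered_map<std::string, int> counts;

        // std::hash<std::string> must read all l characters of the key, so
        // this "O(1)" insertion is really O(l) before collisions or rehashing
        // even enter the picture.
        const std::string long_key(1'000'000, 'x');  // l = 1,000,000
        counts[long_key] = 1;

        // An insertion that pushes the load factor over max_load_factor()
        // relocates all n stored elements (re-hashing each key if the
        // implementation doesn't cache hashes), which is where the worst-case
        // O(l * n) above comes from; spread over many insertions it averages
        // back down to amortised O(l).
        for (int i = 0; i < 100'000; ++i)
            counts[std::to_string(i)] = i;
    }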
> assuming the map doesn't need resizing
This isn't a big difficulty; it's still amortized O(1).
> and there isn't a hash collision
This is a real difficulty, unless you allow map resizing. Luckily, we do.
> but it's generally not true in respect to the key length.
OK, but in most cases the key length is constant, making anything that depends on the key length O(1) by definition.
25 replies →
But isn't the key length a constant, so we're back to O(1)? OK, in theory you could exhaust all possible keys of a certain length and proceed to longer keys. What would that give us, O(ln(n))?
1 reply →
...I fully plan to use "O(whatever)". Not sure for what.
But, yes. (Naive) quicksort's amortized complexity being O(n log n), while it's O(n^2) on already-sorted data, is all I ever needed to learn to take away that lesson. When sorting already-sorted data is worse than sorting randomized data, it's a quick realization that "amortized cost" = "read the fine print".
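For anyone who hasn't seen it, a rough sketch of the naive variant (assuming the classic first-element pivot): on already-sorted input every partition puts zero elements on one side, so the total work is n + (n-1) + (n-2) + ... = O(n^2).

    #include <utility>
    #include <vector>

    // Naive quicksort with the first element as pivot (the textbook-naive
    // choice; real library sorts pick pivots far more carefully).
    void naive_quicksort(std::vector<int>& v, int lo, int hi) {
        if (lo >= hi) return;
        const int pivot = v[lo];
        int i = lo;
        for (int j = lo + 1; j <= hi; ++j) {
            // On already-sorted input nothing is ever smaller than the pivot,
            // so this branch never fires and the "left" side stays empty...
            if (v[j] < pivot) std::swap(v[++i], v[j]);
        }
        std::swap(v[lo], v[i]);
        // ...which means one recursive call always gets n-1 elements:
        // n + (n-1) + (n-2) + ... comparisons, i.e. O(n^2), plus a recursion
        // depth of n, so a large sorted input can also blow the stack.
        naive_quicksort(v, lo, i - 1);
        naive_quicksort(v, i + 1, hi);
    }

    int main() {
        std::vector<int> already_sorted{1, 2, 3, 4, 5, 6, 7, 8};
        naive_quicksort(already_sorted, 0,
                        static_cast<int>(already_sorted.size()) - 1);
    }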
Quicksort as O(n log n) is not amortized complexity, but average runtime for random data.
10 replies →
I (sadly) have to resort to "O(scary)" too often for my taste.
From: https://stackoverflow.com/a/185576/368409
Are there complexity guarantees for std::istream::operator>>?
In the standard there are things like "exactly N operations" [0], but I'm not seeing anything like that for `istream` [1]. There's an explanation of how things should work, and I imagine you could derive complexity from it, but I think `istream` is a bit special since you're talking about a wrapper for (potentially) an arbitrary input source.
[0]: https://eel.is/c++draft/alg.move#15
[1]: https://eel.is/c++draft/input.streams#istream.extractors-7
No, or at least not in general, because you can overload it for your custom types with whatever terrible implementation you like.
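A contrived sketch of that point (the Widget type and the sleep are invented for illustration): operator>> for a user-defined type is just an ordinary overload, so the standard has no way to bound its cost.

    #include <chrono>
    #include <iostream>
    #include <sstream>
    #include <thread>

    // A hypothetical type, made up for this example.
    struct Widget { int id = 0; };

    // A perfectly legal operator>> overload with a deliberately awful
    // implementation: the standard can't put a complexity bound on code it
    // has never seen.
    std::istream& operator>>(std::istream& in, Widget& w) {
        in >> w.id;
        std::this_thread::sleep_for(std::chrono::seconds(1));  // "O(whatever)"
        return in;
    }

    int main() {
        std::istringstream src("42");
        Widget w;
        src >> w;                    // resolves to the overload above
        std::cout << w.id << '\n';   // prints 42, eventually
    }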
This is true in essentially the same way for C's FILE, which is the point I think Ted is insinuating.
That's because C is just a wrapper for machine code on a cheap PDP-11, and cheap PDP-11s didn't have enough RAM to do complexity.