Comment by 0xcde4c3db
4 years ago
It's easy to forget that the original C standards were largely codifying existing practice during an era when using gets() [1] was existing practice. The world wasn't quite ready for Ada, I guess. Best-laid plans of mice and men, etc. etc.
Also, keep an eye out for "amortized" complexity. This does have a legitimately rigorous definition, but for latency-bound paths it can practically amount to "O(whatever), except for the particular invocations that are far, far worse under unspecified conditions".
[1] https://www.man7.org/linux/man-pages/man3/gets.3.html#BUGS
It's also easy to forget that C was competing mainly with assembly, while C++ competed with managed languages. The early C programmer ethos, especially among library authors, was much more along the lines of "look at the generated object code if you want to know what it's doing" while modern practice leans more towards "read the documentation for complexity guarantees". I'm not saying that worse documentation leads to better programmers, but I'm not not saying that either. Practices change, standards change.
Good documentation and inspecting the compiled bytecode are both good ways of finding out about performance characteristics of certain features. The problem starts when people rely on assumptions ("sscanf should be fast because it's widely used") or performance folklore ("localizing every function you'll ever use makes your Lua code faster"), because those tend to either be completely wrong or lack very important context.
I live in js land, and the barrier between “folklore” and “documentation” is extremely thin. Especially since V8 may introduce changes at any time that affect performance characteristics of js.
I’d respond with “well if performance matters it shouldn’t be in js” except for all the shite being written in js these days, with js being the hammer that makes everything else look like a nail.
It makes me chuckle when hash maps are stated to have O(1) insertion. That's true with respect to the number of items in the map, assuming the map doesn't need resizing and there isn't a hash collision... but it's generally not true with respect to the key length. (I think most implementations are O(l·n) in the worst case, where l is the length of the key and n is the number of inserted items, assuming the hash function is O(l) - the _amortised_ runtime would be O(l).)
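To make the O(l) part concrete, here's a minimal sketch (my own illustration, not any particular library's code) of an FNV-1a-style hash in Java - note the single pass over the key:

    // FNV-1a-style hash: one pass over the key bytes, so the cost
    // of hashing alone grows linearly with the key length l.
    static int fnv1a(byte[] key) {
        int hash = 0x811C9DC5;           // FNV offset basis (32-bit)
        for (byte b : key) {             // O(l): touches every byte once
            hash ^= (b & 0xFF);
            hash *= 0x01000193;          // FNV prime
        }
        return hash;
    }

Every insertion pays that O(l) walk before a bucket is even chosen, which is the part the usual O(1) claim quietly treats as free.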
> assuming the map doesn't need resizing
This isn't a big difficulty; it's still amortized O(1).
> and there isn't a hash collision
This is a real difficulty, unless you allow map resizing. Luckily, we do.
> but it's generally not true in respect to the key length.
OK, but in most cases the key length is constant, making anything that depends on the key length O(1) by definition.
I wrote my own version of a part of a very popular Java scientific tool, and my version runs about 50 times faster. Their mistake? The objects they were using as HashMap keys had a hashCode() implementation that iterated through all of the voluminous content of those objects. And there was no point - they could have used IdentityHashMaps instead with the same result. I pointed this out to them, and they still haven't fixed it.
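For illustration only - the class and field names below are invented, not the tool's actual code - this is the general shape of the problem and of the IdentityHashMap alternative:

    // Hypothetical key type: hashCode() drags all of the object's
    // contents into every map operation, making it O(size of data).
    class Measurement {
        final double[] samples;          // the "voluminous content"
        Measurement(double[] s) { samples = s; }
        @Override public int hashCode() {
            return java.util.Arrays.hashCode(samples);   // walks the whole array
        }
        @Override public boolean equals(Object o) {
            return o instanceof Measurement
                && java.util.Arrays.equals(samples, ((Measurement) o).samples);
        }
    }

    // If the same Measurement instance is always used to look its value
    // up, reference identity is enough and the content is never hashed:
    java.util.Map<Measurement, String> byIdentity = new java.util.IdentityHashMap<>();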
I'm guessing GP means the complexity guarantee sidesteps the complexity of the hashing function. It probably doesn't matter all that much in the typical case - I'm guessing 80-90% of hash map use is with very short strings.
If the key length is constant, the map has an upper limit on the number of possible elements, so all operations are constant time.
Key length must necessarily be O(log(N)) to be able to identify N different keys.
But isn't the key length a constant, putting us back at O(1)? OK, in theory you could exhaust all possible keys of a certain length and proceed with longer keys. What would that give us? O(log n)?
His point is, if you use Moby Dick as the key, it's going to take longer to hash than a three-letter string. Hashing isn't O(1) if the key has variable size.
...I fully plan to use "O(whatever)". Not sure for what.
But, yes. (Naive) Quicksort's amortized complexity being O(n log n), but it's O(n^2) on already sorted data, is all I ever needed to learn to take away that lesson. When sorting already-sorted data is worse than sorting randomized data, it's a quick realization that "amortized cost" = "read the fine print".
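To be concrete about the "naive" part: the sketch below (my own illustration, not any library's sort) always pivots on the first element, so on already-sorted input every partition is maximally lopsided and the whole thing degrades to O(n^2):

    // Naive quicksort: always pivots on the first element.
    // On sorted input no element is ever smaller than the pivot,
    // so each call peels off one element: n levels of O(n) work = O(n^2).
    static void quicksort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        int pivot = a[lo];
        int i = lo;
        for (int j = lo + 1; j <= hi; j++) {
            if (a[j] < pivot) {
                i++;
                int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
            }
        }
        int tmp = a[lo]; a[lo] = a[i]; a[i] = tmp;   // move pivot into place
        quicksort(a, lo, i - 1);
        quicksort(a, i + 1, hi);
    }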
Quicksort as O(n log n) is not amortized complexity, but average runtime for random data.
Interestingly, it is quite easy to double the speed of Quicksort, even at this late date, with just a couple-line change.
<http://cantrip.org/sortfast.html>
Anyway I thought it was interesting.
Something that is amortized complexity: appending to a growable array. Most of the time it's O(1); sometimes it's O(n). If you double the size of the backing array when it runs out of space, it's amortized O(1).
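A hand-rolled sketch of that doubling scheme (real collections do something similar, though growth factors vary):

    // Growable int buffer: append is O(1) except when the backing array
    // is full, in which case it's O(n) to copy into a doubled array.
    // Over n appends fewer than 2n copies happen in total, so the
    // amortized cost per append is O(1).
    class IntBuffer {
        private int[] data = new int[4];
        private int size = 0;
        void append(int x) {
            if (size == data.length) {
                data = java.util.Arrays.copyOf(data, data.length * 2);  // the O(n) step
            }
            data[size++] = x;
        }
    }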
Fair point; I'm confusing my terminology. The analogy and the realization still hold.
I (sadly) have to resort to using "O(scary)" too often for my taste.
From: https://stackoverflow.com/a/185576/368409