Comment by GuB-42
8 days ago
I came to realize that logs don't matter. O(log(n)) and O(1) are effectively the same thing.
The reason is that real computers have memory, accessing memory takes time, and bigger n needs more memory, simply to store the bits that represent the number. Already, we have a factor of O(log(n)) here.
But each bit of memory also takes physical space, and we live in a 3D world, so, in the best case, the average distance to a memory cell is proportional to the cube root of the number of memory cells we have to access. And since the speed of light is finite, that is a cube root in time as well.
So "constant time" is actually cube root on a real computer. And I am generous by saying "real". Real real computers have plenty of stuff going on inside of them, so in practice, complexity analysis at the O(log(n)) level is meaningless without considering the hardware details.
Going from log(n) to log^(2/3)(n) is just an exercise in mathematics; it may lead to practical applications, but by itself it doesn't mean much for your software.
>logs don't matter. O(log(n)) and O(1) are effectively the same thing.
Ever heard of a hash table? They exist because O(c) is better than O(log(N)). If they were the same, you would only have heard of binary search.
Hash tables are indeed O(1) in theory while binary search is O(log(N)) in theory, no problem with that.
But in practice, accessing a physical memory cell is not O(1), it is O(cuberoot(N)).
For a binary search, you need O(log(N)) operations, but each operation (accessing a memory cell) is O(cuberoot(N')), where N' is the number of elements up to the level of the tree you are at, ending at N on the last iteration. So that's O(sum over k from 0 to log(N) of cuberoot(N/2^k)), which is a geometric series and simplifies to O(cuberoot(N)). Exactly the same!
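A quick Python sketch to check that simplification numerically, assuming only the 2^k halving per level:

    # Sum of per-level access costs in a binary search where level k of the
    # implicit tree spans ~N/2^k elements: sum over k of cuberoot(N / 2^k).
    # The series is geometric, so the total stays within a constant factor
    # (about 1 / (1 - 2^(-1/3)) ~= 4.85) of cuberoot(N).
    import math

    def access_cost(n):
        levels = int(math.log2(n)) + 1
        return sum((n / 2**k) ** (1/3) for k in range(levels))

    for n in (2**20, 2**30, 2**40):
        print(f"N=2^{int(math.log2(n))}: total={access_cost(n):.1f}, "
              f"ratio to cuberoot(N)={access_cost(n) / n**(1/3):.2f}")

The ratio settles around 4.85 regardless of N, so the whole search is still O(cuberoot(N)).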
In real practice, the choice of binary search vs. hash table depends on considerations like cache behavior. Hybrid methods combining hash tables with hierarchical data structures are often used when N is large. But that's my point: you won't choose a hash table over a binary search because one is O(1) and the other is O(log(n)); you choose one over the other because of how it fits your hardware architecture.
Contrast with, say, O(n log(n)) over O(n^2). When n is large, barring space-time tradeoffs, you should always pick the O(n log(n)) algorithm, because the reduction in complexity will always beat any sane hardware considerations.
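To put a number on that, here is the raw ratio of operation counts, ignoring constant factors entirely:

    # At n = 10^6 the gap is already about 5 * 10^4, so even a large
    # constant-factor penalty on the n*log(n) algorithm still leaves it
    # far ahead of the n^2 one.
    import math

    for n in (10**4, 10**6, 10**9):
        nlogn = n * math.log2(n)
        print(f"n={n:>10}: n^2 / (n*log2(n)) = {n * n / nlogn:,.0f}")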
Please explain to me how you can hash n distinct strings into O(n) buckets in O(1) time. Please note that this process needs to work as n goes to infinity.
Hash tables are O(log n) structures when you don't hand-wave away the "compute a hash" part. The thing is, search trees are far worse than that in practice and you aren't hand-waving away the "compare two elements" part. That's where the real speed savings come from.
What I think you are saying is that computing the hash needs to process the entire string, and the length of that string roughly corresponds to log n, therefore it's O(log n). Not sure I am entirely convinced by that reasoning, but let's roll with it for now.
Because if you apply it to binary search, you need to compare the strings at every step, and by that logic, each of these operations is O(log n), which means your binary search is now O(log^2 n).
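Here is a small sketch of that accounting; the keys are just zero-padded binary labels, so each of the n distinct strings has length log2(n):

    # Count the characters a binary search actually touches when the keys
    # are n distinct strings of length ~log2(n). The total grows roughly
    # like log2(n)^2, matching the "each step costs another O(log n)" argument.
    def chars_compared(keys, target):
        # Binary search that counts every character examined while comparing keys.
        count = 0
        lo, hi = 0, len(keys) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            key = keys[mid]
            i = 0
            while i < len(key) and i < len(target) and key[i] == target[i]:
                i += 1
            count += min(i + 1, len(key))   # characters examined in this comparison
            if key == target:
                return count
            if key < target:
                lo = mid + 1
            else:
                hi = mid - 1
        return count

    for bits in (10, 14, 18):
        n = 2 ** bits
        keys = [format(v, f"0{bits}b") for v in range(n)]   # n distinct, sorted strings
        worst = max(chars_compared(keys, t) for t in (keys[0], keys[n // 2], keys[-1]))
        print(f"n=2^{bits}: ~{worst} characters touched vs log2(n)^2 = {bits ** 2}")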
I guess the crux is that we are still comparing apples to oranges (or multiplication operations to comparison operations), and in the end what probably makes hashing faster is that we are not branching.
Still I don't think it makes sense to think of both hash tables and binary search as O(log n).
Most of this is very true, except for one caveat I'll point out: a space complexity of O(P(n)) for some function P implies at least an O(cuberoot(P(n))) time complexity, but many algorithms don't have high space complexity. If you have constant space complexity, this doesn't factor into time complexity at all. Some examples would be exponentiation by squaring, Miller-Rabin primality testing, Pollard's rho factorization, etc.
Of course, if you include the log(n) bits required just to store n, then sure, you can factor in the log of the cube root of n in the time complexity, but that's just log(n) / 3, so the cube root doesn't matter here either.
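For instance, a minimal sketch of the exponentiation by squaring mentioned above: the working set is a fixed handful of variables, so there is no growing data structure for the cube-root argument to bite on.

    # Exponentiation by squaring: O(log(exponent)) multiplications while only
    # ever holding a constant number of values (base, result, exponent),
    # aside from the log(n) bits each of those values needs.
    def power_mod(base, exponent, modulus):
        result = 1
        base %= modulus
        while exponent > 0:
            if exponent & 1:                     # odd exponent: fold one factor in
                result = (result * base) % modulus
            base = (base * base) % modulus       # square the base
            exponent >>= 1                       # halve the exponent
        return result

    print(power_mod(7, 10**18 + 9, 10**9 + 7))   # matches pow(7, 10**18 + 9, 10**9 + 7)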
> Already, we have a factor of O(log(n)) here.
Doesn’t that mean that O(log(n)) is really O(log²(n))?
You have to define what n is.
It’s clear from the parent comment that the number of bits needed to represent the input is meant here.