Comment by fsckboy

5 days ago

>logs don't matter. O(log(n)) and O(1) are effectively the same thing.

ever heard of a hashtable? that's because O(c) is better than O(log(N)). if they were the same, you would only have heard of binary search.

4 comments

fsckboy

GuB-42 3 days ago

Hash tables are indeed O(1) in theory while binary search is O(log(N)) in theory, no problem with that.

But in practice, accessing a physical memory cell is not O(1), it is O(cuberoot(N)).

For a binary search, you need O(log(N)) operation, but each operation (accessing a memory cell) is O(cuberoot(N')) where N' is the number of elements up to the level of the tree you are at, ending at N for the last iteration. So that's O(sum(0, log(N), cuberoot(N/exp(n))), which simplifies to O(cuberoot(N)), exactly the same!

In real practice, the choice of binary search vs hash table depends on considerations like cache behavior. Hybrid methods combining hash tables with hierarchical data structures are often used when N is large. But that's my point, you won't choose a hash table over a binary search because one is O(1) and the other is O(log(n)), you chose one over the other because of how it fits your hardware architecture.

Contrast with, say, O(n.log(n)) over O(n^2). When n is large, barring space-time tradeoffs, you should always pick the O(n.log(n)) algorithm, because the reduction in complexity will always beat any sane hardware considerations.

chowells 5 days ago

Please explain to me how you can hash n distinct strings into O(n) buckets in O(1) time. Please note that this process needs to work as n goes to infinity.

Hash tables are O(log n) structures when you don't hand-wave away the "compute a hash" part. The thing is, search trees are far worse than that in practice and you aren't hand-waving away the "compare two elements" part. That's where the real speed savings come from.

hiimkeks 5 days ago

What I think you are saying is that computing the hash needs to process the entire string, and the length of that string roughly corresponds to log n, therefore it's O(log n). Not sure I am entirely convinced by that reasoning, but let's roll with it for now.
Because if you apply it to binary search, you need to compare the strings at every step, and by that logic, each of these operations is O(log n), which means your binary search is now O(log^2 n).
I guess the crux is that we are still comparing apples to oranges (or multiplication operations to comparison operations), and at the end what probably makes hashing faster is that we are not branching.
Still I don't think it makes sense to think of both hash tables and binary search as O(log n).