Comment by thaumasiotes
4 years ago
I remember a proof in CLRS that first developed a function bounded above by 5 for any conceivable input ("a very quickly-growing function and its very slowly-growing inverse"), and then substituted the constant 4 or 5 for that function in a complexity calculation, giving a result that was "only" correct for all conceivable input.
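(A minimal sketch, assuming the function in question is the iterated logarithm log* -- CLRS makes the same move with the inverse Ackermann function, which grows even more slowly:)

    import math

    def log_star(n):
        # Iterated logarithm: how many times log2 must be applied
        # before the value drops to 1 or below.
        count = 0
        while n > 1:
            n = math.log2(n)
            count += 1
        return count

    print(log_star(2 ** 65536))  # 5
    print(log_star(10 ** 80))    # 5 -- roughly the number of atoms in the observable universe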
The same approach applies to key length requirements for hash tables with arbitrarily large backing stores. They do not grow as slowly as the CLRS log* function, but they grow so slowly that there are easily identifiable sharp limits on how large they can be -- an easy example is that a hash table cannot use more memory than the hardware offers no matter how the software is written. A backing store with 1TB of addressable bytes cannot need the key to be more than 40 bits long.
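(Back-of-the-envelope check on that 40-bit figure -- my own arithmetic: the key only needs enough bits to address every slot in the backing store, i.e. ceil(log2(slots)).)

    import math

    def key_bits_needed(addressable_slots):
        # Minimum key width, in bits, needed to address every slot.
        return max(1, math.ceil(math.log2(addressable_slots)))

    print(key_bits_needed(2 ** 40))   # 40 -- 1 TiB of addressable bytes
    print(key_bits_needed(10 ** 12))  # 40 -- 1 TB (decimal) lands on the same answer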
On a different note, by "table size" in my earlier comment I meant to refer to the number of entries in the table, not the capacity of the backing store. It seems like you might be using the same word for a different concept?
>The same approach applies to key length requirements for hash tables with arbitrarily large backing stores. They do not grow as slowly as the CLRS log* function, but they grow so slowly that there are easily identifiable sharp limits on how large they can be -- an easy example is that a hash table cannot use more memory than the hardware offers no matter how the software is written. A backing store with 1TB of addressable bytes cannot need the key to be more than 40 bits long.
So? That's still putting a bound on table size, which makes the key length constant in practice, but it doesn't make the algorithm O(1): you can never get an asymptotic result by bounding n, for the reasons the GGP gave -- that's cheating.
Your complexity bound has to be written on the assumption that n (the number of elements stored in the hash table) increases without bound. Assuming you will never use more than Y bytes of data is not valid.
>On a different note, by "table size" in my earlier comment I meant to refer to the number of entries in the table, not the capacity of the backing store. It seems like you might be using the same word for a different concept?
No, I was using table size exactly as you do, to mean the number of elements stored. Is there a reason my comments only made sense under a different definition? If not, be charitable. (And avoid using obscure terms.)
> No, I was using table size exactly as you do, to mean the number of elements stored. Is there a reason my comments only made sense under a different definition? If not, be charitable. (And avoid using obscure terms.)
I interpreted your comment to refer to the size of the backing store, because that is fundamentally what a hash key needs to be able to address.
I didn't mean to say that, if you were using it that way, you were doing anything wrong, only that there appeared to be a mismatch.
>I interpreted your comment to refer to the size of the backing store, because that is fundamentally what a hash key needs to be able to address.
Under the assumption (upthread) of constant resizing as elements are added, the distinction is irrelevant. The more elements you have in the table, the more elements you need to address, and the more possible outputs your hash function needs to have.
And the needed size of the backing store scales with the number of elements you want to store anyway.
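(A minimal sketch of that scaling, assuming a standard doubling resize policy with a fixed maximum load factor -- my own illustration, not anything specified upthread:)

    def address_bits(n_elements, initial_buckets=8, max_load=0.75):
        # Assumed policy: double the bucket array whenever the load
        # factor would exceed max_load.
        buckets = initial_buckets
        while n_elements / buckets > max_load:
            buckets *= 2
        return buckets.bit_length() - 1  # buckets is a power of two

    for n in (10, 10 ** 3, 10 ** 6, 10 ** 9):
        print(n, address_bits(n))
    # 10 -> 4, 1000 -> 11, 1000000 -> 21, 1000000000 -> 31:
    # the address/key width keeps growing with the element count, roughly as log2(n).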
>I didn't mean to say that, if you were using it that way, you were doing anything wrong, only that there appeared to be a mismatch.
Why bring up something like that if it doesn't translate into something relevant to the discussion, e.g. showing my point to be in error?
By that logic, any O(log N) factor is simply O(1). That O(N log N) sort algorithm should just be considered O(N) for all practical purposes.
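(To spell the reductio out numerically -- my own illustration: on 64-bit hardware N can never exceed 2^64, so log2(N) is "bounded" by 64 in exactly the same sense.)

    import math

    # By the "bounded in practice" argument, the log factor in O(N log N)
    # is capped at 64 for any N that fits in a 64-bit address space...
    for n in (10 ** 3, 10 ** 9, 2 ** 64):
        print(n, math.ceil(math.log2(n)))  # 10, 30, 64
    # ...so an O(N log N) sort would be "O(64 * N) = O(N)", which is absurd.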