Comment by gnulinux
6 years ago
I disagree. It requires linear space in the sense that you need another data structure to deal with hash collisions. If you read my comment I was talking about exactly that. In normal applications of bloom filter it's built before a cache layer to ask if your data is in the cache. For this you do not need a list of objects since if bloom filter wrongly determined that object is in the set, it will be a cache miss and as long as a bloom filter operation is faster than a cache miss this is usually a good tradeoff; because normally when you're about to have a cache miss, you end up not going to cache because bloom filter says the object is not there. For this application, you do NOT need O(N) space inside the bloom filter layer, but obviously eventually someone will need to go retrieve this data.
> If you actually read any of the math
I mean I did and I was trying to describe that math above. I built bloom filters many times before. I'm curious if you misread my comment.
You said the following, so I assume you agree with it:
> It requires O(C) memory where C is the size of your largest hash function.
For any given expected false-positive rate P, C must be chosen to be a size of O(N), so a bloom filter with expected false-positive rate P will be of size O(N).